
How-To Tutorials - Data

1210 Articles

4 powerful custom visuals in Power BI: Why, When, and How to add [Tutorial]

Pravin Dhandre
25 Jul 2018
17 min read
Power BI report authors and BI teams are well served to remain conscious of both the advantages and limitations of custom visuals. For example, when several measures or dimension columns need to be displayed within the same visual, custom visuals such as the Impact Bubble Chart and the Dot Plot by Maq Software may exclusively address this need. In many other scenarios, a trade-off or compromise must be made between the incremental features provided by a custom visual and the rich controls built into a standard Power BI visual.

In this tutorial, we show how to add a custom visual to Power BI and explore 4 powerful custom visuals and the distinct scenarios and features they support. This Power BI tutorial is taken from Mastering Microsoft Power BI. Learn more - read the book here.

Custom visuals available in AppSource and within the integrated custom visuals store for Power BI Desktop are all approved for running in browsers and on mobile devices via the Power BI mobile apps. A subset of these visuals has been certified by Microsoft and supports additional Power BI features such as email subscriptions and export to PowerPoint. Additionally, certified custom visuals have met a set of code requirements and have passed strict security tests. The list of certified custom visuals and additional details on the certification process are available here.

Adding a custom visual

Custom visuals can be added to Power BI reports either by downloading .pbiviz files from Microsoft AppSource or via the integrated Office Store of custom visuals in Power BI Desktop. Utilizing AppSource requires the additional step of downloading the file, and it can be more difficult to find the appropriate visual there because the visuals are not categorized. However, AppSource provides a link to download a sample Power BI report (.pbix file) to learn how the visual is used, such as how it uses field inputs and formatting options. Additionally, AppSource includes a short video tutorial on building report visualizations with the custom visual.

The following image reflects Microsoft AppSource filtered by the Power BI visuals Add-ins category: The following link filters AppSource to the Power BI custom visuals per the preceding image: http://bit.ly/2BIZZbZ. The search bar at the top and the vertical scrollbar on the right can be used to browse and identify custom visuals to download. Each custom visual tile in AppSource includes a Get it now link which, if clicked, presents the option to download either the custom visual itself (.pbiviz file) or the sample report for the custom visual (.pbix file). Clicking anywhere else in the tile other than Get it now prompts a window with a detailed overview of the visual, a video tutorial, and customer reviews.

To add custom visuals directly to Power BI reports, click the Import from store option via the ellipsis of the Visualizations pane, as per the following image: If a custom visual (.pbiviz file) has been downloaded from AppSource, the Import from file option can be used to import this custom visual to the report. Additionally, both the Import from store and Import from file options are available as icons on the Home tab of the Report view in Power BI Desktop. Selecting Import from store launches an MS Office Store window of Power BI Custom Visuals. Unlike AppSource, the visuals are assigned to categories such as KPIs, Maps, and Advanced Analytics, making it easy to browse and compare related visuals.
More importantly, utilizing the integrated Custom Visuals store avoids the need to manage .pbiviz files and allows report authors to remain focused on report development. As an alternative to the VISUALIZATIONS pane, the From Marketplace and From File icons on the Home tab of the Report view can also be used to add a custom visual. Clicking the From Marketplace icon in the following image launches the same MS Office Store window of Power BI Custom visuals as selecting Import from store via the VISUALIZATIONS pane: In the following image, the KPIs category of Custom visuals is selected from within the MS Office store: The Add button will directly add the custom visual as a new icon in the Visualizations pane. Selecting the custom visual icon will provide a description of the custom visual and any customer reviews. The Power BI team regularly features new custom visuals in the blog post and video associated with the monthly update to Power BI Desktop. The visual categories, customer reviews, and supporting documentation and sample reports all assist report authors in choosing the appropriate visual and using it correctly.

Organizations can also upload custom visuals to the Power BI service via the organization visuals page of the Power BI Admin portal. Once uploaded, these visuals are exposed to report authors in the MY ORGANIZATION tab of the custom visuals MARKETPLACE, as per the following example: This feature can help both organizations and report authors simplify their use of custom visuals by defining and exposing a particular set of approved custom visuals. For example, a policy could define that new Power BI reports must only utilize standard and organizational custom visuals. The list of organizational custom visuals could potentially include only a subset of the visuals which have been certified by Microsoft. Alternatively, an approval process could be implemented so that the use case for a custom visual would have to be proven or validated prior to adding this visual to the list of organizational custom visuals.

Power KPI visual

Key Performance Indicators (KPIs) are often prominently featured in Power BI dashboards and in the top left area of Power BI report pages, given their ability to quickly convey important insights. Unlike card and gauge visuals, which only display a single metric or a single metric relative to a target respectively, KPI visuals support trend, variance, and conditional formatting logic. For example, without analyzing any other visuals, a user could be drawn to a red KPI indicator symbol and immediately understand the significance of a variance to a target value as well as the recent performance of the KPI metric. For some users, particularly executives and senior managers, a few KPI visuals may represent their only exposure to an overall Power BI solution, and this experience will largely define their impression of Power BI's capabilities and the Power BI project.

Given their power and important use cases, report authors should become familiar with both the standard KPI visual and the most robust custom KPI visuals such as the Power KPI Matrix, the Dual KPI, and the Power KPI. Each of these three visuals has been developed by Microsoft and provides additional options for displaying more data and customizing the formatting and layout. The Power KPI Matrix supports scorecard layouts in which many metrics can be displayed as rows or columns against a set of dimension categories such as Operational and Financial.
The Dual KPI, which was featured in the Microsoft Power BI Cookbook (https://www.packtpub.com/big-data-and-business-intelligence/microsoft-power-bi-cookbook), is a good choice for displaying two closely related metrics such as the volume of customer service calls and the average waiting time for customer service calls. One significant limitation of custom KPI visuals is that data alerts cannot be configured on the dashboard tiles reflecting these visuals in the Power BI service. Data alerts are currently exclusive to the standard card, gauge, and KPI visuals.

In the following Power KPI visual, Internet Net Sales is compared to Plan, and the prior year Internet Net Sales and Year-over-Year Growth percent metrics are included to support the context: The Internet Net Sales measure is formatted as a solid, green line whereas the Internet Sales Plan and Internet Net Sales (PY) measures are formatted with Dotted and Dot-dashed line styles respectively. To avoid clutter, the Y-Axis has been removed and the Label Density property of the Data labels formatting card has been set to 50 percent. This level of detail (three measures with variances) and formatting makes the Power KPI one of the richest visuals in Power BI.

The Power KPI provides many options for report authors to include additional data and to customize the formatting logic and layout. Perhaps its best feature, however, is the Auto Scale property, which is enabled by default under the Layout formatting card. For example, in the following image, the Power KPI visual has been pinned to a Power BI dashboard and resized to the smallest tile size possible: As per the preceding dashboard tile, the less critical data elements such as July through August and the year-over-year % metric were removed. This auto scaling preserved space for the KPI symbol, the axis value (2017-Nov), and the actual value ($296K). With Auto Scale, a large Power KPI custom visual can be used to provide granular details in a report and then re-used in a more compact format as a tile in a Power BI dashboard.

Another advantage of the Power KPI is that minimal customization of the data model is required. The following image displays the dimension column and measures of the data model mapped to the field inputs of the aforementioned Power KPI visual: The Sales and Margin Plan data is available at the monthly grain and thus the Calendar Yr-Mo column is used as the Axis input. In other scenarios, a Date column would be used for the Axis input provided that the actual and target measures both support this grain. The order of the measures used in the Values field input is interpreted by the visual as the actual value, the target value, and the secondary value. In this example, Internet Net Sales is the first or top measure in the Values field and thus is used as the actual value (for example, $296K for November). A secondary value as the third measure in the Values input (Internet Net Sales (PY)) is not required if the intent is to only display the actual value versus its target. The KPI Indicator Value and Second KPI Indicator Value fields are also optional. If left blank, the Power KPI visual will automatically calculate these two values as the percentage difference between the actual value and the target value, and the actual value and the secondary value respectively.
In this example, these two calculations are already included as measures in the data model, and thus applying the Internet Net Sales Var to Plan % and Internet Net Sales (YOY %) measures to these fields further clarifies how the visual is being used. If the metric being used as the actual value is truly a critical measure (for example, revenue or count of customers) to the organization or the primary user, it's almost certainly appropriate that related target and variance measures are built into the Power BI dataset. In many cases, these additional measures will be used independently in their own visuals and reports. Additionally, if a target value is not readily available, such as in the preceding example with the Internet Net Sales Plan, BI teams can work with stakeholders on the proper logic to apply to a target measure, for example, 10 percent greater than the previous year.

The only customization required is the KPI Indicator Index field. The result of the expression used for this field must correspond to one of five whole numbers (1-5), and thus one of the five available KPI Indicators. In the following example, the KPI Indicators KPI 1 and KPI 2 have been customized to display a green caret up icon and a red caret down icon respectively: Many different KPI Indicator symbols are available, including up and down arrows, flags, stars, and exclamation marks. These different symbols can be formatted and then displayed dynamically based on the KPI Indicator Index field expression. In this example, a KPI index measure was created to return the value 1 or 2 based on the positive or negative value of the Internet Net Sales Var to Plan % measure respectively:

Internet Net Sales vs Plan Index = IF([Internet Net Sales Var to Plan %] > 0, 1, 2)

Given the positive 4.6 percent variance for November of 2017, the value 1 is returned by the index expression and the green caret up symbol for KPI 1 is displayed. With five available KPI Indicators and their associated symbols, it's possible to embed much more elaborate logic, such as five index conditions (for example, poor, below average, average, above average, good) and five corresponding KPI indicators. Four different layouts (Top, Left, Bottom, and Right) are available to display the values relative to the line chart. In the preceding example, the Top layout is chosen, as this results in the last value of the Axis input (2017-Nov) being displayed in the top left corner of the visual. Like the standard line chart visual in Power BI Desktop, the line style (for example, Dotted, Solid, Dashed), color, and thickness can all be customized to help distinguish the different series.

Chiclet Slicer

The standard slicer visual can display the items of a source column as a list or as a dropdown. Additionally, if presented as a list, the slicer can optionally be displayed horizontally rather than vertically. The custom Chiclet Slicer, developed by Microsoft, allows report authors to take even greater control over the format of slicers to further improve the self-service experience in Power BI reports. In the following example, a Chiclet Slicer has been formatted to display calendar months horizontally as three columns: Additionally, a dark green color is defined as the Selected Color property under the Chiclets formatting card to clearly identify the current selections (May and June). The Padding and Outline Style properties, also available under the Chiclets card, are set to 1 and Square respectively, to obtain a simple and compact layout.
Like the slicer controls in Microsoft Excel, Chiclet Slicers also support cross highlighting. To enable cross highlighting, specify a measure which references a fact table as the Values input field to the Chiclet Slicer. For example, with the Internet Net Sales measure set as the Values input of the Chiclet Slicer, a user selection on a bar representing a product in a separate visual would update the Chiclet Slicer to indicate the calendar months without Internet Sales for the given product. The Disabled Color property can be set to control the formatting of these unrelated items.

Chiclet Slicers also support images. In the following example, one row is used to display four countries via their national flags: For this visual, the Padding and Outline Style properties under the Chiclets formatting card are set to 2 and Cut respectively. Like the Calendar Month slicer, a dark green color is configured as the Selected Color property, helping to identify the country or countries selected—Canada, in this example. The Chiclet Slicer contains three input field wells—Category, Values, and Image. All three input field wells must have a value to display the images. The Category input contains the names of the items to be displayed within the Chiclets. The Image input takes a column with URL links corresponding to images for the given category values. In this example, the Sales Territory Country column is used as the Category input and the Internet Net Sales measure is used as the Values input to support cross highlighting. The Sales Territory URL column, which is set as an Image URL data category, is used as the Image input. For example, the following Sales Territory URL value is associated with the United States: http://www.crwflags.com/fotw/images/u/us.gif.

A standard slicer visual can also display images when the data category of the field used is set as Image URL. However, the standard slicer is limited to only one input field and thus cannot also display a text column associated with the image. Additionally, the standard slicer lacks the richer cross-highlighting and formatting controls of the Chiclet Slicer.

Impact Bubble Chart

One of the limitations with standard Power BI visuals is the number of distinct measures that can be represented graphically. For example, the standard scatter chart visual is limited to three primary measures (X-AXIS, Y-AXIS, and SIZE), and a fourth measure can be used for color saturation. The Impact Bubble Chart custom visual, released in August of 2017, supports five measures by including a left and right bar input for each bubble. In the following visual, the left and right bars of the Impact Bubble Chart are used to visually indicate the distribution of AdWorks Net Sales between Online and Reseller Sales channels: The Impact Bubble Chart supports five input field wells: X-AXIS, Y-AXIS, SIZE, LEFT BAR, and RIGHT BAR. In this example, the following five measures are used for each of these fields respectively: AdWorks Net Sales, AdWorks Net Margin %, AdWorks Net Sales (YTD), Internet Net Sales, and Reseller Net Sales. The length of the left bar indicates that Australia's sales are almost exclusively derived from online sales. Likewise, the length of the right bar illustrates that Canada's sales are almost wholly obtained via Reseller Sales. These graphical insights per item would not be possible with the standard Power BI scatter chart.
Specifically, the Internet Net Sales and Reseller Net Sales measures could only be added as Tooltips, thus requiring the user to hover over each individual bubble. In its current release, the Impact Bubble Chart does not support the formatting of data labels, a legend, or the axis titles. Therefore, a supporting text box can be created to advise the user of the additional measures represented. In the top right corner of this visual, a text box is set against the background to associate measures to the two bars and the size of the bubbles.

Dot Plot by Maq Software

Just as the Impact Bubble Chart supports additional measures, the Dot Plot by Maq Software allows for the visualization of up to four distinct dimension columns. With three Axis fields and a Legend field, a measure can be plotted to a more granular level than with any other standard or custom visual currently available to Power BI. Additionally, a rich set of formatting controls is available to customize the Dot Plot's appearance, such as orientation (horizontal or vertical), and whether the Axis categories should be split or stacked. In the following visual, each bubble represents the internet sales for a specific grouping of the following dimension columns: Sales Territory Country, Product Subcategory, Promotion Type, and Customer History Segment: For example, one bubble represents the Internet Sales for the Road Bikes Product Subcategory within the United States Sales Territory Country, which is associated with the volume discount promotion type and the first year Customer History Segment. In this visual, the Customer History Segment column is used as the legend, and thus the color of each bubble is automatically formatted to one of the three customer history segments.

In the preceding example, the Orientation property is set to Horizontal and the Split labels property under the Axis category formatting card is enabled. The Split labels formatting causes the Sales Territory Country column to be displayed on the opposite axis of the Product Subcategory column. Disabling this property results in the two columns being displayed as a hierarchy on the same axis, with the child column (Product Subcategory) positioned inside the parent column (Sales Territory Country). Despite its power in visualizing many dimension columns and its extensive formatting features, data labels are currently not supported. Therefore, when the maximum of four dimension columns is used, such as in the previous example, it's necessary to hover over the individual bubbles to determine which specific grouping the bubble represents, such as in the following example:

With this, you can easily extend solutions beyond the capabilities of Power BI's standard visuals and support specific and unique, complex use cases. If you found this tutorial useful, do check out the book Mastering Microsoft Power BI and develop visually rich, immersive, and interactive Power BI reports and dashboards.

Building a Microsoft Power BI Data Model
How to build a live interactive visual dashboard in Power BI with Azure Stream
How to use M functions within Microsoft Power BI for querying data
“Tableau is the most powerful and secure end-to-end analytics platform”: An interview with Joshua Milligan


Decoding the reasons behind Alphabet’s record high earnings in Q2 2018

Sugandha Lahoti
25 Jul 2018
7 min read
Alphabet, Google’s parent company, saw its stock price rise quickly after it announced its Q2 2018 earnings results, surprising analysts (in a good way) all over the world. Shares of Alphabet jumped more than 5% in after-hours trading Monday, hitting a new record high. Source: NASDAQ

It would seem that the EU’s fine of €4.34 billion on Google for breaching EU antitrust laws had little effect on its progress in terms of Q2 earnings. According to Ruth Porat, Google's CFO, Alphabet generated revenue of $32.66 billion during Q2 2018, compared to $26.01 billion during the same quarter last year. Even after the fine, Alphabet still booked a net income of $3.2 billion, which equals earnings of $4.54 per share. Had the EU decision gone the other way, Alphabet would have had $32.6 billion in revenue and a profit of $8.2 billion.

“We want Google to be the source you think of when you run into a problem.” - Sundar Pichai, Google CEO, in the Q2 2018 Earnings Call

In Monday afternoon’s earnings call, CEO Sundar Pichai focused on three major domains that have helped Alphabet achieve its Q2 earnings. First, he claimed that machine learning and AI are becoming a crucial unifying component across all of Google's products and offerings, helping to cement and consolidate its position in the market. Second, Pichai suggested that investments in computing, video, cloud, and advertising platforms have helped push Google into new valuable markets. And third, the company's investment in new businesses and emerging markets was proving to be a real growth driver which should secure Google's future success. Let us look at the various facets of Google’s growth strategy that have proven to be successful this quarter.

Investing in AI

With the world spinning around the axis of AI, Alphabet is empowering all of its product and service offerings with AI and machine learning. At its annual developer conference earlier this year, Google I/O, Google announced new updates to its products that rely on machine learning. For example, the revamped Google News app uses machine learning to provide relevant news stories for users, and improvements to Google Assistant also helped the organization strengthen its position in that particular market. (By the end of 2018, it will be available in more than 30 languages in 80 countries.) This is another smart move by Alphabet in its plan to make information accessible to all while creating more revenue-generating options for itself and expanding its partnerships to new vendors and enterprise clients. Google Translate also saw a huge bump in volume, especially during the World Cup, as fans all over the world traveled to Russia to witness the football gala. Another smart decision was adding updates to Google Maps. This has achieved a 50% year-on-year growth in Indonesia, India, and Nigeria, three very big and expanding markets.

Defending its Android ecosystem and business model

The first Android phone arrived in 2008. The project was built on the simple idea of a mobile platform that was free and open to everyone. Today, there are more than 24,000 Android-powered devices from over 1,400 phone manufacturers. Google’s decision to build a business model that encourages this open ecosystem to thrive has been a clever strategy. It not only generates significant revenue for the company but also brings a world of developers and businesses into its ecosystem. It's vendor lock-in with a friendly face. Of course, with the EU watching closely, Google has to be careful to follow regulation.
Failure to comply could mean the company would face penalty payments of up to 5% of the average daily worldwide turnover of Alphabet. According to Brian Wieser, an analyst at Pivotal Research Group, however, “There do not appear to be any signs that should cause a meaningful slow down anytime soon, as fines from the EU are not likely to hamper Alphabet’s growth rate. Conversely, regulatory changes such as GDPR in Europe (and similar laws implemented elsewhere) could have the effect of reinforcing Alphabet’s growth.”

Forming new partnerships

Google has always been very keen to form new partnerships and strategic alliances with a wide variety of companies and startups. It has been very smart in systematically looking for partners that will complement its strengths and bring the end product to the market. Partnering also provides flexibility; instead of developing new solutions and tools in-house, Google can instead bring interesting innovations into the Google ecosystem simply thanks to its financial clout. For example, Google has partnered with many electronics companies to expand the number of devices compatible with Google Assistant. Furthermore, its investment in computing platforms and AI has also helped the organization to generate considerable momentum in its Made by Google hardware business across Pixel, Home, Nest, and Chromecast. Interestingly, we also saw an acceleration in business adoption of Chromebooks. Chromebooks are the most cost-efficient and secure way for businesses to enable their employees to work in the cloud. The unit sales of managed Chromebooks in Q2 grew by more than 175% year-on-year.

“Advertising on YouTube has always been an incredibly strong and growing source of income for its creators. Now Google is also building new ways for creators to source income, such as paid channel memberships, merchandise shelves on YouTube channels, and endorsement opportunities through Famebit,” said Pichai. Famebit is a startup Google acquired in 2016, which uses data analytics to build tools to connect brands with the right creators. This acquisition proved to be quite successful, as almost half of the creators that used Famebit in 2018 doubled their revenue in the first 3 months.

Google has also made significant strides in developing new shopping and commerce partnerships, such as with leading global retailers like Carrefour, designed to give people the power to shop wherever and however they want. Such collaborations are great for Google, as they bring its shopping, ads, and cloud products under one hood. The success of Google Cloud’s vertical strategy and customer-centric approach was illustrated by key wins, including Domino's Pizza, SoundCloud, and PwC moving to GCP this quarter. Target, the chain of department store retailers in the US, is also migrating key areas of its business to GCP. AirAsia has also expanded its relationship with Google for using ML and data analytics. This shows that the cloud business is only going to grow further. Furthermore, Google Cloud Platform catering to clients from across very different industries and domains signals a robust way to expand its cloud empire.

Supporting future customers

Google is not just thinking about its current customer base but is also working on specialized products to support the next wave of people who are coming online for the first time, enabled by the rise in accessibility of mobile devices.
Google has established high-speed public WiFi in 400 train stations in India in collaboration with Indian Railways, and has proposed the system in Indonesia and Mexico as well. It has also announced a Google AI research center in Ghana, Africa, to spur AI innovation with researchers and engineers from Africa, and has expanded the Google IT support professional certificate program to more than 25 community colleges in the US.

This strong performance by Alphabet, even in the midst of the EU antitrust case, was the most talked-about news among Wall Street analysts, most of whom rate the stock a buy. For the next quarter, Google wants to continue fueling its growing cloud business. “We are investing for the long run,” Pichai said. The company also doesn't plan to dramatically alter its Android strategy and will continue to give the OS away for free. Pichai said, “I’m confident that we will find a way to make sure Android is available at scale to users everywhere.”

A quick look at E.U.’s antitrust case against Google’s Android
Is Google planning to replace Android with Project Fuchsia?
Google Cloud Launches Blockchain Toolkit to help developers build apps easily


Top 5 Deep Learning Architectures

Amey Varangaonkar
24 Jul 2018
9 min read
If you are a deep learning practitioner or someone who wants to get into the world of deep learning, you might be well acquainted with neural networks already. Neural networks, inspired by biological neural networks, are pretty useful when it comes to solving complex, multi-layered computational problems. Deep learning has stood out pretty well in several high-profile research fields, including facial and speech recognition, natural language processing, machine translation, and more. In this article, we look at the top 5 popular and widely-used deep learning architectures you should know in order to advance your knowledge or your deep learning research.

Convolutional Neural Networks

Convolutional Neural Networks, or CNNs in short, are the popular choice of neural networks for different Computer Vision tasks such as image recognition. The name ‘convolution’ is derived from a mathematical operation involving the convolution of different functions. There are 4 primary steps or stages in designing a CNN:

Convolution: The input signal is received at this stage
Subsampling: Inputs received from the convolution layer are smoothened to reduce the sensitivity of the filters to noise or any other variation
Activation: This layer controls how the signal flows from one layer to the other, similar to the neurons in our brain
Fully connected: In this stage, all the layers of the network are connected, with every neuron from a preceding layer connected to the neurons of the subsequent layer

Here is an in-depth look at the CNN architecture and its working, as explained by the popular AI researcher Giancarlo Zaccone. A sample CNN in action:

Advantages of CNN
Very good for visual recognition
Once a segment within a particular sector of an image is learned, the CNN can recognize that segment present anywhere else in the image

Disadvantages of CNN
CNN is highly dependent on the size and quality of the training data
Highly susceptible to noise
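
To make those four stages concrete, here is a minimal sketch of a small image classifier in Keras (assuming TensorFlow 2.x and the tf.keras API; the 28x28 input shape and the 10-class output are illustrative assumptions, not taken from this article):

import tensorflow as tf
from tensorflow.keras import layers, models

# Minimal CNN sketch: convolution, subsampling (pooling), activation, fully connected.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # e.g. 28x28 grayscale images (assumed)
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution + activation
    layers.MaxPooling2D((2, 2)),                   # subsampling
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),           # fully connected
    layers.Dense(10, activation="softmax"),        # 10 output classes (assumed)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
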
Recurrent Neural Networks

Recurrent Neural Networks (RNNs) have been very popular in areas where the sequence in which the information is presented is crucial. As a result, they find a lot of applications in real-world domains such as natural language processing, speech synthesis, and machine translation. RNNs are called ‘recurrent’ mainly because a uniform task is performed for every single element of a sequence, with the output dependent on the previous computations as well. Think of these networks as having a memory, where every piece of calculated information is captured, stored, and utilized to calculate the final outcome. Over the years, quite a few varieties of RNNs have been researched and developed:

Bidirectional RNN - The output in this type of RNN depends not only on the past but also on future outcomes
Deep RNN - In this type of RNN, there are multiple layers present per step, allowing for a greater rate of learning and more accuracy

RNNs can be used to build industry-standard chatbots that can interact with customers on websites. Given a sequence of signals from an audio wave, RNNs can also be used to predict a correct sequence of phonetic segments with a given probability.

Advantages of RNN
Unlike a traditional neural network, an RNN shares the same parameters across all steps. This greatly reduces the number of parameters that we need to learn
RNNs can be used along with CNNs to generate accurate descriptions for unlabeled images

Disadvantages of RNN
RNNs find it difficult to track long-term dependencies. This is especially true in the case of long sentences and paragraphs having too many words in between the noun and the verb
RNNs cannot be stacked into very deep models. This is due to the activation function used in RNN models, making the gradient decay over multiple layers

Autoencoders

Autoencoders apply the principle of backpropagation in an unsupervised environment. Autoencoders, interestingly, have a close resemblance to PCA (Principal Component Analysis), except that they are more flexible. One of the popular applications of Autoencoders is anomaly detection - for example, detecting fraud in financial transactions in banks. Basically, the core task of autoencoders is to identify and determine what constitutes regular, normal data and then identify the outliers or anomalies. Autoencoders usually represent data through multiple hidden layers such that the output signal is as close as possible to the input signal. There are 4 major types of autoencoders being used today:

Vanilla autoencoder - the simplest form of autoencoder there is, i.e. a neural net with one hidden layer
Multilayer autoencoder - when one hidden layer is not enough, an autoencoder can be extended to include more hidden layers
Convolutional autoencoder - In this type, convolutions are used in the autoencoder instead of fully-connected layers
Regularized autoencoder - this type of autoencoder uses a special loss function that enables the model to have properties beyond the basic ability to copy a given input to the output

This article demonstrates training an autoencoder using H2O, a popular machine learning and AI platform. A basic representation of an Autoencoder:

Advantages of Autoencoders
Autoencoders give a resultant model which is primarily based on the data rather than predefined filters
Their relatively low complexity means they are easier to train

Disadvantages of Autoencoders
Training time can be very high sometimes
If the training data is not representative of the testing data, then the information that comes out of the model can be obscured and unclear
Some autoencoders, especially of the variational type, introduce a deterministic bias into the model
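
As a concrete illustration of compressing and then reconstructing the input, here is a minimal sketch of a vanilla autoencoder in Keras (assuming TensorFlow 2.x; the 784-dimensional input and 32-unit bottleneck are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers, models

# Vanilla autoencoder sketch: one hidden (bottleneck) layer that learns a compressed
# representation, and an output layer that tries to reconstruct the original input.
input_dim = 784      # e.g. a flattened 28x28 image (assumed)
encoding_dim = 32    # size of the compressed representation (assumed)

autoencoder = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(encoding_dim, activation="relu"),   # encoder / bottleneck
    layers.Dense(input_dim, activation="sigmoid"),   # decoder reconstructs the input
])
autoencoder.compile(optimizer="adam", loss="mse")

# Training pairs the input with itself, so no labels are needed (unsupervised):
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)
# For anomaly detection, a high reconstruction error on a new sample suggests an outlier.
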
Generative Adversarial Networks

The basic premise of Generative Adversarial Networks (GANs) is the training of two deep learning models simultaneously. These deep learning networks basically compete with each other - one model, which tries to generate new instances or examples, is called the generator. The other model, which tries to classify whether a particular instance originates from the training data or from the generator, is called the discriminator. GANs, a recent breakthrough in the field of deep learning, were a concept put forth by the popular deep learning expert Ian Goodfellow in 2014. They find large and important applications in Computer Vision, especially image generation. Read more about the structure and the functionality of GANs in the official paper submitted by Ian Goodfellow. General architecture of a GAN (Source: deeplearning4j):

Advantages of GAN
Per Goodfellow, GANs allow for efficient training of classifiers in a semi-supervised manner
Because of the improved accuracy of the model, the generated data is almost indistinguishable from the original data
GANs do not introduce any deterministic bias, unlike variational autoencoders

Disadvantages of GAN
The generator and discriminator working efficiently is crucial to the success of a GAN; the whole system fails even if one of them fails
Both the generator and discriminator are separate systems trained with different loss functions, hence the time required to train the entire system can get quite high

Interested to know more about GANs? Here’s what you need to know about them.

ResNets

Ever since they gained popularity in 2015, ResNets or Deep Residual Networks have been widely adopted and used by many data scientists and AI researchers. As you already know, CNNs are highly useful when it comes to solving image classification and visual recognition problems. As these tasks become more complex, training of the neural network starts to get a lot more difficult, as additional deep layers are required to compute and enhance the accuracy of the model. Residual learning is a concept designed to tackle this very problem, and the resultant architecture is popularly known as a ResNet. A ResNet consists of a number of residual modules, where each module represents a layer. Each layer consists of a set of functions to be performed on the input. The depth of a ResNet can vary greatly - the one developed by Microsoft researchers for an image classification problem had 152 layers! A basic building block of ResNet (Source: Quora):

Advantages of ResNets
ResNets are more accurate and require fewer weights than LSTMs and RNNs in some cases
They are highly modular. Hundreds and thousands of residual layers can be added to create a network and then trained
ResNets can be designed to determine how deep a particular network needs to be

Disadvantages of ResNets
If the layers in a ResNet are too deep, errors can be hard to detect and cannot be propagated back quickly and correctly. At the same time, if the layers are too narrow, the learning might not be very efficient
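
To show what a residual module looks like in code, here is a minimal sketch of a single residual block in Keras (assuming TensorFlow 2.x; the filter count, kernel size, and input shape are illustrative assumptions, and real ResNets also add batch normalization and projection shortcuts):

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64, kernel_size=3):
    # Simplified residual block: two convolutions plus a skip connection.
    # The input is added back to the transformed output, so the block only has
    # to learn the residual (the difference from the identity mapping).
    shortcut = x
    y = layers.Conv2D(filters, kernel_size, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.Add()([shortcut, y])          # skip connection
    return layers.Activation("relu")(y)

# Usage sketch: stack a few blocks on top of an input that already has 64 channels (assumed).
inputs = layers.Input(shape=(32, 32, 64))
x = residual_block(inputs)
x = residual_block(x)
model = tf.keras.Model(inputs, x)
model.summary()
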
Apart from the ones above, a few more deep learning models are being increasingly adopted and preferred by data scientists. These definitely deserve an honorable mention:

LSTM: LSTMs are a special kind of Recurrent Neural Network that include a special memory cell which can hold information for long periods of time. A set of gates is used to determine when particular information enters the memory and when it is forgotten.
SqueezeNet: One of the newer but very powerful deep learning architectures, which is extremely efficient for low-bandwidth platforms such as mobile.
CapsNet: CapsNet, or Capsule Networks, is a recent breakthrough in the field of deep learning and neural network modeling. It is mainly used for accurate image recognition tasks and is an advanced variation of the CNN.
SegNet: A popular deep learning architecture especially used to solve the image segmentation problem.
Seq2Seq: An upcoming deep learning architecture being increasingly used for machine translation and building efficient chatbots.

So there you have it! Thanks to the intense research efforts in deep learning and AI, we now have a variety of deep learning models at our disposal to solve a variety of problems - both functional and computational. What’s even better is that we have the liberty to choose the most appropriate deep learning architecture based on the problem at hand.

Editor’s Tip: It is very important to know the best deep learning frameworks you can use to train your models. Here are the top 10 deep learning frameworks for you.

In contrast to the traditional programming approach, where we tell the computer what to do, deep learning models figure out the problem and devise the most appropriate solution on their own - however complex the problem may be. No wonder these deep learning architectures are being researched and deployed on a large scale by major market players such as Google, Facebook, Microsoft, and many others.

Packt Explains…
Deep Learning in 90 seconds
Behind the scenes: Deep learning evolution and core concepts
Facelifting NLP with Deep Learning


Build Hadoop clusters using Google Cloud Platform [Tutorial]

Sunith Shetty
24 Jul 2018
10 min read
Cloud computing has transformed the way individuals and organizations access and manage their servers and applications on the internet. Before Cloud computing, everyone used to manage their servers and applications on their own premises or in dedicated data centers. The increase in raw computing power (CPU and GPU) from multiple cores on a single chip and the increase in storage space (HDD and SSD) present challenges in efficiently utilizing the available computing resources. In today's tutorial, we will learn different ways of building a Hadoop cluster on the Cloud and ways to store and access data on the Cloud. This article is an excerpt from a book written by Naresh Kumar and Prashant Shindgikar titled Modern Big Data Processing with Hadoop.

Building a Hadoop cluster in the Cloud

The Cloud offers a flexible and easy way to rent resources such as servers, storage, networking, and so on. The Cloud has made it very easy for consumers with the pay-as-you-go model, but much of the complexity of the Cloud is hidden from us by the providers. In order to better understand whether Hadoop is well suited to being on the Cloud, let's try to dig further and see how the Cloud is organized internally.

At the core of the Cloud are the following mechanisms:
A very large number of servers with a variety of hardware configurations
Servers connected and made available over IP networks
Large data centers to host these devices
Data centers spanning geographies with evolved network and data center designs

If we pay close attention, we are talking about the following:
A very large number of different CPU architectures
A large number of storage devices with a variety of speeds and performance
Networks with varying speed and interconnectivity

Let's look at a simple design of such a data center on the Cloud. We have the following devices in the preceding diagram:
S1, S2: Rack switches
U1-U6: Rack servers
R1: Router
Storage area network
Network attached storage

As we can see, Cloud providers have a very large number of such architectures to make them scalable and flexible. You would have rightly guessed that when the number of such servers increases and we request a new server, the provider can allocate the server anywhere in the region. This makes it a bit challenging for compute and storage to be together, but it also provides elasticity. In order to address this co-location problem, some Cloud providers give the option of creating a virtual network and taking dedicated servers, and then allocating all their virtual nodes on these servers. This is somewhat closer to a data center design, but flexible enough to return resources when not needed.

Let's get back to Hadoop and remind ourselves that in order to get the best from the Hadoop system, we should have the CPU power closer to the storage. This means that the physical distance between the CPU and the storage should be much less, so that the BUS speeds match the processing requirements. The slower the I/O speed between the CPU and the storage (for example, iSCSI, storage area network, network attached storage, and so on), the poorer the performance we get from the Hadoop system, as the data is being fetched over the network, kept in memory, and then fed to the CPU for further processing. This is one of the important things to keep in mind when designing Hadoop systems on the Cloud.
Apart from performance reasons, there are other things to consider:
Scaling Hadoop
Managing Hadoop
Securing Hadoop

Now, let's try to understand how we can take care of these in the Cloud environment. Hadoop can be installed by the following methods:
Standalone
Semi-distributed
Fully-distributed

When we want to deploy Hadoop on the Cloud, we can deploy it in the following ways:
Custom shell scripts
Cloud automation tools (Chef, Ansible, and so on)
Apache Ambari
Cloud vendor provided methods: Google Cloud Dataproc, Amazon EMR, Microsoft HDInsight
Third-party managed Hadoop: Cloudera
Cloud agnostic deployment: Apache Whirr

Google Cloud Dataproc

In this section, we will learn how to use Google Cloud Dataproc to set up a single node Hadoop cluster. The steps can be broken down into the following:
Getting a Google Cloud account
Activating the Google Cloud Dataproc service
Creating a new Hadoop cluster
Logging in to the Hadoop cluster
Deleting the Hadoop cluster

Getting a Google Cloud account

This section assumes that you already have a Google Cloud account.

Activating the Google Cloud Dataproc service

Once you log in to the Google Cloud console, you need to visit the Cloud Dataproc service. The activation screen looks something like this:

Creating a new Hadoop cluster

Once Dataproc is enabled in the project, we can click on Create to create a new Hadoop cluster. After this, we see another screen where we need to configure the cluster parameters. I have left most of the things at their default values. Later, we can click on the Create button, which creates a new cluster for us.

Logging in to the cluster

After the cluster has successfully been created, we will automatically be taken to the cluster lists page. From there, we can launch an SSH window to log in to the single node cluster we have created. The SSH window looks something like this: As you can see, the Hadoop command is readily available for us, and we can run any of the standard Hadoop commands to interact with the system.

Deleting the cluster

In order to delete the cluster, click on the DELETE button and it will display a confirmation window, as shown in the following screenshot. After this, the cluster will be deleted.

Looks so simple, right? Yes. Cloud providers have made it very simple for users to use the Cloud and pay only for the usage.
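
The tutorial above uses the web console. If you prefer to script the same cluster creation, a rough sketch with the google-cloud-dataproc Python client library might look like the following. This is my own illustration, not part of the book excerpt: the project ID, region, and machine type are placeholders, the single-node property is an assumption, and exact client signatures can vary between library versions.

from google.cloud import dataproc_v1

project_id = "my-gcp-project"   # placeholder
region = "us-central1"          # placeholder

# The regional endpoint must match the region where the cluster will live.
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "single-node-hadoop",
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        # Single-node clusters are typically enabled via this property (assumption).
        "software_config": {
            "properties": {"dataproc:dataproc.allow.zero.workers": "true"}
        },
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
print("Cluster created:", operation.result().cluster_name)

# To delete the cluster later:
# client.delete_cluster(request={"project_id": project_id, "region": region,
#                                "cluster_name": "single-node-hadoop"}).result()
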
Data access in the Cloud

The Cloud has become an important destination for storing both personal data and business data. Depending upon the importance and the secrecy requirements of the data, organizations have started using the Cloud to store their vital datasets. The following diagram tries to summarize the various access patterns of typical enterprises and how they leverage the Cloud to store their data.

Cloud providers offer different varieties of storage. Let's take a look at what these types are:
Block storage
File-based storage
Encrypted storage
Offline storage

Block storage

This type of storage is primarily useful when we want to use it along with our compute servers and want to manage the storage via the host operating system. To understand this better, this type of storage is equivalent to the hard disk/SSD that comes with our laptops/MacBooks when we purchase them. In the case of laptop storage, if we decide to increase the capacity, we need to replace the existing disk with another one. When it comes to the Cloud, if we want to add more capacity, we can just purchase another larger-capacity storage volume and attach it to our server. This is one of the reasons why the Cloud has become popular, as it has made it very easy to add or shrink the storage that we need. It's good to remember that, since there are many different types of access patterns for our applications, Cloud vendors also offer block storage with varying storage/speed characteristics, measured with their own capacity/IOPS figures, and so on.

Let's take an example of this capacity upgrade requirement and see what we do to utilize this block storage on the Cloud. In order to understand this, let's look at the example in this diagram: imagine a server created by the administrator called DB1 with an original capacity of 100 GB. Later, due to unexpected demand from customers, an application started consuming all of the 100 GB of storage, so the administrator has decided to increase the capacity to 1 TB (1,024 GB). This is what the workflow looks like in this scenario:
Create a new 1 TB disk on the Cloud
Attach the disk to the server and mount it
Take a backup of the database
Copy the data from the existing disk to the new disk
Start the database
Verify the database
Destroy the data on the old disk and return the disk

This process is simplified, but in production it might take some time, depending upon the type of maintenance that is being performed by the administrator. But, from the Cloud perspective, acquiring new block storage is very quick.

File storage

Files are the basics of computing. If you are familiar with UNIX/Linux environments, you already know that everything is a file in the Unix world. But don't get confused by that, as every operating system has its own way of dealing with hardware resources. In this case, we are not worried about how the operating system deals with hardware resources; we are talking about the important documents that users store as part of their day-to-day business. These files can be:
Movie/conference recordings
Pictures
Excel sheets
Word documents

Even though they are simple-looking files on our computer, they can have significant business importance and should be dealt with in a careful fashion when we think of storing them on the Cloud. Most Cloud providers offer an easy way to store these simple files on the Cloud and also offer flexibility in terms of security. A typical workflow for acquiring storage of this form is like this:
Create a new storage bucket that's uniquely identified
Add private/public visibility to this bucket
Add a multi-geography replication requirement to the data that is stored in this bucket

Some Cloud providers bill their customers based on the number of features they select as part of their bucket creation. Please choose a hard-to-discover name for buckets that contain confidential data, and also make them private.
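
As an illustration of that bucket workflow on Google Cloud (to stay consistent with the Dataproc example above), a rough sketch with the google-cloud-storage Python client could look like this. This code is my own assumption, not from the book excerpt; the bucket name, location, and exact keyword arguments may need adjusting for your library version.

from google.cloud import storage

client = storage.Client(project="my-gcp-project")   # placeholder project ID

# 1. Create a uniquely named bucket. Bucket names are global, so pick something
#    hard to discover for confidential data.
bucket = storage.Bucket(client, name="acme-finance-backups-7f3a91")
bucket.storage_class = "STANDARD"

# 2. Choose a location; a multi-region such as "EU" or "US" gives geo-redundant
#    storage, the closest match to the multi-geography replication requirement above.
bucket = client.create_bucket(bucket, location="EU")

# 3. Keep the bucket private by not granting any public IAM roles (such as allUsers).
#    Uploading a file then looks like this:
blob = bucket.blob("reports/q2-backup.tar.gz")
blob.upload_from_filename("q2-backup.tar.gz")
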
Encrypted storage

This is a very important requirement for business-critical data, as we do not want the information to be leaked outside the scope of the organization. Cloud providers offer an encryption-at-rest facility for us. Some vendors choose to do this automatically, and some vendors also provide flexibility in letting us choose the encryption keys and methodology for encrypting/decrypting the data that we own. Depending upon the organization's policy, we should follow best practices in dealing with this on the Cloud. With the increase in the performance of storage devices, encryption does not add significant overhead while decrypting/encrypting files. This is depicted in the following image: continuing the same example as before, when we choose to encrypt the underlying block storage of 1 TB, we can leverage the Cloud-offered encryption, where the provider automatically encrypts and decrypts the data for us. So, we do not have to employ special software on the host operating system to do the encryption and decryption. Remember that encryption can be a feature that's available in both the block storage and file-based storage offerings from the vendor.

Cold storage

This storage is very useful for storing important backups in the Cloud that are rarely accessed. Since we are dealing with a special type of data here, we should also be aware that the Cloud vendor might charge significantly higher amounts for data access from this storage, as it's meant to be written once and forgotten (until it's needed). The advantage of this mechanism is that we pay less to store even petabytes of data.

We looked at the different steps involved in building our own Hadoop cluster on the Cloud. And we saw different ways of storing and accessing our data on the Cloud. To know more about how to build expert Big Data systems, do check out this book, Modern Big Data Processing with Hadoop.

Read More:
What makes Hadoop so revolutionary?
Machine learning APIs for Google Cloud Platform
Getting to know different Big data Characteristics


Optical training of Neural networks is making AI more efficient

Natasha Mathur
20 Jul 2018
3 min read
According to research conducted by T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, artificial neural networks can be directly trained on an optical chip. The research, titled “Training of photonic neural networks through in situ backpropagation and gradient measurement”, demonstrates that an optical circuit has all the capabilities needed to perform the critical functions of an electronics-based artificial neural network. This makes performing complex tasks like speech or image recognition less expensive, faster, and more energy efficient. According to research team leader Shanhui Fan of Stanford University, "Using an optical chip to perform neural network computations more efficiently than is possible with digital computers could allow more complex problems to be solved".

Until now, the training step for optical ANNs was performed using a traditional digital computer, and the final settings were then imported into the optical circuit. But, according to Optica (The Optical Society's journal for high-impact research), there is a more direct method for training these networks. This involves making use of an optical analog of the 'backpropagation' algorithm. Tyler W. Hughes, the first author of the research paper, states that "using a physical device rather than a computer model for training makes the process more accurate". He also mentions that "because the training step is a very computationally expensive part of the implementation of the neural network, performing this step optically is key to improving the computational efficiency, speed and power consumption of artificial networks."

Neural network processing is usually performed with the help of a traditional computer. But now, for neural network computing, researchers are interested in optics-based devices, as computations performed on these devices use much less energy compared to electronic devices. The researchers designed an optical chip that imitates the way conventional computers train neural networks, which provides a way of implementing an all-optical neural network. According to Hughes, the ANN is like a black box with a number of knobs. During the training stage, each knob is turned ever so slightly so the system can be tested to see how the algorithm's performance changes. He says, "Our method not only helps predict which direction to turn the knobs but also how much you should turn each knob to get you closer to the desired performance".

How does the new training protocol work?

This new training method uses optical circuits which have tunable beam splitters. You can adjust these splitters by altering the settings of optical phase shifters. First, you feed a laser, encoded with the information that needs to be processed, through the optical circuit. Once the laser exits the device, the difference against the expected outcome is calculated. The collected information is then used to generate a new light signal, which is sent through the optical network in the opposite direction. The researchers also showed how the neural network's performance changes with respect to each beam splitter's setting, so the phase shifter settings can be adjusted based on this information. The whole process is repeated until the desired outcome is produced by the neural network.
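
The in-situ gradient measurement itself is physical, but the "turn each knob slightly and re-measure" intuition maps onto the familiar idea of finite-difference gradient estimation. Purely as a software analogy (my own illustration, not the authors' optical procedure), here is a minimal NumPy sketch of tuning a set of "knobs" by perturbing each one and measuring how the loss changes:

import numpy as np

def loss(knobs, x, target):
    # Toy stand-in for "feed the signal through and compare with the expected outcome":
    # a single tanh layer followed by a mean squared error.
    prediction = np.tanh(x @ knobs)
    return np.mean((prediction - target) ** 2)

def finite_difference_step(knobs, x, target, eps=1e-4, lr=0.1):
    # Estimate d(loss)/d(knob) by nudging each knob and re-measuring the loss.
    grad = np.zeros_like(knobs)
    base = loss(knobs, x, target)
    for i in range(knobs.size):
        nudged = knobs.copy()
        nudged.flat[i] += eps
        grad.flat[i] = (loss(nudged, x, target) - base) / eps
    # Turn each knob by an amount proportional to its estimated gradient.
    return knobs - lr * grad

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))
target = np.sin(x.sum(axis=1))
knobs = rng.normal(size=(4,))

for step in range(200):
    knobs = finite_difference_step(knobs, x, target)
print("final loss:", loss(knobs, x, target))
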
This training technique was further tested by the researchers using optical simulations; in these tests, the optical implementation performed similarly to a conventional computer. The researchers are now planning to further optimize the system in order to come up with a practical application using a neural network.

How Deep Neural Networks can improve Speech Recognition and generation
Recurrent neural networks and the LSTM architecture


Create an RNN based Python machine translation system [Tutorial]

Sunith Shetty
20 Jul 2018
22 min read
Machine translation is a process which uses neural network techniques to automatically translate text from one language to another, with no human intervention required. In today's machine learning tutorial, we will understand the architecture and learn how to train and build your own machine translation system. This project will help us automatically translate German to produce English sentences. This article is an excerpt from a book written by Luca Massaron, Alberto Boschetti, Alexey Grigorev, Abhishek Thakur, and Rajalingappaa Shanmugamani titled TensorFlow Deep Learning Projects. Walkthrough of the architecture A machine translation system receives as input an arbitrary string in one language and produces, as output, a string with the same meaning but in another language. Google Translate is one example (but many other major IT companies have their own). There, users are able to translate to and from more than 100 languages. Using the webpage is easy: on the left just put the sentence you want to translate (for example, Hello World), select its language (in the example, it's English), and select the language you want it to be translated to. Here's an example where we translate the sentence Hello World to French: Is it easy? At a glance, we may think it's a simple dictionary substitution. Words are chunked, the translation is looked up in the specific English-to-French dictionary, and each word is substituted with its translation. Unfortunately, that's not the case. In the example, the English sentence has two words, while the French one has three. More generically, think about phrasal verbs (turn up, turn off, turn on, turn down), Saxon genitive, grammatical gender, tenses, conditional sentences... they don't always have a direct translation, and the correct one should follow the context of the sentence. That's why, for doing machine translation, we need some artificial intelligence tools. Specifically, as for many other natural language processing (NLP) tasks, we'll be using recurrent neural networks (RNNs). The main feature they have is that they work on sequences: given an input sequence, they produce an output sequence. The objective of this article is to create the correct training pipeline for having a sentence as the input sequence, and its translation as the output one. Remember also the no free lunch theorem: this process isn't easy, and many solutions can achieve the same result. Here, in this article, we will propose a simple but powerful one. First of all, we start with the corpora: it's maybe the hardest thing to find, since it must contain high-fidelity translations of many sentences from one language to another. Fortunately, NLTK, a well-known Python package for NLP, contains the Comtrans corpora. Comtrans stands for combination approach to machine translation and contains an aligned corpus for three languages: German, French, and English. In this project, we will use these corpora for a few reasons, as follows: It's easy to download and import in Python. No preprocessing is needed to read it from disk / from the internet. NLTK already handles that part. It's small enough to be used on many laptops (a few tens of thousands of sentences). It's freely available on the internet. For more information about the Comtrans project, go to http://www.fask.uni-mainz.de/user/rapp/comtrans/. More specifically, we will try to create a machine translation system to translate German to English.
We picked these two languages at random among the ones available in the Comtrans corpora: feel free to flip them, or use the French corpora instead. The pipeline of our project is generic enough to handle any combination. Let's now investigate how the corpora is organized by typing some commands: from nltk.corpus import comtrans print(comtrans.aligned_sents('alignment-de-en.txt')[0]) The output is as follows: <AlignedSent: 'Wiederaufnahme der S...' -> 'Resumption of the se...'> The pairs of sentences are available using the function aligned_sents. The filename contains the from and to language. In this case, as for the following part of the project, we will translate German (de) to English (en). The returned object is an instance of the class nltk.translate.api.AlignedSent. By looking at the documentation, the first language is accessible with the attribute words, while the second language is accessible with the attribute mots. So, to extract the German sentence and its English translation separately, we should run: print(comtrans.aligned_sents()[0].words) print(comtrans.aligned_sents()[0].mots) The preceding code outputs: ['Wiederaufnahme', 'der', 'Sitzungsperiode'] ['Resumption', 'of', 'the', 'session'] How nice! The sentences are already tokenized, and they look like sequences. In fact, they will be the input and (hopefully) the output of the RNN which will provide the service of machine translation from German to English for our project. Furthermore, if you want to understand the dynamics of the language, Comtrans makes available the alignment of the words in the translation: print(comtrans.aligned_sents()[0].alignment) The preceding code outputs: 0-0 1-1 1-2 2-3 The first word in German is translated to the first word in English (Wiederaufnahme to Resumption), the second (at index 1) is translated to both the second and the third (der to of and the), and the third (at index 2) is translated with the fourth (Sitzungsperiode to session). Pre-processing of the corpora The first step is to retrieve the corpora. We've already seen how to do this, but let's now formalize it in a function. To make it generic enough, let's enclose these functions in a file named corpora_tools.py. Let's do some imports that we will use later on: import pickle import re from collections import Counter from nltk.corpus import comtrans Now, let's create the function to retrieve the corpora: def retrieve_corpora(translated_sentences_l1_l2='alignment-de-en.txt'): print("Retrieving corpora: {}".format(translated_sentences_l1_l2)) als = comtrans.aligned_sents(translated_sentences_l1_l2) sentences_l1 = [sent.words for sent in als] sentences_l2 = [sent.mots for sent in als] return sentences_l1, sentences_l2 This function has one argument: the file containing the aligned sentences from the NLTK Comtrans corpora. It returns two lists of sentences (actually, they're lists of tokens), one for the source language (in our case, German), the other for the destination language (in our case, English). On a separate Python REPL, we can test this function: sen_l1, sen_l2 = retrieve_corpora() print("# A sentence in the two languages DE & EN") print("DE:", sen_l1[0]) print("EN:", sen_l2[0]) print("# Corpora length (i.e. number of sentences)") print(len(sen_l1)) assert len(sen_l1) == len(sen_l2) The preceding code creates the following output: Retrieving corpora: alignment-de-en.txt # A sentence in the two languages DE & EN DE: ['Wiederaufnahme', 'der', 'Sitzungsperiode'] EN: ['Resumption', 'of', 'the', 'session'] # Corpora length (i.e.
number of sentences) 33334 We also printed the number of sentences in each corpora (about 33,000) and asserted that the number of sentences in the source and the destination languages is the same. In the following step, we want to clean up the tokens. Specifically, we want to tokenize punctuation and lowercase the tokens. To do so, we can create a new function in corpora_tools.py. We will use the regex module to perform the further splitting tokenization: def clean_sentence(sentence): regex_splitter = re.compile("([!?.,:;$\"')( ])") clean_words = [re.split(regex_splitter, word.lower()) for word in sentence] return [w for words in clean_words for w in words if words if w] Again, in the REPL, let's test the function: clean_sen_l1 = [clean_sentence(s) for s in sen_l1] clean_sen_l2 = [clean_sentence(s) for s in sen_l2] print("# Same sentence as before, but chunked and cleaned") print("DE:", clean_sen_l1[0]) print("EN:", clean_sen_l2[0]) The preceding code outputs the same sentence as before, but chunked and cleaned: DE: ['wiederaufnahme', 'der', 'sitzungsperiode'] EN: ['resumption', 'of', 'the', 'session'] Nice! The next step for this project is filtering the sentences that are too long to be processed. Since our goal is to perform the processing on a local machine, we should limit ourselves to sentences up to N tokens. In this case, we set N=20, in order to be able to train the learner within 24 hours. If you have a powerful machine, feel free to increase that limit. To make the function generic enough, there's also a lower bound with a default value set to 0, such as an empty token set. The logic of the function is very easy: if the number of tokens for a sentence or its translation is greater than N, then the sentence (in both languages) is removed: def filter_sentence_length(sentences_l1, sentences_l2, min_len=0, max_len=20): filtered_sentences_l1 = [] filtered_sentences_l2 = [] for i in range(len(sentences_l1)): if min_len <= len(sentences_l1[i]) <= max_len and min_len <= len(sentences_l2[i]) <= max_len: filtered_sentences_l1.append(sentences_l1[i]) filtered_sentences_l2.append(sentences_l2[i]) return filtered_sentences_l1, filtered_sentences_l2 Again, let's see in the REPL how many sentences survived this filter. Remember, we started with more than 33,000: filt_clean_sen_l1, filt_clean_sen_l2 = filter_sentence_length(clean_sen_l1, clean_sen_l2) print("# Filtered Corpora length (i.e. number of sentences)") print(len(filt_clean_sen_l1)) assert len(filt_clean_sen_l1) == len(filt_clean_sen_l2) The preceding code prints the following output: # Filtered Corpora length (i.e. number of sentences) 14788 Almost 15,000 sentences survived, that is, half of the corpora. Now, we finally move from text to numbers (which AI mainly uses). To do so, we shall create a dictionary of the words for each language. The dictionary should be big enough to contain most of the words, though we can discard some if the language has words with low occurrence. This is a common practice even in tf-idf (term frequency within a document, multiplied by the inverse of the document frequency, i.e. in how many documents that token appears), where very rare words are discarded to speed up the computation, and make the solution more scalable and generic.
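Before settling on a dictionary size, it can help to peek at the token frequency distribution. Here is a small, optional check you could run in the REPL (not part of the book's code), assuming the filt_clean_sen_l1 list from the previous step:

from collections import Counter

token_counts = Counter(tok for sent in filt_clean_sen_l1 for tok in sent)
print("Unique German tokens:", len(token_counts))
print("Most common:", token_counts.most_common(5))
print("Tokens appearing only once:", sum(1 for c in token_counts.values() if c == 1))

If a large share of tokens appears only once, a smaller dictionary (with the rest mapped to the unknown symbol introduced next) is usually a reasonable trade-off.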
We need here four special symbols in both dictionaries: One symbol for padding (we'll see later why we need it) One symbol for dividing the two sentences One symbol to indicate where the sentence stops One symbol to indicate unknown words (like the very rare ones) For doing so, let's create a new file named data_utils.py containing the following lines of code: _PAD = "_PAD" _GO = "_GO" _EOS = "_EOS" _UNK = "_UNK" _START_VOCAB = [_PAD, _GO, _EOS, _UNK] PAD_ID = 0 GO_ID = 1 EOS_ID = 2 UNK_ID = 3 OP_DICT_IDS = [PAD_ID, GO_ID, EOS_ID, UNK_ID] Then, back to the corpora_tools.py file, let's add the following function: import data_utils def create_indexed_dictionary(sentences, dict_size=10000, storage_path=None): count_words = Counter() dict_words = {} opt_dict_size = len(data_utils.OP_DICT_IDS) for sen in sentences: for word in sen: count_words[word] += 1 dict_words[data_utils._PAD] = data_utils.PAD_ID dict_words[data_utils._GO] = data_utils.GO_ID dict_words[data_utils._EOS] = data_utils.EOS_ID dict_words[data_utils._UNK] = data_utils.UNK_ID for idx, item in enumerate(count_words.most_common(dict_size)): dict_words[item[0]] = idx + opt_dict_size if storage_path: pickle.dump(dict_words, open(storage_path, "wb")) return dict_words This function takes as arguments the sentences, the number of entries in the dictionary, and the path of where to store the dictionary. Remember, the dictionary is created while training the algorithms: during the testing phase it's loaded, and the association token/symbol should be the same one as used in the training. If the number of unique tokens is greater than the value set, only the most popular ones are selected. At the end, the dictionary contains the association between a token and its ID for each language. After building the dictionary, we should look up the tokens and substitute them with their token ID. For that, we need another function: def sentences_to_indexes(sentences, indexed_dictionary): indexed_sentences = [] not_found_counter = 0 for sent in sentences: idx_sent = [] for word in sent: try: idx_sent.append(indexed_dictionary[word]) except KeyError: idx_sent.append(data_utils.UNK_ID) not_found_counter += 1 indexed_sentences.append(idx_sent) print('[sentences_to_indexes] Did not find {} words'.format(not_found_counter)) return indexed_sentences This step is very simple; the token is substituted with its ID. If the token is not in the dictionary, the ID of the unknown token is used. Let's see in the REPL how our sentences look after these steps: dict_l1 = create_indexed_dictionary(filt_clean_sen_l1, dict_size=15000, storage_path="/tmp/l1_dict.p") dict_l2 = create_indexed_dictionary(filt_clean_sen_l2, dict_size=10000, storage_path="/tmp/l2_dict.p") idx_sentences_l1 = sentences_to_indexes(filt_clean_sen_l1, dict_l1) idx_sentences_l2 = sentences_to_indexes(filt_clean_sen_l2, dict_l2) print("# Same sentences as before, with their dictionary ID") print("DE:", list(zip(filt_clean_sen_l1[0], idx_sentences_l1[0]))) print("EN:", list(zip(filt_clean_sen_l2[0], idx_sentences_l2[0]))) This code prints the token and its ID for both the sentences. What's used in the RNN will be just the second element of each tuple, that is, the integer ID: # Same sentences as before, with their dictionary ID DE: [('wiederaufnahme', 1616), ('der', 7), ('sitzungsperiode', 618)] EN: [('resumption', 1779), ('of', 8), ('the', 5), ('session', 549)] Please also note how frequent tokens, such as the and of in English, and der in German, have a low ID. That's because the IDs are sorted by popularity (see the body of the function create_indexed_dictionary).
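One practical note: at decoding time you will also want to map IDs back to tokens. The book's pipeline stores only the token-to-ID dictionary, so here is a small hypothetical helper (not part of the original code) that builds the inverse mapping from the pickled file saved above:

import pickle

# Invert the token -> ID mapping saved by create_indexed_dictionary
dict_l2 = pickle.load(open("/tmp/l2_dict.p", "rb"))
inv_dict_l2 = {idx: token for token, idx in dict_l2.items()}

def ids_to_tokens(id_sequence, inverse_dictionary):
    # Unknown IDs fall back to the _UNK symbol
    return [inverse_dictionary.get(i, "_UNK") for i in id_sequence]

print(ids_to_tokens([1779, 8, 5, 549], inv_dict_l2))  # ['resumption', 'of', 'the', 'session']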
Even though we did the filtering to limit the maximum size of the sentences, we should create a function to extract the maximum size. For the lucky owners of very powerful machines, which didn't do any filtering, that's the moment to see how long the longest sentence in the RNN will be. That's simply the function: def extract_max_length(corpora): return max([len(sentence) for sentence in corpora]) Let's apply the following to our sentences: max_length_l1 = extract_max_length(idx_sentences_l1) max_length_l2 = extract_max_length(idx_sentences_l2) print("# Max sentence sizes:") print("DE:", max_length_l1) print("EN:", max_length_l2) As expected, the output is: # Max sentence sizes: DE: 20 EN: 20 The final preprocessing step is padding. We need all the sequences to be the same length, therefore we should pad the shorter ones. Also, we need to insert the correct tokens to instruct the RNN where the string begins and ends. Basically, this step should: Pad the input sequences, for all being 20 symbols long Pad the output sequence, to be 20 symbols long Insert an _GO at the beginning of the output sequence and an _EOS at the end to position the start and the end of the translation This is done by this function (insert it in the corpora_tools.py): def prepare_sentences(sentences_l1, sentences_l2, len_l1, len_l2): assert len(sentences_l1) == len(sentences_l2) data_set = [] for i in range(len(sentences_l1)): padding_l1 = len_l1 - len(sentences_l1[i]) pad_sentence_l1 = ([data_utils.PAD_ID]*padding_l1) + sentences_l1[i] padding_l2 = len_l2 - len(sentences_l2[i]) pad_sentence_l2 = [data_utils.GO_ID] + sentences_l2[i] + [data_utils.EOS_ID] + ([data_utils.PAD_ID] * padding_l2) data_set.append([pad_sentence_l1, pad_sentence_l2]) return data_set To test it, let's prepare the dataset and print the first sentence: data_set = prepare_sentences(idx_sentences_l1, idx_sentences_l2, max_length_l1, max_length_l2) print("# Prepared minibatch with paddings and extra stuff") print("DE:", data_set[0][0]) print("EN:", data_set[0][1]) print("# The sentence pass from X to Y tokens") print("DE:", len(idx_sentences_l1[0]), "->", len(data_set[0][0])) print("EN:", len(idx_sentences_l2[0]), "->", len(data_set[0][1])) The preceding code outputs the following: # Prepared minibatch with paddings and extra stuff DE: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1616, 7, 618] EN: [1, 1779, 8, 5, 549, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] # The sentence pass from X to Y tokens DE: 3 -> 20 EN: 4 -> 22 As you can see, the input and the output are padded with zeros to have a constant length (in the dictionary, they correspond to _PAD, see data_utils.py), and the output contains the markers 1 and 2 just before the start and the end of the sentence. As proven effective in the literature, we're going to pad the input sentences at the start and the output sentences at the end. After this operation, all the input sentences are 20 items long, and the output sentences 22. Training the machine translator So far, we've seen the steps to preprocess the corpora, but not the model used. The model is actually already available on the TensorFlow Models repository, freely downloadable from https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/seq2seq_model.py. The piece of code is licensed with Apache 2.0. We really thank the authors for having open sourced such a great model. Copyright 2015 The TensorFlow Authors. All Rights Reserved. 
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. We will see the usage of the model throughout this section. First, let's create a new file named train_translator.py and put in some imports and some constants. We will save the dictionary in the /tmp/ directory, as well as the model and its checkpoints: import time import math import sys import pickle import glob import os import tensorflow as tf from seq2seq_model import Seq2SeqModel from corpora_tools import * path_l1_dict = "/tmp/l1_dict.p" path_l2_dict = "/tmp/l2_dict.p" model_dir = "/tmp/translate" model_checkpoints = model_dir + "/translate.ckpt" Now, let's use all the tools created in the previous section within a function that, given a Boolean flag, returns the corpora. More specifically, if the argument is False, it builds the dictionary from scratch (and saves it); otherwise, it uses the dictionary available in the path: def build_dataset(use_stored_dictionary=False): sen_l1, sen_l2 = retrieve_corpora() clean_sen_l1 = [clean_sentence(s) for s in sen_l1] clean_sen_l2 = [clean_sentence(s) for s in sen_l2] filt_clean_sen_l1, filt_clean_sen_l2 = filter_sentence_length(clean_sen_l1, clean_sen_l2) if not use_stored_dictionary: dict_l1 = create_indexed_dictionary(filt_clean_sen_l1, dict_size=15000, storage_path=path_l1_dict) dict_l2 = create_indexed_dictionary(filt_clean_sen_l2, dict_size=10000, storage_path=path_l2_dict) else: dict_l1 = pickle.load(open(path_l1_dict, "rb")) dict_l2 = pickle.load(open(path_l2_dict, "rb")) dict_l1_length = len(dict_l1) dict_l2_length = len(dict_l2) idx_sentences_l1 = sentences_to_indexes(filt_clean_sen_l1, dict_l1) idx_sentences_l2 = sentences_to_indexes(filt_clean_sen_l2, dict_l2) max_length_l1 = extract_max_length(idx_sentences_l1) max_length_l2 = extract_max_length(idx_sentences_l2) data_set = prepare_sentences(idx_sentences_l1, idx_sentences_l2, max_length_l1, max_length_l2) return (filt_clean_sen_l1, filt_clean_sen_l2), data_set, (max_length_l1, max_length_l2), (dict_l1_length, dict_l2_length) This function returns the cleaned sentences, the dataset, the maximum length of the sentences, and the lengths of the dictionaries. Also, we need a function to clean up the model directory. Every time we run the training routine, we need to clean it up, as we haven't provided any garbage collection and stale checkpoints from previous runs would otherwise be picked up.
We can do this with a very simple function: def cleanup_checkpoints(model_dir, model_checkpoints): for f in glob.glob(model_checkpoints + "*"): os.remove(f) try: os.mkdir(model_dir) except FileExistsError: pass Finally, let's create the model in a reusable fashion: def get_seq2seq_model(session, forward_only, dict_lengths, max_sentence_lengths, model_dir): model = Seq2SeqModel( source_vocab_size=dict_lengths[0], target_vocab_size=dict_lengths[1], buckets=[max_sentence_lengths], size=256, num_layers=2, max_gradient_norm=5.0, batch_size=64, learning_rate=0.5, learning_rate_decay_factor=0.99, forward_only=forward_only, dtype=tf.float16) ckpt = tf.train.get_checkpoint_state(model_dir) if ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path): print("Reading model parameters from {}".format(ckpt.model_checkpoint_path)) model.saver.restore(session, ckpt.model_checkpoint_path) else: print("Created model with fresh parameters.") session.run(tf.global_variables_initializer()) return model This function calls the constructor of the model, passing the following parameters: The source vocabulary size (German, in our example) The target vocabulary size (English, in our example) The buckets (in our example there is just one, since we padded all the sequences to a single size) The long short-term memory (LSTM) internal units size The number of stacked LSTM layers The maximum norm of the gradient (for gradient clipping) The mini-batch size (that is, how many observations for each training step) The learning rate The learning rate decay factor The direction of the model (the forward_only flag) The type of data (in our example, we will use float16, that is, a float using 2 bytes) To make the training faster and obtain a model with good performance, we have already set the values in the code; feel free to change them and see how it performs. The final if/else in the function retrieves the model from its checkpoint, if the model already exists. In fact, this function will be used in the decoder too, to retrieve the trained model and run it on the test set. Finally, we have reached the function to train the machine translator. Here it is: def train(): with tf.Session() as sess: model = get_seq2seq_model(sess, False, dict_lengths, max_sentence_lengths, model_dir) # This is the training loop. step_time, loss = 0.0, 0.0 current_step = 0 bucket = 0 steps_per_checkpoint = 100 max_steps = 20000 while current_step < max_steps: start_time = time.time() encoder_inputs, decoder_inputs, target_weights = model.get_batch([data_set], bucket) _, step_loss, _ = model.step(sess, encoder_inputs, decoder_inputs, target_weights, bucket, False) step_time += (time.time() - start_time) / steps_per_checkpoint loss += step_loss / steps_per_checkpoint current_step += 1 if current_step % steps_per_checkpoint == 0: perplexity = math.exp(float(loss)) if loss < 300 else float("inf") print ("global step {} learning rate {} step-time {} perplexity {}".format( model.global_step.eval(), model.learning_rate.eval(), step_time, perplexity)) sess.run(model.learning_rate_decay_op) model.saver.save(sess, model_checkpoints, global_step=model.global_step) step_time, loss = 0.0, 0.0 encoder_inputs, decoder_inputs, target_weights = model.get_batch([data_set], bucket) _, eval_loss, _ = model.step(sess, encoder_inputs, decoder_inputs, target_weights, bucket, True) eval_ppx = math.exp(float(eval_loss)) if eval_loss < 300 else float("inf") print(" eval: perplexity {}".format(eval_ppx)) sys.stdout.flush() The function starts by creating the model.
Also, it sets some constants on the steps per checkpoints and the maximum number of steps. Specifically, in the code, we will save a model every 100 steps and we will perform no more than 20,000 steps. If it still takes too long, feel free to kill the program: every checkpoint contains a trained model, and the decoder will use the most updated one. At this point, we enter the while loop. For each step, we ask the model to get a minibatch of data (of size 64, as set previously). The method get_batch returns the inputs (that is, the source sequence), the outputs (that is, the destination sequence), and the weights of the model. With the method step, we run one step of the training. One piece of information returned is the loss for the current minibatch of data. That's all the training! To report the performance and store the model every 100 steps, we print the average perplexity of the model (the lower, the better) on the 100 previous steps, and we save the checkpoint. The perplexity is a metric connected to the uncertainty of the predictions: the more confident we're about the tokens, the lower will be the perplexity of the output sentence. Also, we reset the counters and we extract the same metric from a single minibatch of the test set (in this case, it's a random minibatch of the dataset), and performances of it are printed too. Then, the training process restarts again. As an improvement, every 100 steps we also reduce the learning rate by a factor. In this case, we multiply it by 0.99. This helps the convergence and the stability of the training. We now have to connect all the functions together. In order to create a script that can be called by the command line but is also used by other scripts to import functions, we can create a main, as follows: if __name__ == "__main__": _, data_set, max_sentence_lengths, dict_lengths = build_dataset(False) cleanup_checkpoints(model_dir, model_checkpoints) train() In the console, you can now train your machine translator system with a very simple command: $> python train_translator.py On an average laptop, without an NVIDIA GPU, it takes more than a day to reach a perplexity below 10 (12+ hours). This is the output: Retrieving corpora: alignment-de-en.txt [sentences_to_indexes] Did not find 1097 words [sentences_to_indexes] Did not find 0 words Created model with fresh parameters. global step 100 learning rate 0.5 step-time 4.3573073434829713 perplexity 526.6638556683066 eval: perplexity 159.2240770935855 [...] global step 10500 learning rate 0.180419921875 step-time 4.35106209993362414 perplexity 2.0458043055629487 eval: perplexity 1.8646006006241982 [...] In this article, we've seen how to create a machine translation system based on an RNN. We've seen how to organize the corpus, and how to train it. To know more about how to test and translate the model, do checkout this book TensorFlow Deep Learning Projects. Google’s translation tool is now offline – and more powerful than ever thanks to AI Anatomy of an automated machine learning algorithm (AutoML) FAE (Fast Adaptation Engine): iOlite’s tool to write Smart Contracts using machine translation
Read more
  • 0
  • 1
  • 20135
article-image-google-ai-releases-cirq-and-open-fermion-cirq-to-boost-quantum-computation
Savia Lobo
19 Jul 2018
3 min read
Save for later

Google AI releases Cirq and Open Fermion-Cirq to boost Quantum computation

Savia Lobo
19 Jul 2018
3 min read
Google AI Quantum team announced two releases at the First International Workshop on Quantum Software and Quantum Machine Learning (QSML) yesterday. The first is the public alpha release of Cirq, an open source framework for NISQ computers. The second release is OpenFermion-Cirq, an example of a Cirq-based application enabling near-term algorithms. Noisy Intermediate Scale Quantum (NISQ) computers are devices with roughly 50-100 qubits and high-fidelity quantum gates, and quantum algorithms have to be written for them in a way that lets researchers understand the power these machines hold. However, quantum algorithms for these computers have their limitations, such as a poor mapping between the algorithms and the machines, and the complex geometric constraints of some quantum processors. These and other nuances inevitably lead to wasted resources and faulty computations. Cirq comes as a great help for researchers here. It is focused on near-term questions and helps researchers understand whether NISQ quantum computers are capable of solving computational problems of practical importance. It is licensed under Apache 2 and is free to be either embedded or modified within any commercial or open source package. Cirq highlights With Cirq, researchers can write quantum algorithms for specific quantum processors. It provides fine-tuned user control over quantum circuits by specifying gate behavior using native gates, placing these gates appropriately on the device, and scheduling the timing of these gates within the constraints of the quantum hardware. Other features of Cirq include: Allows users to leverage the most out of NISQ architectures with optimized data structures to write and compile the quantum circuits. Supports running the algorithms locally on a simulator. Designed to easily integrate with future quantum hardware or larger simulators via the cloud. OpenFermion-Cirq highlights Google AI Quantum team also released OpenFermion-Cirq, which is an example of a Cirq-based application that enables near-term algorithms. OpenFermion is a platform for developing quantum algorithms for chemistry problems. OpenFermion-Cirq extends the functionality of OpenFermion by providing routines and tools for using Cirq to compile and compose circuits for quantum simulation algorithms. For instance, OpenFermion-Cirq can be used to easily build quantum variational algorithms for simulating properties of molecules and complex materials. While building Cirq, the Google AI Quantum team worked with early testers to gain feedback and insight into algorithm design for NISQ computers. Following are some instances of Cirq work resulting from the early adopters: Zapata Computing: simulation of a quantum autoencoder (example code, video tutorial) QC Ware: QAOA implementation and integration into QC Ware's AQUA platform (example code, video tutorial) Quantum Benchmark: integration of True-Q software tools for assessing and extending hardware capabilities (video tutorial) Heisenberg Quantum Simulations: simulating the Anderson Model Cambridge Quantum Computing: integration of proprietary quantum compiler t|ket> (video tutorial) NASA: architecture-aware compiler based on temporal-planning for QAOA (slides) and simulator of quantum computers (slides) The team also announced that it is using Cirq to create circuits that run on Google's Bristlecone processor. Their future plans include making the Bristlecone processor available in the cloud, with Cirq as the interface for users to write programs for this processor.
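The announcement doesn't include code, but to give a flavour of what writing a circuit in Cirq looks like, here is a minimal, illustrative example; note that the exact API may differ between the alpha release described here and later versions of the library:

import cirq

# Two qubits laid out on a grid, mirroring how Google's hardware is addressed
q0, q1 = cirq.GridQubit(0, 0), cirq.GridQubit(0, 1)

# Build a small circuit: Hadamard, CNOT, then measure both qubits
circuit = cirq.Circuit([
    cirq.H(q0),
    cirq.CNOT(q0, q1),
    cirq.measure(q0, q1, key='result'),
])
print(circuit)

# Run it locally on the built-in simulator
result = cirq.Simulator().run(circuit, repetitions=20)
print(result.histogram(key='result'))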
To know more about both releases, check out the GitHub repositories of Cirq and OpenFermion-Cirq. Q# 101: Getting to know the basics of Microsoft’s new quantum computing language Google Bristlecone: A New Quantum processor by Google’s Quantum AI lab Quantum A.I. : An intelligent mix of Quantum+A.I.
Read more
  • 0
  • 0
  • 26786

article-image-setting-up-an-ethereum-development-environment-tutorial
Packt Editorial Staff
18 Jul 2018
8 min read
Save for later

How to set up an Ethereum development environment [Tutorial]

Packt Editorial Staff
18 Jul 2018
8 min read
There are various ways to develop on the Ethereum blockchain. We will look at the mainstream options in this article, which are: test networks, and how to set up an Ethereum private net. This tutorial is extracted from the book Mastering Blockchain - Second Edition written by Imran Bashir. There are multiple ways to develop smart contracts on Ethereum. The usual and sensible approach is to develop and test Ethereum smart contracts either on a local private net or a simulated environment, and then deploy them on a public testnet. After all the relevant tests are successful on the public testnet, the contracts can then be deployed to the public mainnet. There are, however, variations in this process, and many developers opt to develop and test contracts only on locally simulated environments, and then deploy them on the public mainnet or their private production blockchain networks. Developing on a simulated environment and then deploying directly to a public network can lead to a faster time to production, as setting up private networks may take longer compared to setting up a local development environment with a blockchain simulator. Let's start with connecting to a test network. Ethereum connection on test networks The Ethereum Go client, Geth (https://geth.ethereum.org), can be connected to the test network using the following command: $ geth --testnet A sample output is shown in the following screenshot. The screenshot shows the type of the network chosen and various other pieces of information regarding the blockchain download: The output of the geth command connecting to Ethereum test net A blockchain explorer for the testnet, located at https://ropsten.etherscan.io, can be used to trace transactions and blocks on the Ethereum test network. There are other test networks available too, such as Frontier, Morden, Ropsten, and Rinkeby. Geth can be issued with a command-line flag to connect to the desired network: --testnet: Ropsten network: pre-configured proof-of-work test network --rinkeby: Rinkeby network: pre-configured proof-of-authority test network --networkid value: Network identifier (integer, 1=Frontier, 2=Morden (disused), 3=Ropsten, 4=Rinkeby) (default: 1) Now let us do some experiments with building a private network and then we will see how a contract can be deployed on this network using the Mist and command-line tools. Setting up a private net A private net allows the creation of an entirely new blockchain. This is different from the testnet or mainnet in the sense that it uses its own genesis block and network ID. In order to create a private net, three components are needed: a network ID, the genesis file, and a data directory to store blockchain data. Even though the data directory is not strictly required to be mentioned, if there is more than one blockchain already active on the system, then the data directory should be specified so that a separate directory is used for the new blockchain. On the mainnet, the Geth Ethereum client is capable of discovering boot nodes by default as they are hardcoded in the Geth client, and connects automatically. But on a private net, Geth needs to be configured by specifying appropriate flags and configuration in order for it to be discoverable by other peers or to be able to discover other peers. We will see how this is achieved shortly. In addition to the previously mentioned three components, it is desirable that you disable node discovery so that other nodes on the internet cannot discover your private network, keeping it secure.
If other networks happen to have the same genesis file and network ID, they may connect to your private net. The chance of having the same network ID and genesis block is very low, but, nevertheless, disabling node discovery is good practice, and is recommended. In the following section, all these parameters are discussed in detail with a practical example. Network ID Network ID can be any positive number except 1 and 3, which are already in use by Ethereum mainnet and testnet (Ropsten), respectively. Network ID 786 has been chosen for the example private network discussed later in this section. The genesis file The genesis file contains the necessary fields required for a custom genesis block. This is the first block in the network and does not point to any previous block. The Ethereum protocol performs checking in order to ensure that no other node on the internet can participate in the consensus mechanism unless they have the same genesis block. Chain ID is usually used as an identification of the network. A custom genesis file that will be used later in the example is shown here: { "nonce": "0x0000000000000042", "timestamp": "0x00", "parentHash": "0x0000000000000000000000000000000000000000000000000000000000000000", "extraData": "0x00", "gasLimit": "0x8000000", "difficulty": "0x0400", "mixhash": "0x0000000000000000000000000000000000000000000000000000000000000000", "coinbase": "0x3333333333333333333333333333333333333333", "alloc": { }, "config": { "chainId": 786, "homesteadBlock": 0, "eip155Block": 0, "eip158Block": 0 } } This file is saved as a text file with the JSON extension; for example, privategenesis.json. Optionally, Ether can be pre-allocated by specifying the beneficiary's addresses and the amount of Wei, but it is usually not necessary as, being on a private network, Ether can be mined very quickly. In order to pre-allocate Ether, a section can be added to the genesis file, as shown here: "alloc": { "0xcf61d213faa9acadbf0d110e1397caf20445c58f": { "balance": "100000" } } Now let's see what each of these parameters means. nonce: This is a 64-bit hash used to prove that PoW has been sufficiently completed. This works in combination with the mixhash parameter. timestamp: This is the Unix timestamp of the block. This is used to verify the sequence of the blocks and for difficulty adjustment. For example, if blocks are being generated too quickly, the difficulty goes higher. parentHash: This is always zero for the genesis (first) block, as there is no parent of the first block. extraData: This parameter allows a 32-bit arbitrary value to be saved with the block. gasLimit: This is the limit on the expenditure of gas per block. difficulty: This parameter is used to determine the mining target. It represents the difficulty level of the hash required to prove the PoW. mixhash: This is a 256-bit hash which works in combination with nonce to prove that a sufficient amount of computational resources has been spent in order to complete the PoW requirements. coinbase: This is the 160-bit address where the mining reward is sent as a result of successful mining. alloc: This parameter contains the list of pre-allocated wallets. The long hex string is the account to which the balance is allocated. config: This section contains various configuration information defining the chain ID, and blockchain hard fork block numbers. This parameter is not required to be used in private networks. Data directory This is the directory where the blockchain data for the private Ethereum network will be saved.
In the following example, it is ~/etherprivate/. In the Geth client, a number of parameters are specified in order to launch and further fine-tune the configuration of the private network. These flags are listed here. Flags and their meaning The following are the flags used with the Geth client: --nodiscover: This flag ensures that the node is not automatically discoverable if it happens to have the same genesis file and network ID. --maxpeers: This flag is used to specify the number of peers allowed to be connected to the private net. If it is set to 0, then no one will be able to connect, which might be desirable in a few scenarios, such as private testing. --rpc: This is used to enable the RPC interface in Geth. --rpcapi: This flag takes a list of APIs to be allowed as a parameter. For example, eth,web3 will enable the Eth and Web3 interface over RPC. --rpcport: This sets up the TCP RPC port; for example: 9999. --rpccorsdomain: This flag specifies the URL that is allowed to connect to the private Geth node and perform RPC operations. cors in --rpccorsdomain means cross-origin resource sharing. --port: This specifies the TCP port that will be used to listen to the incoming connections from other peers. --identity: This flag is a string that specifies the name of a private node. Static nodes If there is a need to connect to a specific set of peers, then these nodes can be added to a file where the chaindata and keystore files are saved. For example, in the ~/etherprivate/ directory. The filename should be static-nodes.json. This is valuable in a private network because this way the nodes can be discovered. An example of the JSON file is shown as follows: [ "enode://44352ede5b9e792e437c1c0431c1578ce3676a87e1f588434aff1299d30325c233c8d426fc57a25380481c8a36fb3be2787375e932fb4885885f6452f6efa77f@xxx.xxx.xxx.xxx:TCP_PORT" ] Here, xxx is the public IP address and TCP_PORT can be any valid and available TCP port on the system. The long hex string is the node ID. To summarize, we explored Ethereum test networks and how to set up private Ethereum networks. Learn about cryptography and cryptocurrencies from this book Mastering Blockchain - Second Edition, to build highly secure, decentralized applications and conduct trusted in-app transactions. Everything you need to know about Ethereum Will Ethereum eclipse Bitcoin? The trouble with Smart Contracts
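As a practical recap of the tutorial above, the pieces might be combined roughly as follows. This is an illustrative sketch using the flags described in this article: the data directory, network ID, and genesis filename match the example values used here, while the port numbers and the identity string are placeholders you would adapt to your setup.

$ geth --datadir ~/etherprivate init privategenesis.json
$ geth --datadir ~/etherprivate --networkid 786 --nodiscover --maxpeers 0 --rpc --rpcapi eth,web3 --rpcport 9999 --rpccorsdomain "*" --port 30303 --identity "PrivateNode1"

The first command writes the custom genesis block into the data directory; the second launches the private node with discovery disabled and the RPC interface enabled, as discussed above.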
Read more
  • 0
  • 0
  • 19738

article-image-5-reasons-government-should-regulate-technology
Richard Gall
17 Jul 2018
6 min read
Save for later

5 reasons government should regulate technology

Richard Gall
17 Jul 2018
6 min read
Microsoft's Brad Smith took the unprecedented move last week of calling for government to regulate facial recognition technology. In an industry that has resisted government intervention, it was a bold yet humble step. It was a way of saying "we can't deal with this on our own." There will certainly be people who disagree with Brad Smith. For some, the entrepreneurial spirit that is central to tech and startup culture will only be stifled by regulation. But let's be realistic about where we are at the moment - the technology industry has never faced such a crisis of confidence, nor met with such substantial public cynicism. Perhaps government regulation is precisely what we need to move forward. Here are 5 reasons why government should regulate technology. Regulation can restore accountability and rebuild trust in tech We've said it a lot in 2018, but there really is a significant trust deficit in technology at the moment. From the Cambridge Analytica scandal to AI bias, software has been making headlines in a way it never has before. This only cultivates a culture of cynicism across the public. And with talk of automation and job losses, it paints a dark picture of the future. It's no wonder that TV series like Black Mirror have such a hold over the public imagination. Of course, when used properly, technology should simply help solve problems - whether that's better consumer tech or improved diagnoses in healthcare. The problem arises when we find that our problem-solving innovations have unintended consequences. By regulating, government can begin to think through some of these unintended consequences. But more importantly, trust can only be rebuilt once there is some degree of accountability within the industry. Think back to Zuckerberg's Congressional hearing earlier this year - while the Facebook chief may have been sweating, the real takeaway was that his power and influence were ultimately untouchable. Whatever mistakes he's made were just part and parcel of moving fast and breaking things. An apology and a humble shrug might normally pass, but with regulation, things begin to get serious. Misusing user data? We've got a law for that. Potentially earning money from people who want to undermine western democracy? We've got a law for that. Read next: Is Facebook planning to spy on you through your mobile's microphones? Government regulation will make the conversation around the uses and abuses of technology more public Too much conversation about how and why we build technology is happening in the wrong places. Well, not the wrong places, just not enough places. The biggest decisions about technology are largely made by some of the biggest companies on the planet. All the dreams about a new democratized and open world are all but gone, as the innovations around which we build our lives come from a handful of organizations that have both financial and cultural clout. As Brad Smith argues, tech companies like Microsoft, Google, and Amazon are not the place to be having conversations about the ethical implications of certain technologies. He argues that while it's important for private companies to take more responsibility, it's an "inadequate substitute for decision making by the public and its representatives in a democratic republic." He notes that the commercial dynamics are always going to twist conversations. Companies, after all, are answerable to shareholders - only governments are accountable to the public.
By regulating, the decisions we make (or don't make) about technology immediately enter into public discourse about the kind of societies we want to live in. Citizens can be better protected by tech regulation... At present, technology often advances in spite of, not because of, people. For all the talk of human-centered design, putting the customer first, every company that builds software is interested in one thing: making money. AI in particular can be dangerous for citizens For example, according to a ProPublica investigation, AI has been used to predict future crimes in the justice system. That's frightening in itself, of course, but it's particularly terrifying when you consider that criminality was falsely predicted at twice the rate for black people as for white people. Even social media filters, in which machine learning serves content based on a user's behavior and profile, present dangers to citizens. They give rise to fake news and dubious political campaigning, making citizens more vulnerable to extreme - and false - ideas. By properly regulating this technology we should immediately have more transparency over how these systems work. This transparency would not only lead to more accountability in how they are built, it also ensures that changes can be made when necessary. Read next: A quick look at E.U.'s pending antitrust case against Google's Android ...Software engineers need protection too One group hasn't really been talked about when it comes to government regulation - the people actually building the software. This is a big problem. If we're talking about the ethics of AI, software engineers building software are left in a vulnerable position. This is because the lines of accountability are blurred. Without a government framework that supports ethical software decision making, engineers are left in limbo. With more support for software engineers from government, they can be more confident in challenging decisions from their employers. We need to have a debate about who's responsible for the ethics of code that's written into applications today - is it the engineer? The product manager? Or the organization itself? That isn't going to be easy to answer, but some government regulation or guidance would be a good place to begin. Regulation can bridge the gap between entrepreneurs, engineers and lawmakers Times change. Years ago, technology was deployed by lawmakers as a means of control, production or exploration. That's why the military was involved with many of the innovations of the mid-twentieth century. Today, the gap couldn't be bigger. Lawmakers barely understand encryption, let alone how algorithms work. But there is naivety in the business world too. With a little more political nous and even critical thinking, perhaps Mark Zuckerberg could have predicted the Cambridge Analytica scandal. Maybe Elon Musk would be a little more humble in the face of a coordinated rescue mission. There's clearly a problem - on the one hand, some people don't know what's already possible. For others, it's impossible to consider that something that is possible could have unintended consequences. By regulating technology, everyone will have to get to know one another. Government will need to delve deeper into the field, and entrepreneurs and engineers will need to learn more about how regulation may affect them. To some extent, this will have to be the first thing we do - develop a shared language. It might also be the hardest thing to do.
Read more
  • 0
  • 0
  • 30548

article-image-openai-gym-environments-wrappers-and-monitors-tutorial
Packt Editorial Staff
17 Jul 2018
9 min read
Save for later

Extending OpenAI Gym environments with Wrappers and Monitors [Tutorial]

Packt Editorial Staff
17 Jul 2018
9 min read
In this article we are going to discuss two OpenAI Gym functionalities; Wrappers and Monitors. These functionalities are present in OpenAI to make your life easier and your codes cleaner. It provides you these convenient frameworks to extend the functionality of your existing environment in a modular way and get familiar with an agent's activity. So, let's take a quick overview of these classes. This article is an extract taken from the book, Deep Reinforcement Learning Hands-On, Second Edition written by, Maxim Lapan. What are Wrappers? Very frequently, you will want to extend the environment's functionality in some generic way. For example, an environment gives you some observations, but you want to accumulate them in some buffer and provide to the agent the N last observations, which is a common scenario for dynamic computer games, when one single frame is just not enough to get full information about the game state. Another example is when you want to be able to crop or preprocess an image's pixels to make it more convenient for the agent to digest, or if you want to normalize reward scores somehow. There are many such situations which have the same structure: you'd like to “wrap” the existing environment and add some extra logic doing something. Gym provides you with a convenient framework for these situations, called a Wrapper class. How does a wrapper work? The class structure is shown on the following diagram. The Wrapper class inherits the Env class. Its constructor accepts the only argument: the instance of the Env class to be “wrapped”. To add extra functionality, you need to redefine the methods you want to extend like step() or reset(). The only requirement is to call the original method of the superclass. Figure 1: The hierarchy of Wrapper classes in Gym. To handle more specific requirements, like a Wrapper which wants to process only observations from the environment, or only actions, there are subclasses of Wrapper which allow filtering of only a specific portion of information. They are: ObservationWrapper: You need to redefine its observation(obs) method. Argument obs is an observation from the wrapped environment, and this method should return the observation which will be given to the agent. RewardWrapper: Exposes the method reward(rew), which could modify the reward value given to the agent. ActionWrapper: You need to override the method action(act) which could tweak the action passed to the wrapped environment to the agent. Now let’s implement some wrappers To make it slightly more practical, let's imagine a situation where we want to intervene in the stream of actions sent by the agent and, with a probability of 10%, replace the current action with random one. By issuing the random actions, we make our agent explore the environment and from time to time drift away from the beaten track of its policy. This is an easy thing to do using the ActionWrapper class. import gym from typing import TypeVar import random Action = TypeVar('Action') class RandomActionWrapper(gym.ActionWrapper):     def __init__(self, env, epsilon=0.1):         super(RandomActionWrapper, self).__init__(env)         self.epsilon = epsilon Here we initialize our wrapper by calling a parent's __init__ method and saving epsilon (a probability of a random action). 
def action(self, action):         if random.random() < self.epsilon:             print("Random!")            return self.env.action_space.sample()        return action This is a method that we need to override from a parent's class to tweak the agent's actions. Every time we roll the die, with the probability of epsilon, we sample a random action from the action space and return it instead of the action the agent has sent to us. Please note, by using action_space and wrapper abstractions, we were able to write abstract code which will work with any environment from the Gym. Additionally, we print the message every time we replace the action, just to check that our wrapper is working. In production code, of course, this won't be necessary. if __name__ == "__main__":    env = RandomActionWrapper(gym.make("CartPole-v0")) Now it's time to apply our wrapper. We create a normal CartPole environment and pass it to our wrapper constructor. From here on we use our wrapper as a normal Env instance, instead of the original CartPole. As the Wrapper class inherits the Env class and exposes the same interface, we can nest our wrappers in any combination we want. This is a powerful, elegant and generic solution: obs = env.reset()    total_reward = 0.0    while True:        obs, reward, done, _ = env.step(0)        total_reward += reward        if done:            break    print("Reward got: %.2f" % total_reward) Here is almost the same code, except that every time we issue the same action: 0. Our agent is dull and always does the same thing. By running the code, you should see that the wrapper is indeed working: rl_book_samples/ch02$ python 03_random_actionwrapper.py WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype. Random! Random! Random! Random! Reward got: 12.00 If you want, you can play with the epsilon parameter on the wrapper's creation and check that randomness improves the agent's score on average. We should move on and look at another interesting gem hidden inside Gym: Monitor. What is a Monitor? Another class you should be aware of is Monitor. It is implemented like Wrapper and can write information about your agent's performance in a file with optional video recording of your agent in action. Some time ago, it was possible to upload the result of Monitor class' recording to the https://gym.openai.com website and see your agent's position in comparison to other people's results (see the following screenshot), but, unfortunately, at the end of August 2017, OpenAI decided to shut down this upload functionality and froze all the results. There are several activities to implement an alternative to the original website, but they are not ready yet. I hope this situation will be resolved soon, but at the time of writing it's not possible to check your result against those of others. Just to give you an idea of how the Gym web interface looked, here is the CartPole environment leaderboard: Figure 2: OpenAI Gym web interface with CartPole submissions Every submission in the web interface had details about training dynamics. For example, below is the author's solution for one of Doom's mini-games: Figure 3: Submission dynamics on the DoomDefendLine environment. Despite this, Monitor is still useful, as you can take a look at your agent's life inside the environment. How to add Monitor to your agent So, here is how we add Monitor to our random CartPole agent, which is the only difference (the whole code is in Chapter02/04_cartpole_random_monitor.py).
if __name__ == "__main__":    env = gym.make("CartPole-v0")    env = gym.wrappers.Monitor(env, "recording") The second argument we're passing to Monitor is the name of the directory it will write the results to. This directory shouldn't exist, otherwise your program will fail with an exception (to overcome this, you could either remove the existing directory or pass the force=True argument to Monitor class' constructor). The Monitor class requires the FFmpeg utility to be present on the system, which is used to convert captured observations into an output video file. This utility must be available, otherwise Monitor will raise an exception. The easiest way to install FFmpeg is by using your system's package manager, which is OS distribution-specific. To start this example, one of three extra prerequisites should be met: The code should be run in an X11 session with the OpenGL extension (GLX) The code should be started in an Xvfb virtual display You can use X11 forwarding in an ssh connection The cause of this is video recording, which is done by taking screenshots of the window drawn by the environment. Some of the environments use OpenGL to draw their picture, so the graphical mode with OpenGL needs to be present. This could be a problem for a virtual machine in the cloud, which physically doesn't have a monitor and graphical interface running. To overcome this, there is a special "virtual" graphical display, called Xvfb (X11 virtual framebuffer), which basically starts a virtual graphical display on the server and forces the program to draw inside it. That would be enough to make Monitor happily create the desired videos. To start your program in the Xvfb environment, you need to have it installed on your machine (it usually requires installing the package xvfb) and run the special script xvfb-run: $ xvfb-run -s "-screen 0 640x480x24" python 04_cartpole_random_monitor.py [2017-09-22 12:22:23,446] Making new env: CartPole-v0 [2017-09-22 12:22:23,451] Creating monitor directory recording [2017-09-22 12:22:23,570] Starting new video recorder writing to recording/openaigym.video.0.31179.video000000.mp4 Episode done in 14 steps, total reward 14.00 [2017-09-22 12:22:26,290] Finished writing results. You can upload them to the scoreboard via gym.upload('recording') As you may see from the log above, video has been written successfully, so you can peek inside one of your agent's sections by playing it. Another way to record your agent's actions is using ssh X11 forwarding, which uses ssh ability to tunnel X11 communications between the X11 client (Python code which wants to display some graphical information) and X11 server (software which knows how to display this information and has access to your physical display). In X11 architecture, the client and the server are separated and can work on different machines. To use this approach, you need the following: X11 server running on your local machine. Linux comes with X11 server as a standard component (all desktop environments are using X11). On a Windows machine you can set up third-party X11 implementations like open source VcXsrv (available in https://sourceforge.net/projects/vcxsrv/). The ability to log into your remote machine via ssh, passing the -X command line option: ssh -X servername. This enables X11 tunneling and allows all processes started in this session to use your local display for graphics output. Then you can start a program which uses Monitor class and it will display the agent's actions, capturing the images into a video file.
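Before wrapping up, here is a short example that rounds out the Wrapper subclasses described earlier. It is an illustrative sketch (not from the book) of a RewardWrapper that clips every reward into the [-1, 1] range, a common trick when reward scales vary wildly between environments:

import gym

class ClipRewardWrapper(gym.RewardWrapper):
    # Clip every reward into [-1, 1] before the agent sees it
    def reward(self, rew):
        return max(-1.0, min(1.0, rew))

if __name__ == "__main__":
    env = ClipRewardWrapper(gym.make("CartPole-v0"))
    obs = env.reset()
    total_reward = 0.0
    while True:
        obs, reward, done, _ = env.step(env.action_space.sample())
        total_reward += reward
        if done:
            break
    print("Total clipped reward: %.2f" % total_reward)

Because RewardWrapper calls reward() on every step, the agent only ever sees the clipped values, while the underlying environment remains untouched.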
To summarize, we discussed two extra pieces of functionality in OpenAI Gym: Wrappers and Monitors. To solve complex real-world problems with deep learning, grab this practical guide, Deep Reinforcement Learning Hands-On, Second Edition, today.

How Reinforcement Learning works
How to implement Reinforcement Learning with TensorFlow
Top 5 tools for reinforcement learning

The software behind Silicon Valley’s Emmy-nominated 'Not Hotdog' app

Sugandha Lahoti
16 Jul 2018
4 min read
This is great news for all Silicon Valley fans. The Not Hotdog A.I. app shown in episode 4 of season 4 has been nominated for a Primetime Emmy Award. The Emmys have placed Silicon Valley and the app in the category "Outstanding Creative Achievement In Interactive Media Within a Scripted Program" alongside other popular shows. Other nominations include 13 Reasons Why for "Talk To The Reasons", a website that lets you chat with the characters; Rick and Morty for "Virtual Rick-ality", a virtual reality game; Mr. Robot for "Ecoin", a fictional global digital currency; and Westworld for the "Chaos Takes Control Interactive Experience", an online experience promoting the show's second season.

Within a day of its launch, the 'Not Hotdog' application was trending on the App Store and on Twitter, grabbing the #1 spot on both Hacker News and Product Hunt, and it won a Webby for Best Use of Machine Learning. The app uses state-of-the-art deep learning, with a mix of React Native, TensorFlow, and Keras. It has averaged 99.9% crash-free users with a 4.5+/5 rating on the app stores.

The 'Not Hotdog' app does what the name suggests: it identifies hot dogs, and everything that is not a hot dog. It is available for both Android and iOS devices, and its description reads "What would you say if I told you there is an app on the market that tell you if you have a hotdog or not a hotdog. It is very good and I do not want to work on it any more. You can hire someone else."

How the Not Hotdog app is built

The creator, Tim Anglade, built a sophisticated neural architecture for the Silicon Valley A.I. app that runs directly on your phone and trained it with TensorFlow, Keras, and Nvidia GPUs. Of course, the use case itself is not very useful, but the overall app is a substantial example of deep learning and edge computing in pop culture. The app provides better privacy as images never leave a user's device. Consequently, users get a faster experience and offline availability, as processing doesn't go to the cloud. Avoiding a cloud-based AI approach also means that the company can run the app at zero cost, providing significant savings even under a load of millions of users.

What is amazing about the app is that it was built by a single creator with limited resources (a single laptop and GPU, using hand-curated data). This speaks volumes about how much can be achieved, even with a limited amount of time and resources, by non-technical companies, individual developers, and hobbyists alike.

The initial prototype of the app was built using Google Cloud Platform's Vision API and React Native. React Native is a good choice as it supports many devices. The Google Cloud Vision API, however, was quickly abandoned. Instead, what was brought into the picture was edge computing. It enabled training the neural network directly on a laptop, exporting it, and embedding it directly into the mobile app, making the neural network execution phase run directly inside the user's phone.

How TensorFlow powers the Not Hotdog app

After React Native, the second part of the tech stack was TensorFlow. They used TensorFlow's transfer learning script to retrain the Inception architecture, which helps in dealing with a more specific image problem. Transfer learning helped them get better results much faster, and with less data, compared to training from scratch. Inception turned out to be too big to be retrained. So, at the suggestion of Jeremy P. Howard, they explored and settled on SqueezeNet.
SqueezeNet offered explicit positioning as a solution for embedded deep learning, along with the availability of a pre-trained Keras model on GitHub. The final architecture was largely based on Google's MobileNets paper, which gave their neural architecture Inception-like accuracy on simple problems with only about 4M parameters.

YouTube has a $25 million plan to counter fake news and misinformation
Microsoft's Brad Smith calls for facial recognition technology to be regulated
Too weird for Wall Street: Broadcom's value drops after purchasing CA Technologies
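To make the transfer-learning approach described above concrete, here is a rough, hypothetical sketch of retraining a pre-trained Keras MobileNet for a binary hotdog/not-hotdog classifier. This is not the app's actual training code (the production model was a custom, MobileNets-inspired network); the input size, optimizer, and dataset are illustrative assumptions:

from keras.applications.mobilenet import MobileNet
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# Pre-trained ImageNet weights, without the original classification head
base = MobileNet(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Small binary head: hotdog vs. not hotdog
x = GlobalAveragePooling2D()(base.output)
prediction = Dense(1, activation='sigmoid')(x)
model = Model(inputs=base.input, outputs=prediction)

# Freeze the convolutional base so only the new head is trained at first
for layer in base.layers:
    layer.trainable = False

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# model.fit(...) would then be run on a hand-curated hotdog / not-hotdog image set,
# and the trained graph exported for on-device (edge) inference.

Freezing the pre-trained base and training only a small head is what makes this kind of retraining practical on a single laptop and GPU.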

Tensorflow 1.9 is now generally available

Savia Lobo
11 Jul 2018
3 min read
After the back-to-back release of the TensorFlow 1.9 release candidates rc-0, rc-1, and rc-2, the final version of TensorFlow 1.9 is out and generally available. Key highlights of this version include support for gradient boosted trees estimators, new Keras layers to speed up GRU and LSTM implementations, and the deprecation of tfe.Network. It also includes improved functions for data loading, text processing, and pre-made estimators.

TensorFlow 1.9 major features and improvements

As mentioned for TensorFlow 1.9 rc-2, the Keras-based Get Started page and the programmer's guide have been updated, and tf.keras has been updated to the Keras 2.1.6 API. You should try the newly added tf.keras.layers.CuDNNGRU and tf.keras.layers.CuDNNLSTM layers, which provide faster GRU and LSTM implementations. Both layers are backed by cuDNN (the NVIDIA CUDA Deep Neural Network library).

Gradient boosted trees estimators, a non-parametric statistical learning technique for classification and regression, are now supported by core feature columns and losses. Also, the Python interface for the TFLite Optimizing Converter has been expanded, and the command-line interface (also known as toco or tflite_convert) is once again included in the standard pip installation. The distributions.Bijector API in TF 1.9 also supports broadcasting for Bijectors with the new API changes.

TensorFlow 1.9 also includes improved data loading and text processing with tf.decode_compressed, tf.string_strip, and tf.strings.regex_full_match. It adds experimental support for new pre-made estimators such as tf.contrib.estimator.BaselineEstimator, tf.contrib.estimator.RNNClassifier, and tf.contrib.estimator.RNNEstimator.

This version includes two breaking changes. First, to open up an empty variable scope, replace variable_scope('', ...) with variable_scope(tf.get_variable_scope(), ...), which gets the current scope of the variable. Second, the headers used for building custom ops have been moved from site-packages/external to site-packages/tensorflow/include/external.

Some bug fixes and other changes include:

tfe.Network has been deprecated.
Layered variable names have changed in the following conditions: using tf.keras.layers with custom variable scopes, and using tf.layers in a subclassed tf.keras.Model class.
Added the ability to pause recording operations for gradient computation via tf.GradientTape.stop_recording in eager execution, and updated its documentation and introductory notebooks.
Fixed an issue in which the TensorBoard Debugger Plugin could not handle a total source file size exceeding the gRPC message size limit (4 MB).
Added GCS configuration ops and complex128 support to FFT, FFT2D, FFT3D, IFFT, IFFT2D, and IFFT3D.
Conv3D, Conv3DBackpropInput, and Conv3DBackpropFilter now support arbitrary.
Prevented tf.gradients() from backpropagating through integer tensors.
LinearOperator[1D,2D,3D]Circulant added to tensorflow.linalg.

To know more about the other changes, visit the TensorFlow 1.9 release notes on GitHub.

Create a TensorFlow LSTM that writes stories [Tutorial]
Build and train an RNN chatbot using TensorFlow [Tutorial]
Use TensorFlow and NLP to detect duplicate Quora questions [Tutorial]
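As an illustration of the new cuDNN-backed Keras layers mentioned above, here is a minimal, hypothetical model definition. The layer names are real TF 1.9 APIs, but the shapes and the model itself are made-up examples, and these layers only run on a CUDA-enabled GPU:

import tensorflow as tf

# Toy sequence classifier using the cuDNN-accelerated recurrent layers
model = tf.keras.models.Sequential([
    # (timesteps, features) input; CuDNNLSTM is a drop-in, GPU-only
    # replacement for tf.keras.layers.LSTM with default activations
    tf.keras.layers.CuDNNLSTM(128, return_sequences=True, input_shape=(100, 32)),
    tf.keras.layers.CuDNNGRU(64),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()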

Create a TensorFlow LSTM that writes stories [Tutorial]

Packt Editorial Staff
09 Jul 2018
18 min read
LSTMs are heavily employed for tasks such as text generation and image caption generation. Language modeling is very useful for text summarization or for generating captivating textual advertisements for products, while image caption generation or image annotation is very useful for image retrieval, where a user might need to retrieve images representing some concept (for example, a cat). In this tutorial, we will implement an LSTM which will generate new stories after training on a dataset of folk stories. This article is extracted from the book Natural Language Processing with TensorFlow by Thushan Ganegedara.

The application that we will cover in this article is the use of an LSTM to generate new text. For this task, we will download translations of some folk stories by the Brothers Grimm. We will use these stories to train an LSTM and ask it at the end to output a fresh new story. We will process the text by breaking it into character-level bigrams (n-grams, where n=2) and make a vocabulary out of the unique bigrams. The code for this article is available on GitHub.

First, we will discuss the data we will use for text generation and the various preprocessing steps employed to clean it.

About the dataset

We will look at what the dataset looks like so that, when we see the generated text, we can assess whether it makes sense given the training data. We will download the first 100 books from the website https://www.cs.cmu.edu/~spok/grimmtmp/. These are translations of a set of books (from German to English) by the Brothers Grimm. We download them with an automated script, as shown here:

import os
from urllib.request import urlretrieve

url = 'https://www.cs.cmu.edu/~spok/grimmtmp/'

# Create a directory if needed
dir_name = 'stories'
if not os.path.exists(dir_name):
    os.mkdir(dir_name)

def maybe_download(filename):
    """Download a file if not present"""
    print('Downloading file: ', dir_name + os.sep + filename)
    if not os.path.exists(dir_name + os.sep + filename):
        filename, _ = urlretrieve(url + filename, dir_name + os.sep + filename)
    else:
        print('File ', filename, ' already exists.')
    return filename

num_files = 100
filenames = [format(i, '03d') + '.txt' for i in range(1, 101)]

for fn in filenames:
    maybe_download(fn)

We will now show example text snippets extracted from two randomly picked stories. The following is the first snippet:

Then she said, my dearest benjamin, your father has had these coffins made for you and for your eleven brothers, for if I bring a little girl into the world, you are all to be killed and buried in them.  And as she wept while she was saying this, the son comforted her and said, weep not, dear mother, we will save ourselves, and go hence…

The second text snippet is as follows:

Red-cap did not know what a wicked creature he was, and was not at all afraid of him. "Good-day, little red-cap," said he. "Thank you kindly, wolf." "Whither away so early, little red-cap?" "To my grandmother's." "What have you got in your apron?" "Cake and wine.  Yesterday was baking-day, so poor sick grandmother is to have something good, to make her stronger."…

Preprocessing data

In terms of preprocessing, we will initially make all the text lowercase and break it into character n-grams, where n=2. Consider the following sentence:

The king was hunting in the forest.
This would break down into a sequence of bigrams, as follows:

['th', 'e ', 'ki', 'ng', ' w', 'as', …]

We will use character-level bigrams because this greatly reduces the size of the vocabulary compared with using individual words. Moreover, we will replace all the bigrams that appear fewer than 10 times in the corpus with a special token (that is, UNK), representing that the bigram is unknown. This helps us to reduce the size of the vocabulary even further.

Implementing an LSTM

Though there are sublibraries in TensorFlow that have already implemented LSTMs ready to go, we will implement one from scratch. This will be very valuable, as in the real world there might be situations where you cannot use these off-the-shelf components directly. We will discuss the hyperparameters used for the LSTM and their effects. Thereafter, we will discuss the parameters (weights and biases) required to implement the LSTM. We will then discuss how these parameters are used to write the operations taking place within the LSTM. This will be followed by understanding how we sequentially feed data to the LSTM. Next, we will discuss how we can implement the optimization of the parameters using gradient clipping. Finally, we will investigate how we can use the learned model to output predictions, which are essentially bigrams that will eventually add up to a meaningful story.

Defining hyperparameters

We will define some hyperparameters required for the LSTM:

# Number of neurons in the hidden state variables
num_nodes = 128

# Number of data points in a batch we process
batch_size = 64

# Number of time steps we unroll for during optimization
num_unrollings = 50

dropout = 0.2  # We use dropout

The following list describes each of the hyperparameters:

num_nodes: This denotes the number of neurons in the cell memory state. When data is abundant, increasing the complexity of the cell memory will give you a better performance; however, at the same time, it slows down the computations.
batch_size: This is the amount of data processed in a single step. Increasing the size of the batch gives a better performance, but poses higher memory requirements.
num_unrollings: This is the number of time steps used in truncated BPTT. The higher the number of unrolling steps, the better the performance, but it will increase both the memory requirement and the computational time.
dropout: Finally, we will employ dropout (that is, a regularization technique) to reduce overfitting of the model and produce better results; dropout randomly drops information from inputs/outputs/state variables before passing them to their successive operations. This creates redundant features during learning, leading to better performance.

Defining parameters

Now we will define TensorFlow variables for the actual parameters of the LSTM.
First, we will define the input gate parameters:

ix: These are weights connecting the input to the input gate
im: These are weights connecting the hidden state to the input gate
ib: This is the bias

Here we will define the parameters:

# Input gate (it) - How much memory to write to cell state
# Connects the current input to the input gate
ix = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], stddev=0.02))
# Connects the previous hidden state to the input gate
im = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], stddev=0.02))
# Bias of the input gate
ib = tf.Variable(tf.random_uniform([1, num_nodes], -0.02, 0.02))

Similarly, we will define such weights for the forget gate, candidate value (used for memory cell computations), and output gate. The forget gate is defined as follows:

# Forget gate (ft) - How much memory to discard from cell state
# Connects the current input to the forget gate
fx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], stddev=0.02))
# Connects the previous hidden state to the forget gate
fm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], stddev=0.02))
# Bias of the forget gate
fb = tf.Variable(tf.random_uniform([1, num_nodes], -0.02, 0.02))

The candidate value (used to compute the cell state) is defined as follows:

# Candidate value (c~t) - Used to compute the current cell state
# Connects the current input to the candidate
cx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], stddev=0.02))
# Connects the previous hidden state to the candidate
cm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], stddev=0.02))
# Bias of the candidate
cb = tf.Variable(tf.random_uniform([1, num_nodes], -0.02, 0.02))

The output gate is defined as follows:

# Output gate - How much memory to output from the cell state
# Connects the current input to the output gate
ox = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], stddev=0.02))
# Connects the previous hidden state to the output gate
om = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], stddev=0.02))
# Bias of the output gate
ob = tf.Variable(tf.random_uniform([1, num_nodes], -0.02, 0.02))

Next, we will define variables for the state and output. These are the TensorFlow variables representing the internal cell state and the external hidden state of the LSTM cell. When defining the LSTM computational operation, we define these to be updated with the latest cell state and hidden state values we compute, using the tf.control_dependencies(...) function.

# Variables saving state across unrollings.
# Hidden state
saved_output = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False, name='train_hidden')
# Cell state
saved_state = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False, name='train_cell')
# Same variables for validation phase
saved_valid_output = tf.Variable(tf.zeros([1, num_nodes]), trainable=False, name='valid_hidden')
saved_valid_state = tf.Variable(tf.zeros([1, num_nodes]), trainable=False, name='valid_cell')

Finally, we will define a softmax layer to get the actual predictions out:

# Softmax Classifier weights and biases.
w = tf.Variable(tf.truncated_normal([num_nodes, vocabulary_size], stddev=0.02))
b = tf.Variable(tf.random_uniform([vocabulary_size], -0.02, 0.02))

Note that we're using the normal distribution with zero mean and a small standard deviation. This is fine as our model is a simple single LSTM cell.
However, when the network gets deeper (that is, multiple LSTM cells stacked on top of each other), more careful initialization techniques are required. One such technique is known as Xavier initialization, proposed by Glorot and Bengio in their paper Understanding the difficulty of training deep feedforward neural networks, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010. It is available as a variable initializer in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/contrib/layers/xavier_initializer.

Defining an LSTM cell and its operations

With the weights and the bias defined, we can now define the operations within an LSTM cell. These operations include the following:

Calculating the outputs produced by the input and forget gates
Calculating the internal cell state
Calculating the output produced by the output gate
Calculating the external hidden state

The following is the implementation of our LSTM cell:

def lstm_cell(i, o, state):
    input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)
    forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)
    update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb
    state = forget_gate * state + input_gate * tf.tanh(update)
    output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)
    return output_gate * tf.tanh(state), state

Defining inputs and labels

Now we will define the training inputs (unrolled) and labels. The training inputs are a list with num_unrollings batches of data (sequential), where each batch of data is of size [batch_size, vocabulary_size]:

train_inputs, train_labels = [], []
for ui in range(num_unrollings):
    train_inputs.append(tf.placeholder(tf.float32,
                                       shape=[batch_size, vocabulary_size],
                                       name='train_inputs_%d'%ui))
    train_labels.append(tf.placeholder(tf.float32,
                                       shape=[batch_size, vocabulary_size],
                                       name='train_labels_%d'%ui))

We also define placeholders for validation inputs and outputs, which will be used to compute the validation perplexity. Note that we do not use unrolling for validation-related computations.

# Validation data placeholders
valid_inputs = tf.placeholder(tf.float32, shape=[1, vocabulary_size], name='valid_inputs')
valid_labels = tf.placeholder(tf.float32, shape=[1, vocabulary_size], name='valid_labels')

Defining sequential calculations required to process sequential data

Here we will calculate the outputs produced by a single unrolling of the training inputs in a recursive manner. We will also use dropout (refer to Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Srivastava, Nitish, and others, Journal of Machine Learning Research 15 (2014): 1929-1958), as this gives a slightly better performance.
Finally, we compute the logit values for all the hidden output values computed for the training data:

# Keeps the calculated state outputs in all the unrollings
# Used to calculate loss
outputs = list()

# These two python variables are iteratively updated
# at each step of unrolling
output = saved_output
state = saved_state

# Compute the hidden state (output) and cell state (state)
# recursively for all the steps in unrolling
for i in train_inputs:
    output, state = lstm_cell(i, output, state)
    output = tf.nn.dropout(output, keep_prob=1.0-dropout)
    # Append each computed output value
    outputs.append(output)

# Calculate the score values
logits = tf.matmul(tf.concat(axis=0, values=outputs), w) + b

Next, before calculating the loss, we have to make sure that the output and the external hidden state are updated to the most current values we calculated earlier. This is achieved by adding a tf.control_dependencies condition and keeping the logit and loss calculation within the condition:

with tf.control_dependencies([saved_output.assign(output), saved_state.assign(state)]):
    # Classifier.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(
            logits=logits, labels=tf.concat(axis=0, values=train_labels)))

We also define the forward propagation logic for validation data. Note that we do not use dropout during validation, only during training:

# Validation phase related inference logic
# Compute the LSTM cell output for validation data
valid_output, valid_state = lstm_cell(
    valid_inputs, saved_valid_output, saved_valid_state)

# Compute the logits
valid_logits = tf.nn.xw_plus_b(valid_output, w, b)

Defining the optimizer

Here we will define the optimization process. We will use a state-of-the-art optimizer known as Adam, which is one of the best stochastic gradient-based optimizers to date. In the code, gstep is a variable used to decay the learning rate over time; we will discuss the details in the next section. Furthermore, we will use gradient clipping to avoid exploding gradients:

# Decays learning rate everytime the gstep increases
tf_learning_rate = tf.train.exponential_decay(0.001, gstep, decay_steps=1, decay_rate=0.5)

# Adam Optimizer. And gradient clipping.
optimizer = tf.train.AdamOptimizer(tf_learning_rate)
gradients, v = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimizer = optimizer.apply_gradients(zip(gradients, v))

Decaying learning rate over time

As mentioned earlier, I use a decaying learning rate instead of a constant learning rate. Decaying the learning rate over time is a common technique in deep learning for achieving better performance and reducing overfitting. The key idea here is to step down the learning rate (for example, by a factor of 0.5) if the validation perplexity does not decrease for a predefined number of epochs.
Let's see how exactly this is implemented, in more detail. First, we define gstep and an operation to increment it, called inc_gstep, as follows:

# learning rate decay
gstep = tf.Variable(0, trainable=False, name='global_step')

# Running this operation will cause the value of gstep
# to increase, while in turn reducing the learning rate
inc_gstep = tf.assign(gstep, gstep+1)

With this defined, we can write some simple logic to call the inc_gstep operation whenever the validation loss does not decrease, as follows:

# Learning rate decay related
# If valid perplexity does not decrease
# continuously for this many epochs
# decrease the learning rate
decay_threshold = 5
# Keep counting perplexity increases
decay_count = 0
min_perplexity = 1e10

# Learning rate decay logic
def decay_learning_rate(session, v_perplexity):
    global decay_threshold, decay_count, min_perplexity
    # Decay learning rate
    if v_perplexity < min_perplexity:
        decay_count = 0
        min_perplexity = v_perplexity
    else:
        decay_count += 1

    if decay_count >= decay_threshold:
        print('\t Reducing learning rate')
        decay_count = 0
        session.run(inc_gstep)

Here we update min_perplexity whenever we experience a new minimum validation perplexity, and v_perplexity is the current validation perplexity.

Making predictions

Now we can make predictions simply by applying a softmax activation to the logits we calculated previously. We also define a prediction operation for the validation logits:

train_prediction = tf.nn.softmax(logits)

# Make sure that the state variables are updated
# before moving on to the next iteration of generation
with tf.control_dependencies([saved_valid_output.assign(valid_output),
                              saved_valid_state.assign(valid_state)]):
    valid_prediction = tf.nn.softmax(valid_logits)

Calculating perplexity (loss)

Perplexity is a measure of how surprised the LSTM is to see the next n-gram, given the current n-gram. A higher perplexity means poor performance, whereas a lower perplexity means better performance:

train_perplexity_without_exp = tf.reduce_sum(
    tf.concat(train_labels, 0) * -tf.log(tf.concat(
        train_prediction, 0) + 1e-10)) / (num_unrollings * batch_size)

# Compute validation perplexity
valid_perplexity_without_exp = tf.reduce_sum(valid_labels * -tf.log(valid_prediction + 1e-10))

Resetting states

We employ state resetting, as we are processing multiple documents. So, at the beginning of processing a new document, we reset the hidden state back to zero. However, it is not very clear whether resetting the state helps in practice. On the one hand, it sounds intuitive to reset the memory of the LSTM cell to zero at the beginning of each document, when starting to read a new story. On the other hand, this creates a bias in the state variables toward zero. We encourage you to try running the algorithm both with and without state resetting and see which method performs well.
# Reset train state
reset_train_state = tf.group(tf.assign(saved_state, tf.zeros([batch_size, num_nodes])),
                             tf.assign(saved_output, tf.zeros([batch_size, num_nodes])))

# Reset valid state
reset_valid_state = tf.group(tf.assign(saved_valid_state, tf.zeros([1, num_nodes])),
                             tf.assign(saved_valid_output, tf.zeros([1, num_nodes])))

Greedy sampling to break unimodality

This is quite a simple technique where we stochastically sample the next prediction out of the n best candidates found by the LSTM. Furthermore, we make the probability of picking a candidate proportional to the likelihood of that candidate being the next bigram:

def sample(distribution):
    best_inds = np.argsort(distribution)[-3:]
    best_probs = distribution[best_inds] / np.sum(distribution[best_inds])
    best_idx = np.random.choice(best_inds, p=best_probs)
    return best_idx

Generating new text

Finally, we will define the placeholders, variables, and operations required for generating new text. These are defined similarly to what we did for the training data. First, we will define an input placeholder and variables for the state and output. Next, we will define state resetting operations. Finally, we will define the LSTM cell calculations and predictions for the new text to be generated:

# Text generation: batch 1, no unrolling.
test_input = tf.placeholder(tf.float32, shape=[1, vocabulary_size], name='test_input')

# Same variables for testing phase
saved_test_output = tf.Variable(tf.zeros([1, num_nodes]), trainable=False, name='test_hidden')
saved_test_state = tf.Variable(tf.zeros([1, num_nodes]), trainable=False, name='test_cell')

# Compute the LSTM cell output for testing data
test_output, test_state = lstm_cell(
    test_input, saved_test_output, saved_test_state)

# Make sure that the state variables are updated
# before moving on to the next iteration of generation
with tf.control_dependencies([saved_test_output.assign(test_output),
                              saved_test_state.assign(test_state)]):
    test_prediction = tf.nn.softmax(tf.nn.xw_plus_b(test_output, w, b))

# Reset test state
reset_test_state = tf.group(
    saved_test_output.assign(tf.random_normal([1, num_nodes], stddev=0.05)),
    saved_test_state.assign(tf.random_normal([1, num_nodes], stddev=0.05)))

Example generated text

Let's take a look at some of the text generated by the LSTM after 50 steps of learning:

they saw that the birds were at her bread, and threw behind him a comb which made a great ridge with a thousand times thousands of spikes. that was a collier. the nixie was at church, and thousands of spikes, they were flowers, however, and had hewn through the glass, the children had formed a hill of mirrors, and was so slippery that it was impossible for the nixie to cross it. then she thought, i will go home quickly and fetch my axe, and cut the hill of glass in half. long before she returned, however, and had hewn through the glass, the children saw her from afar, and he sat down close to it, and was so slippery that it was impossible for the nixie to cross it.
To summarize, you can see from the output that there actually was a story about a water nixie in our training corpus. However, our LSTM does not merely reproduce that text; it adds more color to the story by introducing new things, such as a church and flowers, which were not found in the original passage. To write modern natural language processing applications using deep learning algorithms and TensorFlow, you may refer to the book Natural Language Processing with TensorFlow.

What is LSTM?
Implement Long-short Term Memory (LSTM) with TensorFlow
Recurrent Neural Network and the LSTM Architecture
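As a complement to the graph definitions above, the following is a rough, hypothetical sketch of the text generation loop itself. It assumes a trained session plus bigram lookup dictionaries (bigram_to_id, id_to_bigram) and numpy imported as np, none of which are shown in this excerpt:

def generate_story(session, seed_bigram, num_bigrams=500):
    """Feed the test ops defined above in a loop, sampling one bigram at a time."""
    session.run(reset_test_state)  # start generation from a fresh (random) state
    story = seed_bigram
    current = seed_bigram
    for _ in range(num_bigrams):
        # One-hot encode the current bigram as the [1, vocabulary_size] test input
        one_hot = np.zeros((1, vocabulary_size), dtype=np.float32)
        one_hot[0, bigram_to_id[current]] = 1.0
        # Get the predicted distribution over the next bigram
        dist = session.run(test_prediction, feed_dict={test_input: one_hot})
        # Stochastically pick among the top candidates using the sample() helper above
        next_id = sample(dist.ravel())
        current = id_to_bigram[next_id]
        story += current
    return story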

Implement Reinforcement learning using Markov Decision Process [Tutorial]

Fatema Patrawala
09 Jul 2018
11 min read
The Markov decision process, better known as MDP, is an approach in reinforcement learning to take decisions in a gridworld environment. A gridworld environment consists of states in the form of grids. The MDP tries to capture a world in the form of a grid by dividing it into states, actions, models/transition models, and rewards. The solution to an MDP is called a policy and the objective is to find the optimal policy for that MDP task. Thus, any reinforcement learning task composed of a set of states, actions, and rewards that follows the Markov property would be considered an MDP. In this tutorial, we will dig deep into MDPs, states, actions, rewards, policies, and how to solve them using Bellman equations. This article is a reinforcement learning tutorial taken from the book, Reinforcement Learning with TensorFlow.

Markov decision processes

An MDP is defined as the collection of the following:

States: S
Actions: A(s), A
Transition model: T(s,a,s') ~ P(s'|s,a)
Rewards: R(s), R(s,a), R(s,a,s')
Policy: π, where π* is the optimal policy

In the case of an MDP, the environment is fully observable, that is, whatever observation the agent makes at any point in time is enough to make an optimal decision. In the case of a partially observable environment, the agent needs a memory to store the past observations to make the best possible decisions.

Let's try to break this into different lego blocks to understand what this overall process means.

The Markov property

In short, as per the Markov property, in order to know the information of the near future (say, at time t+1), the present information at time t matters. Given a sequence s1, s2, ..., st, the first order of Markov says P(st+1 | s1, ..., st) = P(st+1 | st), that is, st+1 depends only on st. The second order of Markov says P(st+1 | s1, ..., st) = P(st+1 | st, st-1), that is, st+1 depends only on st and st-1.

In our context, we will follow the first order of the Markov property from now on. Therefore, we can convert any process to a Markov property if the probability of the new state, say st+1, depends only on the current state, st, such that the current state captures and remembers the property and knowledge from the past. Thus, as per the Markov property, the world (that is, the environment) is considered to be stationary, that is, the rules of the world are fixed.

The S state set

The S state set is a set of different states, represented as s, which constitute the environment. States are the feature representation of the data obtained from the environment. Thus, any input from the agent's sensors can play an important role in state formation. State spaces can be either discrete or continuous. The agent starts from the start state and has to reach the goal state in the most optimized path without ending up in bad states (like the red-colored state shown in the diagram below).

Consider the following gridworld as having 12 discrete states, where the green-colored grid is the goal state, red is the state to avoid, and black is a wall that you'll bounce back from if you hit it head on:

The states can be represented as 1, 2, ..., 12 or by coordinates, (1,1), (1,2), ..., (3,4).

Actions

The actions are the things an agent can perform or execute in a particular state. In other words, actions are the set of things an agent is allowed to do in the given environment. Like states, actions can also be either discrete or continuous.
Consider the following gridworld example having 12 discrete states and 4 discrete actions (UP, DOWN, RIGHT, and LEFT):

The preceding example shows the action space to be a discrete set space, that is, a ∈ A, where A = {UP, DOWN, RIGHT, LEFT}. It can also be treated as a function of state, that is, a = A(s), where depending on the state function, it decides which actions are possible.

Transition model

The transition model T(s, a, s') is a function of three variables, which are the current state (s), action (a), and the new state (s'), and defines the rules to play the game in the environment. It gives the probability P(s'|s, a), that is, the probability of landing in the new state s' given that the agent takes an action, a, in the given state, s.

The transition model plays a crucial role in a stochastic world, unlike the case of a deterministic world where the probability for any landing state other than the determined one is zero. Let's consider the following environment (world) and consider the deterministic and stochastic cases. Since the actions a ∈ A, where A = {UP, DOWN, RIGHT, LEFT}, the behavior of these two cases depends on certain factors:

Deterministic environment: In a deterministic environment, if you take a certain action, say UP, you will certainly perform that action with probability 1.
Stochastic environment: In a stochastic environment, if you take the same action, say UP, there will be a certain probability, say 0.8, of actually performing the given action, and a probability of 0.1 each of performing an action perpendicular to the given action (either LEFT or RIGHT).

Here, for the s state and the UP action, the transition model gives T(s, UP, s') = P(s'|s, UP) = 0.8. Since T(s,a,s') ~ P(s'|s,a), the probability of the new state depends on the current state and action only, and on none of the past states. Thus, the transition model follows the first-order Markov property. We can also say that our universe is a stochastic environment, since the universe is composed of atoms that are in different states defined by position and velocity. Actions performed by each atom change their states and cause changes in the universe.

Rewards

The reward of a state quantifies the usefulness of entering that state. There are three different forms to represent the reward, namely R(s), R(s, a), and R(s, a, s'), but they are all equivalent. For a particular environment, domain knowledge plays an important role in the assignment of rewards for different states, as minor changes in the reward do matter for finding the optimal solution to an MDP problem.

There are two approaches to rewarding our agent for taking a certain action. They are:

Credit assignment problem: We look at the past and check which actions led to the present reward, that is, which action gets the credit
Delayed rewards: In contrast, in the present state, we check which action to take that will lead us to potential rewards

Delayed rewards form the idea of foresight planning. This concept is used to calculate the expected reward for different states. We will discuss this in the later sections.

Policy

Until now, we have covered the blocks that create an MDP problem, that is, states, actions, transition models, and rewards; now comes the solution. The policy is the solution to an MDP problem. The policy is a function that takes the state as an input and outputs the action to be taken. Therefore, the policy is a command that the agent has to obey.
π* is called the optimal policy, which maximizes the expected reward. Among all the policies taken, the optimal policy is the one that maximizes the amount of reward received, or expected to be received, over a lifetime. For an MDP, there's no end of the lifetime and you have to decide the end time.

Thus, the policy is nothing but a guide telling which action to take for a given state. It is not a plan but uncovers the underlying plan of the environment by returning the actions to take for each state.

The Bellman equations

Since the optimal π* policy is the policy that maximizes the expected rewards, we have

π* = argmax_π E[Σt R(st) | π]

where E[Σt R(st) | π] means the expected value of the rewards obtained from the sequence of states the agent observes if it follows the π policy. Thus, argmax_π outputs the π policy that has the highest expected reward.

Similarly, we can also calculate the utility of the policy of a state, that is, if we are at the s state, given a π policy, then the utility of the π policy for the s state, that is, Uπ(s), would be the expected rewards from that state onward:

Uπ(s) = E[Σt γ^t R(st) | π, s0 = s]

The immediate reward of the state, that is, R(s), is different from the utility of the state (that is, the utility of the optimal policy of the state) because of the concept of delayed rewards. From now onward, the utility of the state will refer to the utility of the optimal policy of the state, that is, U(s).

Moreover, the optimal policy can also be regarded as the policy that maximizes the expected utility. Therefore,

π*(s) = argmax_a Σs' T(s,a,s') U(s')

where T(s,a,s') is the transition probability, that is, P(s'|s,a), and U(s') is the utility of the new landing state after the action a is taken in the s state. Σs' T(s,a,s') U(s') refers to the summation over all possible new state outcomes for a particular action taken; whichever action gives the maximum value of this sum is considered to be part of the optimal policy. Thereby, the utility of the s state is given by the following Bellman equation,

U(s) = R(s) + γ max_a Σs' T(s,a,s') U(s')

where R(s) is the immediate reward and γ max_a Σs' T(s,a,s') U(s') is the reward from the future, that is, the discounted utilities of the states the agent can reach from the given s state if the action a is taken.

Solving the Bellman equation to find policies

Say we have n states in the given environment. If we look at the Bellman equation, we have n equations and n unknowns, but the max_a function makes them non-linear, so we cannot solve them as linear equations. Therefore, in order to solve them:

Start with an arbitrary utility
Update the utilities based on the neighborhood until convergence, that is, update the utility of the state using the Bellman equation based on the utilities of the landing states from the given state

Iterate this multiple times to converge to the true values of the states. This process of iterating to convergence toward the true value of the state is called value iteration. For the terminal states where the game ends, the utility of the terminal state equals the immediate reward the agent receives while entering the terminal state. Let's try to understand this by implementing an example.

An example of value iteration using the Bellman equation

Consider the following environment and the given information:

Given information: A, C, and X are the names of some states. The green-colored state is the goal state, G, with a reward of +1. The red-colored state is the bad state, B, with a reward of -1; try to prevent your agent from entering this state. Thus, the green and red states are the terminal states; enter either and the game is over.
If the agent encounters the green state, that is, the goal state, the agent wins, while if they enter the red state, then the agent loses the game.

The reward R(s) = -0.04 for all states except the G and B states, and the initial utility U0(s) = 0 for all states except the G and B states (whose utilities equal their rewards, +1 and -1). The discount factor γ = 0.5 (this value is inferred here from the numbers used in the tables below). The transition probability T(s,a,s') equals 0.8 if going in the desired direction; otherwise, 0.1 each if going perpendicular to the desired direction. For example, if the action is UP, then with 0.8 probability the agent goes UP, but with 0.1 probability it goes RIGHT and 0.1 LEFT.

Questions:

Find U1(X), the utility of the X state at time step 1, that is, after the agent goes through one iteration
Similarly, find U2(X)

Solution: R(X) = -0.04

Action a | s' | T(s,a,s') | U0(s') | T(s,a,s') x U0(s')
RIGHT | G | 0.8 | +1 | 0.8 x 1 = 0.8
RIGHT | C | 0.1 | 0 | 0.1 x 0 = 0
RIGHT | X | 0.1 | 0 | 0.1 x 0 = 0

Thus, for action a = RIGHT, the sum is 0.8.

Action a | s' | T(s,a,s') | U0(s') | T(s,a,s') x U0(s')
DOWN | C | 0.8 | 0 | 0.8 x 0 = 0
DOWN | G | 0.1 | +1 | 0.1 x 1 = 0.1
DOWN | A | 0.1 | 0 | 0.1 x 0 = 0

Thus, for action a = DOWN, the sum is 0.1.

Action a | s' | T(s,a,s') | U0(s') | T(s,a,s') x U0(s')
UP | X | 0.8 | 0 | 0.8 x 0 = 0
UP | G | 0.1 | +1 | 0.1 x 1 = 0.1
UP | A | 0.1 | 0 | 0.1 x 0 = 0

Thus, for action a = UP, the sum is 0.1.

Action a | s' | T(s,a,s') | U0(s') | T(s,a,s') x U0(s')
LEFT | A | 0.8 | 0 | 0.8 x 0 = 0
LEFT | X | 0.1 | 0 | 0.1 x 0 = 0
LEFT | C | 0.1 | 0 | 0.1 x 0 = 0

Thus, for action a = LEFT, the sum is 0.

Therefore, among all actions, the maximum is 0.8, obtained for a = RIGHT. Therefore,

U1(X) = R(X) + γ max_a Σs' T(X,a,s') U0(s') = -0.04 + 0.5 x 0.8 = 0.36

Similarly, calculating U1(A) and U1(C) gives U1(A) = -0.04 and U1(C) = -0.04 (with U0 = 0 for the non-terminal neighbors, the best achievable sum is 0 for both states, so only the immediate reward remains).

Since U1(X) = 0.36, U1(A) = -0.04, U1(C) = -0.04, and R(X) = -0.04, the second iteration proceeds as follows:

Action a | s' | T(s,a,s') | U1(s') | T(s,a,s') x U1(s')
RIGHT | G | 0.8 | +1 | 0.8 x 1 = 0.8
RIGHT | C | 0.1 | -0.04 | 0.1 x -0.04 = -0.004
RIGHT | X | 0.1 | 0.36 | 0.1 x 0.36 = 0.036

Thus, for action a = RIGHT, the sum is 0.832.

Action a | s' | T(s,a,s') | U1(s') | T(s,a,s') x U1(s')
DOWN | C | 0.8 | -0.04 | 0.8 x -0.04 = -0.032
DOWN | G | 0.1 | +1 | 0.1 x 1 = 0.1
DOWN | A | 0.1 | -0.04 | 0.1 x -0.04 = -0.004

Thus, for action a = DOWN, the sum is 0.064.

Action a | s' | T(s,a,s') | U1(s') | T(s,a,s') x U1(s')
UP | X | 0.8 | 0.36 | 0.8 x 0.36 = 0.288
UP | G | 0.1 | +1 | 0.1 x 1 = 0.1
UP | A | 0.1 | -0.04 | 0.1 x -0.04 = -0.004

Thus, for action a = UP, the sum is 0.384.

Action a | s' | T(s,a,s') | U1(s') | T(s,a,s') x U1(s')
LEFT | A | 0.8 | -0.04 | 0.8 x -0.04 = -0.032
LEFT | X | 0.1 | 0.36 | 0.1 x 0.36 = 0.036
LEFT | C | 0.1 | -0.04 | 0.1 x -0.04 = -0.004

Thus, for action a = LEFT, the sum is 0.

Therefore, among all actions, the maximum is 0.832, obtained for a = RIGHT. Therefore,

U2(X) = R(X) + γ max_a Σs' T(X,a,s') U1(s') = -0.04 + 0.5 x 0.832 = 0.376

Therefore, the answers to the preceding questions are:

U1(X) = 0.36
U2(X) = 0.376

Policy iteration

The process of obtaining the optimal utility by iterating over the policy, and updating the policy itself instead of the value until the policy converges to the optimum, is called policy iteration. The process of policy iteration is as follows:

Start with a random policy, π0
For the given πt policy at iteration step t, calculate the utilities using Uπt(s) = R(s) + γ Σs' T(s, πt(s), s') Uπt(s')
Improve the policy by πt+1(s) = argmax_a Σs' T(s,a,s') Uπt(s')

This ends an interesting reinforcement learning tutorial; a minimal value-iteration sketch for this gridworld is given below. Want to implement state-of-the-art Reinforcement Learning algorithms from scratch? Get this best-selling title, Reinforcement Learning with TensorFlow.

How Reinforcement Learning works
Convolutional Neural Networks with Reinforcement Learning
Getting started with Q-learning using TensorFlow
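To make the hand calculation above concrete, here is a small, self-contained value-iteration sketch in Python. The 3x4 grid layout, wall position, and discount factor of 0.5 are assumptions chosen to mirror the worked example rather than code from the book:

# Hypothetical value-iteration sketch for a 3x4 gridworld (not code from the book)
GAMMA = 0.5
LIVING_REWARD = -0.04
GOAL, BAD, WALL = (0, 3), (1, 3), (1, 1)          # assumed positions in the grid
STATES = [(r, c) for r in range(3) for c in range(4) if (r, c) != WALL]
ACTIONS = {'UP': (-1, 0), 'DOWN': (1, 0), 'LEFT': (0, -1), 'RIGHT': (0, 1)}
PERPENDICULAR = {'UP': ('LEFT', 'RIGHT'), 'DOWN': ('LEFT', 'RIGHT'),
                 'LEFT': ('UP', 'DOWN'), 'RIGHT': ('UP', 'DOWN')}

def move(state, action):
    """Deterministic move; bounce back off the wall and the grid edges."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    return (r, c) if (r, c) in STATES else state

def transitions(state, action):
    """0.8 in the desired direction, 0.1 for each perpendicular direction."""
    side1, side2 = PERPENDICULAR[action]
    return [(0.8, move(state, action)),
            (0.1, move(state, side1)),
            (0.1, move(state, side2))]

def value_iteration(iterations=50):
    U = {s: 0.0 for s in STATES}
    U[GOAL], U[BAD] = +1.0, -1.0                  # terminal utilities equal their rewards
    for _ in range(iterations):
        new_U = dict(U)
        for s in STATES:
            if s in (GOAL, BAD):
                continue                          # terminal states are not updated
            best = max(sum(p * U[s2] for p, s2 in transitions(s, a)) for a in ACTIONS)
            new_U[s] = LIVING_REWARD + GAMMA * best
        U = new_U
    return U

print(value_iteration())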

How to optimize Hbase for the Cloud [Tutorial]

Natasha Mathur
06 Jul 2018
8 min read
Hadoop/HBase was designed to crunch a huge amount of data in batch mode and provide meaningful results from this data. This article is an excerpt taken from the book 'HBase High Performance Cookbook' written by Ruchir Choudhry. The book provides a solid understanding of the HBase basics.

However, as the technology evolved over the years, the original architecture was fine-tuned to move from the world of big iron to the choice of cloud infrastructure:

It provides optimum pricing for the provisioning of new hardware, storage, and monitoring of the infrastructure.
One-click setup of additional nodes and storage.
Elastic load-balancing to different clusters within the HBase ecosystem.
Ability to resize the cluster on demand.
Share capacity across different time zones, for example, doing batch jobs in one data center and real-time analytics near to the customer.
Easy integration with other cloud-based services.

HBase on Amazon EMR provides the ability to back up your HBase data directly to Amazon Simple Storage Service (Amazon S3). You can also restore from a previously created backup when launching an HBase cluster.

Configuring HBase for the cloud

Before we start, let's take a quick look at the supported versions and the prerequisites you need to move ahead. The supported combinations are as follows:

HBase versions: 0.94.18 and 0.94
AMI versions: 3.1.0 and later, and 3.0-3.04
AWS CLI configuration parameters: --ami-version 3.1 / 3.2 / 3.3 with --applications Name=Hbase; --ami-version 3.0 with --application name=Hbase; --ami-version 2.2 or later with --applications Name=HBase
HBase version details: bug fixes

Now let's look at the prerequisites:

At least two instances (optional): The cluster's master node runs the HBase master server and Zookeeper, and slave nodes run the HBase region servers. For optimum performance and production systems, HBase clusters should run on at least two EC2 instances, but you can run HBase on a single node for evaluation purposes.
Long-running clusters: HBase only runs on long-running clusters. By default, the CLI and Amazon EMR console create long-running clusters.
An Amazon EC2 key pair set (recommended): To use the Secure Shell (SSH) network protocol to connect with the master node and run HBase shell commands, you must use an Amazon EC2 key pair when you create the cluster.
The correct AMI and Hadoop versions: HBase clusters are currently supported only on Hadoop 20.205 or later.
The AWS CLI: This is needed to interact with HBase using command-line options.
The Ganglia tool: For monitoring, it's advisable to use the Ganglia tool; this provides all information related to performance and can be installed as a client lib when we create the cluster.
The logs for HBase: They are available on the master node; it's a standard practice in a production environment to copy these logs to the Amazon S3 cluster.

How to do it

Open a browser and go to https://console.aws.amazon.com/elasticmapreduce/ (if you don't have an Amazon AWS account, you will have to create one). Then choose Create cluster, as shown in the following screenshot.

Provide the cluster name; you must select Launch mode as Cluster.

Let's proceed to the software configuration section. There are two options: the Amazon template or the MapR template. We are going to use the Amazon template. It will load the default applications, which include HBase.

Security is key when you are using ssh to log in to the cluster.
Let's create a security key by selecting NETWORK & SECURITY in the left section of the panel (as shown in the following screenshot). We have created it as Hbase03.

Once you create this security key, it will ask you to download a .pem file, known as hbase03.pem. Copy this file to the user location and change its permissions with:

chmod 400 hbase03.pem

This ensures the file is readable only by its owner and not accessible to others. Now select this pair from the drop-down box in the EC2 key pair section; this will allow you to register the instance to the key while provisioning the instance. You can do this later too, but I had some challenges in doing this, so it is always better to provision the instance with this property.

Now you are ready to provision the EMR cluster. Go ahead and provision the cluster. It will take around 10 to 20 minutes to have a cluster fully accessible and in a running condition. Verify it by observing the console.

How it works

When you select the cluster name, it maps to your associate account internally and keeps the mapping while the cluster is alive (or not destroyed). When you select an installation using the settings, it loads all the respective JAR files, which allows it to perform in a fully distributed environment. You can select the EMR or MapR stable release, which allows us to load compatible libraries, and hence focus on the solution rather than troubleshooting integration issues within the Hadoop/HBase farms. Internally, all the slaves connect to the master, and hence we considered an extra-large VM.

Connecting to an HBase cluster using the command line

How to do it

You can alternatively SSH to the node and see the details as follows. Once you have connected to the cluster, you can perform all the tasks which you can perform on local clusters. The preceding screenshot gives the details of the components we selected while installing the cluster. Let's connect to the HBase shell to make sure all the components are connecting internally and we are able to create a sample table.

How it works

The communication between your machine and the HBase cluster works by passing a key every time a command is executed; this keeps the communication private. The shell becomes a remote shell that connects to the HBase master via a private connection. All the basic shell commands, such as put, create, and scan, and all the other known HBase commands are available.

Backing up and restoring HBase

Amazon Elastic MapReduce provides multiple ways to back up and restore HBase data to and from S3. It also allows us to do an incremental backup; during the backup process, HBase continues to execute write commands, helping us to keep working while the backup process continues. There is a risk of having an inconsistency in the data. If consistency is of prime importance, then the writes need to be stopped during the initial backup process and synchronized across nodes. This can be achieved by passing the --consistent parameter when requesting a backup.

When you back up HBase data, you should specify a different backup directory for each cluster. An easy way to do this is to use the cluster identifier as part of the path specified for the backup directory, for example, s3://mybucket/backups/j-3AEXXXXXX16F2. This ensures that any future incremental backups reference the correct HBase cluster.

How to do it

When you are ready to delete old backup files that are no longer needed, we recommend that you first do a full backup of your HBase data.
This ensures that all data is preserved and provides a baseline for future incremental backups. Once the full backup is done, you can navigate to the backup location and manually delete the old backup files.

While creating a cluster, add an additional step scheduling regular backups, as shown in the following screenshot. You have to specify the backup location to which a backup file with the data will be written, based on the backup frequency selected. For highly valuable data, you can have a backup on an hourly basis; for less sensitive data, it can be planned daily.

It's a good practice to back up to a separate location in Amazon S3 to ensure that incremental backups are calculated correctly. It's important to specify the exact time from when the backup will be started; the time zone specified for our cluster is UTC. We can proceed with creating the cluster as planned; it will create a backup of the data to the location specified. To restore, you have to provide the exact location of the backup file. The version that is backed up needs to be specified and saved, which will allow the data to be restored.

How it works

During the backup process, HBase continues to execute write commands; this ensures the cluster remains available throughout the backup process. Internally, the operation is done in parallel, so there is a chance of it being inconsistent. If the use case requires consistency, then we have to pause the writes to HBase. This can be achieved by passing the consistent parameter while requesting a backup. This internally queues the writes and executes them as soon as the synchronization completes.

We learned about the configuration of HBase for the cloud, connected to an HBase cluster using the command line, and performed a backup and restore of HBase. A programmatic sketch of launching such a cluster is given below. If you found this post useful, do check out the book 'HBase High Performance Cookbook' to learn other concepts such as terminating an HBase cluster, accessing HBase data with Hive, and viewing HBase log files.

Understanding the HBase Ecosystem
Configuring HBase
5 Mistakes Developers make when working with HBase
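For readers who prefer scripting the console steps above, the following is a rough, hypothetical boto3 sketch of launching an EMR cluster with HBase. The release label, instance types, key name, and log bucket are made-up placeholders, and the exact values should come from your own AWS account and the EMR documentation:

import boto3

# Hypothetical example values; replace with your own region, key pair, and buckets
emr = boto3.client('emr', region_name='us-east-1')

response = emr.run_job_flow(
    Name='hbase-cookbook-cluster',
    ReleaseLabel='emr-5.20.0',                    # assumed release label
    Applications=[{'Name': 'HBase'}],
    LogUri='s3://mybucket/emr-logs/',
    Instances={
        'MasterInstanceType': 'm4.xlarge',
        'SlaveInstanceType': 'm4.xlarge',
        'InstanceCount': 3,                       # one master, two core nodes
        'Ec2KeyName': 'hbase03',                  # the key pair created earlier
        'KeepJobFlowAliveWhenNoSteps': True,      # long-running cluster
    },
    JobFlowRole='EMR_EC2_DefaultRole',
    ServiceRole='EMR_DefaultRole',
)
print('Cluster id:', response['JobFlowId'])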