
Tech Guides - Big Data


Hyperledger: The Enterprise-ready Blockchain

Savia Lobo
26 Oct 2017
6 min read
As one of the most widely discussed phenomena across the global media, Blockchain has grown from mere hype into mainstream reality. Leading industry experts from finance, supply chain, and IoT are collaborating to make Blockchain available for commercial adoption. But while Blockchain is being projected as the future of digital transactions, it still suffers from two major limitations: carrying out private transactions and scalability. A pressing need was therefore felt for a Blockchain-based distributed ledger that overcomes these problems.

Enter Hyperledger

Founded by the Linux Foundation in 2015, Hyperledger aims to give enterprises a platform for building robust blockchain applications for their businesses and to create open-source, enterprise-grade frameworks for carrying out secure business transactions. It acts as a hub where leading companies and software developers collaborate on blockchain frameworks that can then be used to deploy blockchain applications across industries. With industry leaders such as IBM, Intel, Accenture, and SAP collaborating with the Hyperledger community, and with the recent addition of BTS, Oracle, and the Patientory Foundation, the community is gaining considerable traction. No wonder Brian Behlendorf, Executive Director at Hyperledger, says, "Growth and interest in Hyperledger remain high in 2017."

There are a total of eight projects: five are frameworks (Sawtooth, Fabric, Burrow, Iroha, and Indy), and the other three are tools (Composer, Cello, and Explorer) supporting those frameworks. Each framework provides a different approach to building the desired blockchain applications. Hyperledger Fabric, the community's first framework, was contributed by IBM. It hosts smart contracts using chaincode, written in Go or Java, which contains the business logic of the ledger. Hyperledger Sawtooth, developed by Intel, offers a modular blockchain architecture.
It uses Proof of Elapsed Time (PoET), a consensus algorithm developed by Intel for high efficiency among distributed ledgers. Hyperledger Burrow, a joint proposal by Intel and Monax, is a permissioned smart contract machine. It executes smart contract code following the Ethereum specification, with an engine, a strong audit trail, and a consensus mechanism. Apart from these already launched frameworks, two more, namely Indy and Iroha, are still in the incubation phase. The Hyperledger community is also building supporting tools: Composer, which has already been launched, and Cello and Explorer, which are yet to be unveiled.

[box type="shadow" align="" class="" width=""]Although a plethora of Hyperledger tools and frameworks are available, in the rest of the article we take Hyperledger Fabric - one of the most popular and trending frameworks - for the purpose of demonstrating how Hyperledger is being used by businesses.[/box]

Why should businesses use Hyperledger?

In order to settle on a framework upon which Blockchain apps can be built, several key aspects are worth considering. Among the most important are portability, security, reliability, interoperability, and user-friendliness. Hyperledger as a platform offers all of these features for building cross-platform, production-ready applications for businesses.

Let's take a simple example to see how Hyperledger works for businesses. Consider a restaurant business. A restaurant owner buys vegetables from a wholesale shop at a much lower cost than in the market. The shopkeeper creates a network wherein other buyers cannot see the price at which vegetables are sold to any particular buyer. Similarly, the restaurant owner can view only his own transactions with the shopkeeper. For the vegetables to reach the restaurant, they must pass through numerous stages such as transport and delivery.
The restaurant owner can track the delivery of his vegetables at each stage, and so can the shopkeeper. The transport and delivery organizations, however, cannot see the transaction details. This means the shopkeeper can establish a confidential network within a private network of other stakeholders. This type of network can be set up using Hyperledger Fabric.

Let's break the example down into some of the reasons to consider Hyperledger for your business networks. With Hyperledger you get performance, scalability, and multiple levels of trust. You get data on a need-to-know basis: only the parties in the network that need the data get to know about it. Backed by heavyweights like Intel and IBM, Hyperledger strives to offer a strong standard for blockchain code, which in turn provides better functionality at higher speeds. Furthermore, with the recent release of Fabric v1.0, businesses can create out-of-the-box blockchain solutions on its highly elastic and extensible architecture, made easier still by Hyperledger Composer. Composer helps businesses create smart contracts and blockchain applications without having to know the complex intricacies of the underlying blockchain network. Built with collaborative effort from leading industry experts, it is a great fit for real-world enterprise use.

Although Ethereum is used by many businesses, there are several reasons why Hyperledger can be a better enterprise fit. While Ethereum is a public blockchain, Hyperledger is a private blockchain; enterprises within the network know who is present on the peer nodes, unlike with Ethereum. Hyperledger is a permissioned network, that is, it can control who participates in the consensus mechanism of the network, whereas Ethereum is permissionless. Hyperledger has no built-in cryptocurrency; Ethereum, on the other hand, has a built-in cryptocurrency called Ether.
Many applications don't need a cryptocurrency to function, and for them using Ethereum can be a disadvantage. Hyperledger gives you the flexibility of choosing a programming language such as Java or Go for writing smart contracts, while Ethereum uses Solidity, which is far less widely used. Hyperledger is also highly scalable, unlike traditional blockchains and Ethereum, with minimal performance loss.

"Since Hyperledger Fabric was designed to meet key requirements for permissioned blockchains with transaction privacy and configurable policies, we've been able to build solutions quickly and flexibly." - Mohan Venkataraman, CTO, IT People Corporation

Future of Hyperledger

The Hyperledger community is expanding rapidly, with many industries collaborating and contributing their capabilities toward cross-industry blockchain applications. Hyperledger has found adoption in business networks across varied industries such as healthcare, finance, and supply chain, where it is used to build state-of-the-art blockchain applications that assure privacy on decentralized, permissioned networks. It is shaping up to be a technology that can revolutionize how businesses handle access control within a consortium, backed by enhanced security measures. With continuous development of these frameworks, smarter, faster, and more secure business transactions will soon be a reality. We can also expect to see Hyperledger in the cloud, given IBM's plans to extend its Blockchain technologies onto its cloud platform. Add the exciting prospect of blending Artificial Intelligence with Hyperledger, and transactions look more advanced, tamper-proof, and secure than ever before.
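The need-to-know visibility from the restaurant example can be sketched in a few lines of Python. This is purely an illustrative toy: real Hyperledger Fabric achieves this with channels and private data collections, not with the in-memory class and party names invented here.

```python
# Toy model of "data on a need-to-know basis": each ledger entry is
# readable only by its counterparties. Illustrative sketch only; not
# the Hyperledger Fabric API.

class PermissionedLedger:
    """A ledger where each transaction is visible only to its parties."""

    def __init__(self):
        self._transactions = []

    def add_transaction(self, sender, receiver, amount):
        # Record who took part; only these parties may read the entry.
        self._transactions.append(
            {"sender": sender, "receiver": receiver, "amount": amount}
        )

    def visible_to(self, member):
        # A member sees only the transactions they participated in.
        return [
            tx for tx in self._transactions
            if member in (tx["sender"], tx["receiver"])
        ]

ledger = PermissionedLedger()
ledger.add_transaction("shopkeeper", "restaurant", 120)
ledger.add_transaction("shopkeeper", "other_buyer", 150)

# The restaurant owner sees only his own purchase; the price offered
# to the other buyer stays hidden from him, as does his price from them.
print(len(ledger.visible_to("restaurant")))   # 1
print(len(ledger.visible_to("shopkeeper")))   # 2
```

The transport company in the example would correspond to a member with no transactions of its own, so `visible_to("transport_co")` returns an empty list.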


How Blockchain can level up IoT Security

Savia Lobo
29 Aug 2017
4 min read
The Internet of Things (IoT) comprises hordes of sensors, vehicles, and other devices with embedded electronics that can communicate over the Internet. These IoT-enabled devices generate tons of data every second, and with IoT edge analytics they are getting much smarter: they can start or stop a request without any human intervention.

25 billion connected "things" will be connected to the internet by 2020. - Gartner Research

With so much data being generated by these devices, the question on everyone's mind is: will all this data be reliable and secure?

When Brains meet Brawn: Blockchain for IoT

Blockchain, an open distributed ledger, is highly secure and difficult for anyone connected over the network to manipulate or corrupt. It was initially designed for cryptocurrency-based financial transactions; Bitcoin is a famous example with Blockchain as its underlying technology. Blockchain has come a long way since then and can now be used to store anything of value. So why not store IoT data in it? That data would then be as secure as every other digital asset in a Blockchain. Decentralized and secure, Blockchain is an ideal structure to form the underlying foundation for IoT data solutions.

Current IoT devices and their data rely on a client-server architecture. All devices are identified, authenticated, and connected via cloud servers, which are capable of storing ample amounts of data. But this requires huge, and therefore expensive, infrastructure. Blockchain not only provides an economical alternative but, because it works in a decentralized fashion, it also eliminates single points of failure, creating a more secure and resilient network for IoT devices. This makes IoT more secure and reliable, and customers can relax knowing their information is in safe hands.
Today, Blockchain's capabilities extend beyond processing financial transactions: it can track billions of connected devices, process transactions, and even coordinate between devices, a good fit for the IoT industry.

Why Blockchain is perfect for IoT

Inherently weak security features make IoT devices suspect. Blockchain, on the other hand, with its tamper-proof ledger, is hard to manipulate for malicious activity, making it the right infrastructure for IoT solutions.

Enhancing security through decentralization

Blockchain makes it hard for intruders to intervene because it spans a network of secure blocks; a change at a single location does not affect the other blocks. The data or any other value remains encrypted and is visible only to the person who encrypted it using a private key. The cryptographic algorithms used in Blockchain technology ensure that IoT data remains private, whether for an individual organization or for the organizations connected in a network.

Simplicity through autonomous third-party-free transactions

Blockchain technology is already a star in the finance sector thanks to the adoption of smart contracts, Bitcoin, and other cryptocurrencies. Apart from providing a secure medium for financial transactions, it eliminates the need for third-party brokers such as banks to guarantee peer-to-peer payment services. With Blockchain, IoT data can be treated similarly: smart contracts can be made between devices to exchange messages and data. This type of autonomy is possible because each node in the blockchain network can verify the validity of a transaction without relying on a centralized authority. Blockchain-backed IoT solutions will thus enable trustworthy message sharing, and business partners can easily access and exchange confidential information within the IoT without a centralized management or regulatory authority.
This means quicker transactions, lower costs, and fewer opportunities for malicious intent such as data espionage.

Blockchain's immutability for predicting IoT security vulnerabilities

Blockchains maintain a history of all transactions made by the smart devices connected within a particular network. This is possible because once data enters a Blockchain, it lives there forever in the immutable ledger. The possibilities for IoT solutions that leverage Blockchain's immutability are limitless. Some obvious use cases are more robust credit scores and preventive healthcare solutions that use data accumulated through wearables. For all the above reasons, we can expect significant Blockchain adoption by IoT-based businesses in the near future.
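The immutability property described above comes from each block referencing the hash of its predecessor. The following is a minimal sketch of a hash-linked ledger of IoT sensor readings, stripped of everything a real blockchain also needs (consensus, signatures, a network); the sensor names are invented for illustration.

```python
import hashlib
import json

# A minimal hash-linked ledger for IoT sensor readings. Sketch of the
# immutability property only: no consensus, no signatures, one process.

def block_hash(block):
    # Hash the block's canonical JSON form.
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def add_reading(chain, reading):
    # Each new block stores the hash of the previous block.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "reading": reading})

def is_valid(chain):
    # Every block must reference the hash of its predecessor.
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

chain = []
add_reading(chain, {"sensor": "thermostat-1", "temp_c": 21.5})
add_reading(chain, {"sensor": "thermostat-1", "temp_c": 21.7})
print(is_valid(chain))   # True

# Tampering with an earlier reading breaks every later link.
chain[0]["reading"]["temp_c"] = 30.0
print(is_valid(chain))   # False
```

Rewriting history would require recomputing every subsequent hash, which is what the distributed network makes infeasible in a real blockchain.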


How do Data Structures and Data Models differ?

Amey Varangaonkar
21 Dec 2017
7 min read
[box type="note" align="" class="" width=""]The following article is an excerpt from the book Statistics for Data Science, authored by James D. Miller. The book presents interesting techniques through which you can leverage the power of statistics for data manipulation and analysis.[/box]

In this article, we turn the spotlight on data structures and data models, and examine the difference between the two.

Data structures

Data developers will agree that whenever one is working with large amounts of data, the organization of that data is imperative. If that data is not organized effectively, it will be very difficult to perform any task on it, or at least to perform it efficiently. If the data is organized effectively, then practically any operation can be performed on it easily. A data or database developer therefore organizes the data into what are known as data structures. The following image shows a simple binary tree, where the data is organized efficiently by structuring it:

A data structure can be defined as a method of organizing large amounts of data more efficiently so that any operation on that data becomes easy. Data structures are created in such a way as to implement one or more particular abstract data types (ADTs), which in turn stipulate what operations can be performed on the data structure, as well as the computational complexity of those operations.

[box type="info" align="" class="" width=""]In the field of statistics, an ADT is a model for data types where a data type is defined by its behavior from the point of view (POV) of users of that data, explicitly showing the possible values, the possible operations on data of this type, and the behavior of all of these operations.[/box]

Database design is then the process of using the defined data structures to produce a detailed data model, which will become the database.
This data model must contain all of the required logical and physical design choices, as well as the physical storage parameters needed to produce a design in a Data Definition Language (DDL), which can then be used to create an actual database.

[box type="info" align="" class="" width=""]There are varying degrees of data model; for example, a fully attributed data model would also contain detailed attributes for each entity in the model.[/box]

So, is a data structure a data model? No: a data structure is used to create a data model. Is this data model the same as the data models used in statistics? Let's see in the next section.

Data models

You will find that statistical data models are at the heart of statistical analytics. In the simplest terms, a statistical data model is defined as the following:

A representation of a state, process, or system that we want to understand and reason about

Within the scope of this definition, the data or database developer might agree that, in theory or in concept, one could use the same terms to define a financial reporting database: it is designed to contain business transactions, arranged in data structures that allow business analysts to efficiently review the data so that they can understand or reason about particular interests they may have concerning the business. Data scientists develop statistical data models so that they can draw inferences from them and, more importantly, make predictions about a topic of concern. Data developers develop databases so that they can similarly draw inferences and make predictions (although in some organizations, databases are more focused on past and current events (transactions) than on forward-looking ones (predictions)). Statistical data models come in a multitude of different formats and flavours (as do databases).
These models can be equations linking quantities that we can observe or measure, or they can simply be sets of rules. Databases can be designed or formatted to simplify the entry of online transactions, say in an order entry system, or for financial reporting, when the accounting department must generate a balance sheet, income statement, or profit and loss statement for shareholders.

[box type="info" align="" class="" width=""]I found this example of a simple statistical data model: Newton's Second Law of Motion, which states that the net force acting on an object causes the object to accelerate in the direction of that force, at a rate proportional to the magnitude of the force and inversely proportional to the object's mass.[/box]

What's the difference?

Where, then, does the reader find the difference between a data structure or database and a statistical model? At a high level, as we speculated in previous sections, one could conclude that a data structure/database is practically the same thing as a statistical data model, as shown in the following image:
When we take the time to drill deeper into the topic, you should consider the following key points:

Although both the data structure/database and the statistical model could be said to represent a set of assumptions, the statistical model is typically much more keenly focused on a particular set of assumptions concerning the generation of some sample data, and of similar data from a larger population, while the data structure/database is more often broadly based.

A statistical model is often in a rather idealized form, while the data structure/database may be less perfect in the pursuit of a specific assumption.

Both a data structure/database and a statistical model are built around relationships between variables.

The data structure/database relationship may focus on answering certain questions, such as: What are the total orders for specific customers? What are the total orders for a specific customer who has purchased from a certain salesperson? Which customer has placed the most orders?

Statistical model relationships are usually very simple, and focused on proving certain hypotheses: females are shorter than males by a fixed amount; body mass is proportional to height; the probability that any given person will partake in a certain sport is a function of age, sex, and socioeconomic status.

Data structures/databases are all about summarizing data based on relationships between variables.

Relationships

The relationships between variables in a statistical model may be much more complicated than simply straightforward to recognize and understand. An illustration of this is awareness of effect statistics. An effect statistic is one that shows or displays a difference in value associated with a difference in one or more other variables. Can you imagine the SQL query statements you'd use to establish a relationship between two database variables based upon one or more effect statistics?
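The database-style questions above can be made concrete with a small query example. The following sketch uses Python's built-in sqlite3 with an in-memory database; the table, column names, and rows are invented for illustration.

```python
import sqlite3

# A sketch of the kind of relationship a database query answers,
# e.g. "what are the total orders for each customer?". Schema and
# data are hypothetical.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, salesperson TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Acme", "Jones", 250.0),
        ("Acme", "Smith", 100.0),
        ("Widgets Inc", "Jones", 75.0),
    ],
)

# Total orders per customer.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
print(totals["Acme"])  # 350.0

# Total orders for a specific customer from a certain salesperson.
row = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ? AND salesperson = ?",
    ("Acme", "Jones"),
).fetchone()
print(row[0])  # 250.0
```

Both queries characterize a relationship between variables by aggregation, which is exactly the contrast the text draws with fitting a statistical model to estimate an effect.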
On this point, you may find that a data structure/database usually aims to characterize relationships between variables, while with statistical models the data scientist looks to fit the model to prove a point or make a statement about the population in the model. That is, a data scientist endeavors to make a statement about the accuracy of an estimate of the effect statistic(s) describing the model!

One more note of interest: both a data structure/database and a statistical model can be seen as tools or vehicles that aim to generalize about a population; a database uses SQL to aggregate or summarize data, and a statistical model summarizes its data using effect statistics. The above argument presented the notion that data structures/databases and statistical data models are, in many ways, very similar.

If you found this excerpt useful, check out the book Statistics for Data Science, which demonstrates different statistical techniques for implementing various data science tasks such as pre-processing, mining, and analysis.
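The binary tree mentioned earlier as an example of an efficient data structure can be sketched in a few lines. This is a generic binary search tree, not code from the book; the keys are arbitrary.

```python
# A minimal binary search tree: the data structure used earlier as an
# example of organizing data so that operations on it stay efficient.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # Walk down, placing smaller keys left and larger keys right.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def contains(root, key):
    # Each comparison discards one subtree, so lookups take
    # O(log n) steps on a balanced tree.
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

root = None
for k in [8, 3, 10, 1, 6]:
    root = insert(root, k)

print(contains(root, 6))   # True
print(contains(root, 7))   # False
```

The efficiency claim in the text, that a well-chosen structure makes "practically any operation" easy, shows up here as the lookup never visiting more than one path from root to leaf.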


How to secure your crypto currency

Guest Contributor
08 Sep 2018
8 min read
Managing and earning cryptocurrency is a lot of hassle, and losing it is a lot like losing a part of yourself. While the security of this blockchain-based currency is a major concern, here is what you can do to secure your crypto fortune. With ever-fluctuating crypto rates, it's always now or never. Bitcoin once climbed to $17,900; the digital currency frenzy is always in trend, and its security is crucial. No crypto geek wants to lose their currency to malicious activity, negligence, or any other cause. Before we delve into securing our cryptocurrencies, let's discuss the structure and strategy of this crypto vault that ensures the security of a blockchain-based digital currency.

Why blockchains are secure, at least in theory

Three core elements contribute to making blockchain a foolproof digital technology: public key cryptography, hashing, and digital signatures.

Public Key Cryptography

This cryptography involves two distinct keys, a private key and a public key, which encrypt and decrypt data asymmetrically. The keys are complementary: data encrypted with the private key can only be decrypted with the public key, and data encrypted with the public key can only be decrypted with the private key. Various cryptographic schemes, including the TLS (Transport Layer Security) and SSL (Secure Sockets Layer) protocols, have this system at their core. The strategy works by publishing your public key to the world of blockchain while keeping your private key confidential, never revealing it on any platform or in any place.

Hashing

Also called a digest, the hash of a message is calculated from the contents of the message. The hashing algorithm generates the hash deterministically: data of arbitrary length acts as input to the hashing algorithm, and the outcome of this process is a hash of a predefined length.
Because hashing is deterministic, the same input always produces the same output. Mathematically, it is easy to convert a message into a hash, but recovering the original message from a hash is tediously difficult.

Digital Signatures

A digital signature is the hash of a message encrypted with a private key. Anyone who has access to the corresponding public key can decrypt the digital signature to obtain the original hash, and anyone who can read the message can calculate its hash independently. The independently calculated hash can then be compared with the decrypted hash. If the two match, it confirms that the message remained unaltered from creation to reception, and that the message was digitally signed by the holder of the corresponding private key. If the message is altered, it produces a different hash, so tampering is immediately detectable.

What are crypto wallets and transactions

Every crypto wallet is a collection of one or more wallets. A crypto wallet is a private key, from which a public key can be created; using the public key, a public wallet address can easily be derived. This makes a cryptocurrency wallet essentially a set of private keys. To enable sharing wallet addresses with the public, they are converted into QR codes, eliminating the need to maintain secrecy over the address itself.
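The two hashing properties the signature scheme relies on, determinism and tamper-evidence, can be demonstrated with Python's standard hashlib; the messages are invented examples.

```python
import hashlib

# Sketch of the hashing properties described above: the same input
# always produces the same digest, and any change to the message
# yields a completely different digest.

def digest(message: str) -> str:
    # SHA-256 produces a fixed-length (64 hex character) digest.
    return hashlib.sha256(message.encode()).hexdigest()

original = "send 2 coins to wallet-A"
print(digest(original) == digest(original))   # True: deterministic

tampered = "send 9 coins to wallet-A"
print(digest(original) == digest(tampered))   # False: tamper-evident
```

In a digital signature, it is this digest that gets encrypted with the sender's private key; the verifier recomputes the digest independently and compares it with the decrypted signature.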
One can always show a QR code to the world without any hesitation, and anyone can send cryptocurrency to that wallet address. A cryptocurrency transaction, however, needs a private key, and currency sent to a wallet is owned by the wallet's owner. To transact in cryptocurrency, a transaction is created, which is public information. A transaction is a collection of the information a blockchain needs; the only data required is the destination wallet's address and the amount to be transferred. While anyone can create a transaction, it is only accepted by the blockchain once it has been confirmed by multiple members of the network. A transaction must be digitally signed with a private key to be valid; otherwise it is treated as invalid. In other words, one signs a transaction with the private key and submits it to the blockchain, and once the network verifies the signature against the public key data, the transaction is included in the blockchain and validated.

Why you should guard your private key

An attack on your private key is an attempt to steal your cryptocurrency. Using your private key, an attacker can digitally sign transactions from your wallet address to their own. Moreover, an attacker can destroy your private key, ending your access to your crypto wallet.

What are some risk factors involved in owning a crypto wallet?

Before we build a security wall around our cryptocurrency, it is important to know from whom we are protecting our digital currency, and who can prove to be a threat to our crypto wallets. If you lose access to your cryptocurrency, you have lost it all: there is no ledger kept by a centralized authority, and once you lose access, you cannot regain it by any means. Since a crypto wallet is a pairing of a private and a public key, losing the private key means losing your wallet.
In other words, you no longer own any cryptocurrency. This is the first and foremost threat. The next threat is the one we hear about most often: attackers who want to gain access to our cryptocurrency. They may be opportunists, or they may have specific targets in mind.

Threats to your cryptocurrency

Opportunist hackers are low-profile attackers who, given access to your laptop, will transfer money to their own public wallet address. Opportunist hackers don't attack or target a person specifically, but if they get access to your cryptocurrency, they won't shy away from taking your digital cash. Dedicated attackers, on the other hand, work alone or in groups of hackers with a sole purpose: stealing cryptocurrency. Their targets include individuals, crypto traders, and even crypto exchanges. They initiate phishing campaigns and, before executing an attack, get well versed with their target through prior research. A further level of attacker takes a broader approach, writing malicious code that can steal private keys from any system it infects. Yet another kind of attacker is backed by nation states: collective groups with top-level coordination and established financing, motivated by access to funds or political will. The cryptocurrency attacks by the Lazarus Group, backed by North Korea, are an example.

How to protect your crypto wallet

Regardless of the kind of threat, it is you and your private key that need to be secured. Here's how to ensure maximum security for your cryptocurrency. Throw away your access keys and you will lose your cryptocurrency forever; obviously you won't ever do that, so here are some practical ways to secure your crypto fortune.

Go through the complete password recovery process.
This means rehearsing the process of recovering a forgotten password and creating a multi-factor token. Take these measures while setting up a new hosted wallet, or be prepared to lose it all.

No matter how fast the tech world progresses, the basics remain the same. Keep a printed paper backup of your keys in a secure location such as a bank locker or a personal safe. Don't forget to wipe the printer's memory after printing, as printed files can be restored and reused to hack your digital money.

Do not carry those keys with you, and do not hide them in a closet that can be damaged by fire, theft, and so on.

If your wallet has multi-signature enabled and uses two keys to authorize transactions, make it three keys. With the third key controlled by a trusted party, it will help you in the absence of the second person.

About the Author

Tahha Ashraf is a Digital Content Producer at Cubix, a mobile app development company. He is a certified Hubspot inbound and content marketer. He loves talking about brands, tech, blockchain, and content marketing. Along with writing for the online fraternity on a variety of topics, he is fond of creativity and writes poetry in his free time.

Cryptocurrency-based firm Tron acquires BitTorrent
Can Cryptocurrency establish a new economic world order?
Akon is planning to create a cryptocurrency city in Senegal


Budget and Demand Forecasting using Markov model in SAS [Tutorial]

Sunith Shetty
10 Aug 2018
8 min read
Budget and demand forecasting are important aspects of any finance team's work. Budget forecasting is the outcome, and demand forecasting is one of its components. In this article, we look at the Markov model for forecasting and budgeting in finance.

This article is an excerpt from a book written by Harish Gulati titled SAS for Finance.

Understanding the problem of budget and demand forecasting

While a few decades ago retail banks primarily made profits by leveraging their treasury office, recent years have seen fee income become a major source of profitability. Accepting deposits from customers and lending to other customers is one of the core functions of the treasury. However, charging for current or savings accounts with add-on facilities, such as breakdown cover, mobile and other insurances, and so on, has become a lucrative avenue for banks. One retail bank offers a plain vanilla classic bank account, a mid-tier premier account, and a top-of-the-range, benefits-included platinum account. The classic account is free, while the premier and platinum accounts carry fees of $10 and $20 per month respectively. The marketing team has just relaunched the fee-based accounts with added benefits, and the finance team wants a projection of how much revenue could be generated via the premier and platinum accounts.

Solving with the Markovian model approach

Even though we have three types of account (classic, premier, and platinum), it doesn't mean that only the nine transition types in Figure 4.1 are possible. There are customers who will upgrade, but also others who may downgrade. There could also be some customers who leave the bank, and at the same time there will be a constant inflow of new customers. Let's evaluate the transition states flow for our business problem. In Figure 4.2, we haven't jotted down the transition probability between each state. We can derive these by looking at historical customer movements to arrive at the transition probabilities.
Be aware that most business managers would prefer to use their instincts while assigning transition probabilities. There may be some merit in this approach, as managers may be able to incorporate the various factors that influenced customer movements between states. A promotion offering 40% off the platinum account (an effective rate of $12/month, down from $20/month) may have ensured that more customers opted for the platinum account during the promotion period than for the premier offering ($10/month).

Let's examine the historical data of customer account preferences. The data is compiled for the years 2008 - 2018. It doesn't account for any new customers joining after January 1, 2008, and it also ignores customers who churned during the period of interest. Figure 4.3 consists of customers who have been with the bank since 2008:

Active customer counts (millions)

Year     Classic (Cl)  Premium (Pr)  Platinum (Pl)  Total customers
2008 H1  30.68         5.73          1.51           37.92
2008 H2  30.65         5.74          1.53           37.92
2009 H1  30.83         5.43          1.66           37.92
2009 H2  30.9          5.3           1.72           37.92
2010 H1  31.1          4.7           2.12           37.92
2010 H2  31.05         4.73          2.14           37.92
2011 H1  31.01         4.81          2.1            37.92
2011 H2  30.7          5.01          2.21           37.92
2012 H1  30.3          5.3           2.32           37.92
2012 H2  29.3          6.4           2.22           37.92
2013 H1  29.3          6.5           2.12           37.92
2013 H2  28.8          7.3           1.82           37.92
2014 H1  28.8          8.1           1.02           37.92
2014 H2  28.7          8.3           0.92           37.92
2015 H1  28.6          8.34          0.98           37.92
2015 H2  28.4          8.37          1.15           37.92
2016 H1  27.6          9.01          1.31           37.92
2016 H2  26.5          9.5           1.92           37.92
2017 H1  26            9.8           2.12           37.92
2017 H2  25.3          10.3          2.32           37.92

Figure 4.3: Active customers since 2008

Since we are only considering active customers, and no new customers are joining or leaving the bank, we can calculate the number of customers moving from one state to another using the data in Figure 4.3:

Customer movement counts to the next period (millions)

Year     Cl-Cl  Cl-Pr  Cl-Pl  Pr-Pr  Pr-Cl  Pr-Pl  Pl-Pl  Pl-Cl  Pl-Pr  Total
2008 H1  -      -      -      -      -      -      -      -      -      -
2008 H2  30.28  0.2    0.2    5.5    0      0.23   1.1    0.37   0.04   37.92
2009 H1  30.3   0.1    0.25   5.1    0.53   0.11   1.3    0      0.23   37.92
2009 H2  30.5   0.32   0.01   4.8    0.2    0.43   1.28   0.2    0.18   37.92
2010 H1  30.7   0.2    0      4.3    0      1      1.12   0.4    0.2    37.92
2010 H2  30.7   0.2    0.2    4.11   0.35   0.24   1.7    0      0.42   37.92
2011 H1  30.9   0      0.15   4.6    0      0.13   1.82   0.11   0.21   37.92
2011 H2  30.2   0.8    0.01   3.8    0.1    0.91   1.29   0.4    0.41   37.92
2012 H1  30.29  0.4    0.01   4.9    0.01   0.1    2.21   0      0      37.92
2012 H2  29.3   0.9    0.1    5.3    0      0      2.12   0      0.2    37.92
2013 H1  29.2   0.1    0      6.1    0.1    0.2    1.92   0      0.3    37.92
2013 H2  28.6   0.3    0.4    6.5    0      0      1.42   0.2    0.5    37.92
2014 H1  28.7   0.1    0      7.2    0.1    0      1.02   0      0.8    37.92
2014 H2  28.7   0      0.1    8.1    0      0      0.82   0      0.2    37.92
2015 H1  28.6   0      0.1    8.3    0      0      0.88   0      0.04   37.92
2015 H2  28.3   0      0.3    8      0.1    0.24   0.61   0      0.37   37.92
2016 H1  27.6   0.8    0      8.21   0      0.16   1.15   0      0      37.92
2016 H2  26     1      0.6    8.21   0.5    0.3    1.02   0      0.29   37.92
2017 H1  25     0.5    1      8      0.5    1      0.12   0.5    1.3    37.92
2017 H2  25.3   0.1    0.6    9      0      0.8    0.92   0      1.2    37.92

Figure 4.4: Customer transition state counts

In Figure 4.4, we can see the customer movements between the various states. We don't have movements for the first half of 2008, as this is the start of the series. In the second half of 2008, we see that 30.28 million out of 30.68 million customers (30.68 being the figure from the first half of 2008) were still using a classic account, while 0.4 million customers moved to premium and platinum accounts. The total customer count remains constant at 37.92 million, as we have ignored new customers joining and customers who have left the bank.
From this table, we can calculate the transition probabilities for each state:

Year     Cl-Cl  Cl-Pr  Cl-Pl  Pr-Pr   Pr-Cl  Pr-Pl  Pl-Pl   Pl-Cl  Pl-Pr
2008 H2  98.7%  0.7%   0.7%   96.0%   0.0%   4.0%   72.8%   24.5%  2.6%
2009 H1  98.9%  0.3%   0.8%   88.9%   9.2%   1.9%   85.0%   0.0%   15.0%
2009 H2  98.9%  1.0%   0.0%   88.4%   3.7%   7.9%   77.1%   12.0%  10.8%
2010 H1  99.4%  0.6%   0.0%   81.1%   0.0%   18.9%  65.1%   23.3%  11.6%
2010 H2  98.7%  0.6%   0.6%   87.4%   7.4%   5.1%   80.2%   0.0%   19.8%
2011 H1  99.5%  0.0%   0.5%   97.3%   0.0%   2.7%   85.0%   5.1%   9.8%
2011 H2  97.4%  2.6%   0.0%   79.0%   2.1%   18.9%  61.4%   19.0%  19.5%
2012 H1  98.7%  1.3%   0.0%   97.8%   0.2%   2.0%   100.0%  0.0%   0.0%
2012 H2  96.7%  3.0%   0.3%   100.0%  0.0%   0.0%   91.4%   0.0%   8.6%
2013 H1  99.7%  0.3%   0.0%   95.3%   1.6%   3.1%   86.5%   0.0%   13.5%
2013 H2  97.6%  1.0%   1.4%   100.0%  0.0%   0.0%   67.0%   9.4%   23.6%
2014 H1  99.7%  0.3%   0.0%   98.6%   1.4%   0.0%   56.0%   0.0%   44.0%
2014 H2  99.7%  0.0%   0.3%   100.0%  0.0%   0.0%   80.4%   0.0%   19.6%
2015 H1  99.7%  0.0%   0.3%   100.0%  0.0%   0.0%   95.7%   0.0%   4.3%
2015 H2  99.0%  0.0%   1.0%   95.9%   1.2%   2.9%   62.2%   0.0%   37.8%
2016 H1  97.2%  2.8%   0.0%   98.1%   0.0%   1.9%   100.0%  0.0%   0.0%
2016 H2  94.2%  3.6%   2.2%   91.1%   5.5%   3.3%   77.9%   0.0%   22.1%
2017 H1  94.3%  1.9%   3.8%   84.2%   5.3%   10.5%  6.2%    26.0%  67.7%
2017 H2  97.3%  0.4%   2.3%   91.8%   0.0%   8.2%   43.4%   0.0%   56.6%

Figure 4.5: Transition state probabilities

In Figure 4.5, we have converted the transition counts into probabilities. If 30.28 million of the 30.68 million classic customers in 2008 H1 were retained as classic customers in 2008 H2, we can say that the retention rate is 98.7%, or that the probability of customers staying with the same account type in this instance is 0.987. Using these details, we can compute the average transition between states across the time series.
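This calculation can be sketched in a few lines of Python (my own illustration, not the book's code; the figures are the 2008 H1/H2 classic-account numbers from Figures 4.3 and 4.4):

```python
# Transition probability = customers moving to a state, divided by the
# customers in the source state at the start of the period (millions).
cl_2008_h1 = 30.68  # classic customers at the start (2008 H1)
moves_2008_h2 = {"Cl-Cl": 30.28, "Cl-Pr": 0.20, "Cl-Pl": 0.20}

probs = {move: count / cl_2008_h1 for move, count in moves_2008_h2.items()}
print({move: round(p, 3) for move, p in probs.items()})
```

This reproduces the 98.7% / 0.7% / 0.7% figures in the first row of Figure 4.5; averaging the per-period results across all periods yields the aggregated matrix discussed next.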
These averages can be used as the transition probabilities in the transition matrix for the model:

      Cl     Pr     Pl
Cl  98.2%   1.1%   0.8%
Pr   2.0%  93.2%   4.8%
Pl   6.3%  20.4%  73.3%

Figure 4.6: Aggregated transition probabilities

The probability of classic customers retaining the same account type between semiannual time periods is 98.2%. The lowest retention probability is for platinum customers, who are expected to transition to another account type 26.7% of the time. Let's use the transition matrix in Figure 4.6 to run our Markov model. Use this code for the data setup:

DATA Current;
  input date CL PR PL;
  datalines;
2017.2 25.3 10.3 2.32
;
Run;

DATA Netflow;
  input date CL PR PL;
  datalines;
2018.1 0.21 0.1 0.05
2018.2 0.22 0.16 0.06
2019.1 0.24 0.18 0.08
2019.2 0.28 0.21 0.1
2020.1 0.31 0.23 0.14
;
Run;

DATA TransitionMatrix;
  input CL PR PL;
  datalines;
0.98 0.01 0.01
0.02 0.93 0.05
0.06 0.21 0.73
;
Run;

In the Current dataset, we have chosen the last available data point, 2017 H2, as the base position of customer counts across the classic, premium, and platinum accounts. While calculating the transition matrix, we didn't take new joiners or leavers into account; however, to enable forecasting, we have taken 2017 H2 as our base position. The transition matrix from Figure 4.6 has been input as a separate dataset.
Markov model code:

PROC IML;
use Current; read all into Current;
use Netflow; read all into Netflow;
use TransitionMatrix; read all into TransitionMatrix;

Current = Current[1, 2:4];
Netflow = Netflow[, 2:4];

/* Forecast = previous period's customer counts * transition matrix
   + that period's net inflow of new customers */
Model_2018_1 = Current      * TransitionMatrix + Netflow[1, ];
Model_2018_2 = Model_2018_1 * TransitionMatrix + Netflow[2, ];
Model_2019_1 = Model_2018_2 * TransitionMatrix + Netflow[3, ];
Model_2019_2 = Model_2019_1 * TransitionMatrix + Netflow[4, ];
Model_2020_1 = Model_2019_2 * TransitionMatrix + Netflow[5, ];

Budgetinputs = Model_2018_1 // Model_2018_2 // Model_2019_1 // Model_2019_2 // Model_2020_1;
create Budgetinputs from Budgetinputs;
append from Budgetinputs;
QUIT;

DATA Output;
  set Budgetinputs (rename=(Col1=Cl Col2=Pr Col3=Pl));
Run;

PROC PRINT data=Output;
Run;

Figure 4.7: Model output

The Markov model has been run and we are able to generate forecasts for all account types for the requested five periods. We can immediately see that an increase is forecasted for all the account types, driven by the net flow of customers. We have derived the forecasts by essentially using the following equation:

Forecast = Current period * Transition matrix + Net flow

Once the 2018 H1 forecast is derived, we replace the current period with the forecasted 2018 H1 numbers when forecasting 2018 H2. We do this because, based on the 2018 H1 customer counts, the transition probabilities determine how many customers move across states, which generates the forecasted customer count for the next period.

Understanding transition probability

Now that we have our forecasts, let's take a step back and revisit our business goals. The finance team wants to estimate the revenues from the revamped premium and platinum customer accounts for the next few forecasting periods. As we have seen, one of the important drivers of the forecasting process is the transition probability.
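The same forecasting equation can be expressed outside SAS. The following plain-Python sketch (my own illustration, not the book's code; variable names are invented) applies the Figure 4.6 transition matrix to the 2017 H2 base position for the first two net-flow periods:

```python
# Markov forecast: next_state = state (row vector) * T + net inflow.
T = [[0.98, 0.01, 0.01],   # from Cl to (Cl, Pr, Pl)
     [0.02, 0.93, 0.05],   # from Pr
     [0.06, 0.21, 0.73]]   # from Pl

current = [25.3, 10.3, 2.32]                         # 2017 H2 base (millions)
netflows = [[0.21, 0.10, 0.05],                      # 2018 H1 net inflow
            [0.22, 0.16, 0.06]]                      # 2018 H2 net inflow

def step(state, T, netflow):
    # Row vector times matrix, plus the period's net new customers.
    return [sum(state[i] * T[i][j] for i in range(3)) + netflow[j]
            for j in range(3)]

forecasts = []
for nf in netflows:
    current = step(current, T, nf)
    forecasts.append(current)
```

For 2018 H1 this gives roughly 25.35 million classic, 10.42 million premium, and 2.51 million platinum customers; each subsequent period feeds the previous forecast back through the same equation.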
This transition probability is driven by historical customer movements, as shown in Figure 4.4. What if the marketing team doesn't agree with the transition probabilities calculated in Figure 4.6? As we discussed, 26.7% of platinum customers aren't retained in this account type. Since we are not considering customer churn out of the bank, this means that a large proportion of platinum customers downgrade their accounts; this is, in fact, one of the reasons the marketing team revamped the accounts. The marketing team feels it will be able to raise the retention rate for platinum customers and wants the finance team to run an alternate forecasting scenario. This is one of the strengths of the Markov model approach: by tweaking the transition probabilities we can run various business scenarios. Let's compare the base and alternate scenario forecasts generated in Figure 4.8. A change in the transition probabilities of how platinum customers move between states has brought about a significant change in the forecasts for the premium and platinum customer accounts. For classic customers, the change in the forecast between the base and the alternate scenario is negligible, as shown in the table in Figure 4.8. The finance team can decide which scenario is best suited for budget forecasting:

      Cl     Pr     Pl
Cl  98.2%   1.1%   0.8%
Pr   2.0%  93.2%   4.8%
Pl   5.0%  15.0%  80.0%

Figure 4.8: Model forecasts and updated transition probabilities

To summarize, we learned the Markov model methodology and applied Markov models to forecasting and imputation. To learn how to use the other two methodologies, ARIMA and MCMC, to generate forecasts for various business problems, you can check out the book SAS for Finance.
Why you should learn Scikit-learn
Guest Contributor
23 Nov 2017
8 min read
Today, machine learning in Python has become almost synonymous with scikit-learn. The "Big Bang" moment for scikit-learn was in 2007, when a gentleman named David Cournapeau decided to write this project as part of Google Summer of Code 2007. Let's take a moment to thank him. Matthieu Brucher later came on board and developed it further as part of his thesis. From that point on, sklearn never looked back. In 2010, the prestigious French research organization INRIA took ownership of the project, with great developers like Gael Varoquaux, Alexandre Gramfort et al. starting work on it. Here's the oldest pull request I could find in sklearn's repository. The title says "we're getting there"! From there to today, where sklearn receives funding and support from Google, Telecom ParisTech, and Columbia University among others, it surely must have been quite a journey. Sklearn is an open source library which uses the BSD license. It is widely used in industry as well as in academia. It is built on NumPy, SciPy, and Matplotlib, while also having wrappers around various popular libraries such as LIBSVM. Sklearn can be used "out of the box" after installation.

Can I trust scikit-learn?

Scikit-learn, or sklearn, is a very active open source project with brilliant maintainers. It is used worldwide by top companies such as Spotify, booking.com, and the like. That it is open source, where anyone can contribute, might make you question the integrity of the code, but from the little experience I have contributing to sklearn, let me tell you that only very high-quality code gets merged. All pull requests have to be affirmed by at least two core maintainers of the project, and every change goes through multiple iterations. While this can be time-consuming for all the parties involved, such regulations ensure sklearn's compliance with industry standards at all times. You don't just build a library that's been awarded the "best open source library" overnight!

How can I use scikit-learn?
Sklearn can be used for a wide variety of use-cases, ranging from image classification to music recommendation to classical data modeling.

Scikit-learn in various industries:

In the image classification domain, sklearn's implementation of K-Means along with PCA has been used very successfully for handwritten digit classification. Sklearn has also been used for face recognition using SVM with PCA. Image segmentation tasks, such as detecting red blood corpuscles or segmenting the popular Lena image into sections, can also be done using sklearn.

A lot of us use Spotify or Netflix and are awestruck by their recommendations. Recommendation engines started off with the collaborative filtering algorithm, which basically says: "if people like me like something, I'll also most probably like that." To find users with similar tastes, a KNN algorithm can be used, which is available in sklearn. You can find a good demonstration of how it is used for music recommendation here.

Classical data modeling can be bolstered using sklearn. Most people generally start their Kaggle competitive journeys with the Titanic challenge. One of the better tutorials out there on getting started is by Dataquest, and it generally acts as a good introduction to using pandas and sklearn (a lethal combination!) for data science. It uses the robust Logistic Regression, Random Forest, and Ensembling modules to guide the user. You will be able to experience the user-friendliness of sklearn first hand while completing this tutorial. Sklearn has made machine learning literally a matter of importing a package.

Sklearn also helps in anomaly detection for highly imbalanced datasets (99.9% to 0.1% in credit card fraud detection) through a host of tools like EllipticEnvelope and OneClassSVM. In this regard, the recently merged IsolationForest algorithm works especially well on higher-dimensional sets and has very high performance.
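To make the collaborative-filtering idea concrete, here is a minimal nearest-neighbour sketch in plain Python (the users and ratings are invented for illustration; in practice you would use sklearn's NearestNeighbors on a real ratings matrix):

```python
import math

# Each user is a vector of item ratings; recommend based on the most
# similar user, measured by cosine similarity.
ratings = {
    "alice": [5, 4, 0, 0],
    "bob":   [5, 5, 0, 1],
    "carol": [0, 0, 5, 4],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbour(user):
    # Highest-similarity other user: "people like me" in the quote above.
    others = [(cosine(ratings[user], ratings[o]), o)
              for o in ratings if o != user]
    return max(others)[1]
```

Here `nearest_neighbour("alice")` picks out "bob", whose rating vector points in almost the same direction as Alice's, while Carol's does not overlap at all.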
Other than that, sklearn has implementations of some widely used algorithms such as linear regression, decision trees, SVM, and Multi-Layer Perceptrons (neural networks), to name a few. It has around 39 models in the "linear models" module itself! Happy scrolling here! Most of these algorithms run very fast compared to raw Python code since they are implemented in Cython and use NumPy and SciPy (which in turn use C) for low-level computations.

How is sklearn different from TensorFlow/MLlib?

TensorFlow is a popular library for implementing deep learning algorithms (since it can utilize GPUs). While it can also be used to implement machine learning algorithms, the process can be arduous. For implementing logistic regression in TensorFlow, you first have to "build" the logistic regression algorithm using a computational graph approach. Scikit-learn, on the other hand, provides the same algorithm out of the box, with the limitation that it has to run in memory. Here's a good example of how logistic regression is done in TensorFlow. Apache Spark's MLlib, on the other hand, consists of algorithms which can be used out of the box just like in sklearn; however, it is generally used when the ML task is to be performed in a distributed setting. If your dataset fits into RAM, sklearn is the better choice for the task. If the dataset is massive, most people generally prototype on a small subset of the dataset locally using sklearn; once prototyping and experimentation are done, they deploy in the cluster using MLlib.

Some sklearn must-knows

Scikit-learn can be used for three different kinds of problems in machine learning, namely supervised learning, unsupervised learning, and reinforcement learning (ahem, AlphaGo). Unsupervised learning happens when one doesn't have 'y' labels in the dataset; dimensionality reduction and clustering are typical examples.
Scikit-learn has implementations of variations of Principal Component Analysis such as SparsePCA, KernelPCA, and IncrementalPCA, among others. Supervised learning covers problems such as spam detection, rent prediction, and so on; in these problems, the 'y' label for the dataset is present. Models such as linear regression, random forest, AdaBoost, etc. are implemented in sklearn:

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression().fit(train_X, train_y)
preds = clf.predict(test_X)

Model evaluation and analysis

Cross-validation, grid search for parameter selection, and prediction evaluation can be done using the model selection and metrics modules, which implement functions such as cross_val_score and f1_score respectively, among others. They can be used as such:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.metrics import f1_score

# tune your parameters for a better cross_val_score
cross_val_avg = np.mean(cross_val_score(clf, train_X, train_y, scoring='f1'))

# for model results on a certain classification problem
f_measure = f1_score(test_y, preds)

Model saving

Simply pickle your model using pickle.dump and it is ready to be distributed and deployed! Hence a whole machine learning pipeline can be built easily using sklearn.

Finishing remarks

There are many good books out there on machine learning, but in the context of Python, Sebastian Raschka (one of the core developers on sklearn) recently released his book titled "Python Machine Learning", and it's in great demand. Another great blog to follow is Erik Bernhardsson's blog. Along with machine learning, he also writes about software development and other interesting ideas. Do subscribe to the scikit-learn mailing list as well. There are some very interesting questions posted there and a lot of learnings to take home.
The machine learning subreddit also collates information from a lot of different sources and is thus a good place to find useful information. Scikit-learn has revolutionized the machine learning world by making it accessible to everyone. Machine learning is not like black magic anymore. If you use scikit-learn and like it, do consider contributing to sklearn. There is a huge clutter of open issues and PRs on the sklearn GitHub page. Scikit-learn needs contributors! Have a look at this page to start contributing. Contributing to a library is easily the best way to learn it! [author title="About the Author"]Devashish Deshpande started his foray into data science and machine learning in 2015 with an online course when the question of how machines can learn started intriguing him. He pursued more online courses as well as courses in data science during his undergrad. In order to gain practical knowledge he started contributing to open source projects beginning with a small pull request in Scikit-Learn. He then did a summer project with Gensim and delivered workshops and talks at PyCon France and India in 2016. Currently, Devashish works in the data science team at belong.co, India. Here's the link to his GitHub profile.[/author]
Understanding Sentiment Analysis and other key NLP concepts
Sunith Shetty
20 Dec 2017
12 min read
[box type="note" align="" class="" width=""]This article is an excerpt taken from the book Big Data Analytics with Java written by Rajat Mehta. This book will help you learn to perform big data analytics tasks using machine learning concepts such as clustering, recommending products, data segmentation, and more.[/box]

With this post, you will learn what sentiment analysis is and how it is used to analyze the emotions within a text. You will also learn key NLP concepts, such as tokenization and stemming, among others, and how they are used for sentiment analysis.

What is sentiment analysis?

One form of text analysis is sentiment analysis. As the name suggests, this technique is used to figure out the sentiment or emotion associated with the underlying text. So if you have a piece of text and you want to understand what kind of emotion it conveys, for example anger, love, hate, positive, negative, and so on, you can use sentiment analysis. Sentiment analysis is used in various places, for example:

- To analyze the reviews of a product, whether they are positive or negative; this can be especially useful to predict how successful a new product will be by analyzing user feedback
- To analyze the reviews of a movie to check if it's a hit or a flop
- To detect the use of bad language (such as heated language, negative remarks, and so on) in forums, emails, and social media
- To analyze the content of tweets or information on other social media to check if a political party's campaign was successful or not

Thus, sentiment analysis is a useful technique, but before we see the code for our sample sentiment analysis example, let's understand some of the concepts needed to solve this problem.
[box type="shadow" align="" class="" width=""]For working on a sentiment analysis problem we will be using some techniques from natural language processing, and we will explain some of those concepts here.[/box]

Concepts for sentiment analysis

Before we dive into the fully-fledged problem of analyzing the sentiment behind text, we must understand some concepts from the NLP (Natural Language Processing) perspective.

Tokenization

From the perspective of machine learning, one of the most important tasks is feature extraction and feature selection. When the data is plain text, we need some way to extract the information out of it. We use a technique called tokenization, where the text content is pulled apart and tokens or words are extracted from it. A token can be a single word or a group of words. There are various ways to extract the tokens, as follows:

- By using regular expressions: regular expressions can be applied to textual content to extract words or tokens from it.
- By using a pre-trained model: Apache Spark ships with a pre-trained (machine learning) model that is trained to pull tokens from text. You can apply this model to a piece of text and it will return the predicted results as a set of tokens.

To understand a tokenizer using an example, let's look at a simple sentence:

Sentence: "The movie was awesome with nice songs"
Tokens: ['The', 'movie', 'was', 'awesome', 'with', 'nice', 'songs']

[box type="shadow" align="" class="" width=""]The type of tokens you extract depends on the type of tokens you are interested in. Here we extracted single tokens, but tokens can also be a group of words, for example, 'very nice', 'not good', 'too bad', and so on.[/box]

Stop words removal

Not all the words present in the text are important.
Some words are common words used in the English language that are important for maintaining correct grammar, but from the perspective of conveying information or emotion they might not be important at all, for example, common words such as is, was, were, and the. To remove these words there are again some common techniques from natural language processing that you can use:

- Store stop words in a file or dictionary and compare your extracted tokens with the words in this dictionary or file. If they match, simply ignore them.
- Use a pre-trained machine learning model that has been taught to remove stop words. Apache Spark ships with one such model in the Spark feature package.

Let's try to understand stop words removal using an example:

Sentence: "The movie was awesome with nice songs"

From the sentence we can see that the common words with no special meaning to convey are the, was, and with. So after applying the stop words removal program to this data you will get:

After stop words removal: ['movie', 'awesome', 'nice', 'songs']

[box type="shadow" align="" class="" width=""]In the preceding sentence, the stop words the, was, and with are removed.[/box]

Stemming

Stemming is the process of reducing a word to its base or root form. For example, look at the set of words shown here: car, cars, car's, cars'. From our sentiment analysis perspective, we are only interested in the main word they refer to, because the underlying meaning of the word is the same in each case. So whether we pick car's or cars, we are referring to a car only. Hence the stem or root word for the previous set of words will be:

car, cars, car's, cars' => car (stem or root word)

For English words you can again use a pre-trained model and apply it to a set of data to figure out the stem word.
Of course, there are more complex and better ways (for example, you can retrain the model with more data), or you may have to use a different model or technique entirely if you are dealing with languages other than English. Diving into stemming in detail is beyond the scope of this book, and we would encourage readers to check out documentation on natural language processing from Wikipedia and the Stanford NLP website.

[box type="shadow" align="" class="" width=""]To keep the sentiment analysis example in this book simple we will not be stemming our tokens, but we urge readers to try the same to get better predictive results.[/box]

N-grams

Sometimes a single word conveys the meaning of the context; other times a group of words conveys a better meaning. For example, 'happy' is a word that in itself conveys happiness, but 'not happy' changes the picture completely: 'not happy' is the exact opposite of 'happy'. If we are extracting only single words, then in the example shown before, 'not' and 'happy' would be two separate tokens and the entire sentence might be classified as positive by the classifier. However, if the classifier picks up bi-grams (that is, two words per token), it would be trained with 'not happy' and would classify similar sentences containing 'not happy' as negative. Therefore, for training our models we can use uni-grams, bi-grams (two words per token), or, as the name suggests, n-grams ('n' words per token); it all depends on which token set trains our model well and improves its predictive accuracy.
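The preprocessing steps discussed so far can be sketched in a few lines of Python (a simplified illustration of the ideas, not the book's Spark code; the stop-word list here is a tiny invented sample, and Spark's Tokenizer, StopWordsRemover, and NGram transformers do the same at scale):

```python
import re

# A tiny sample stop-word list for illustration only.
STOP_WORDS = {"the", "was", "with", "is", "and"}

def tokenize(text):
    # Regular-expression tokenization: pull out runs of word characters.
    return re.findall(r"\w+", text.lower())

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    # Overlapping n-grams: every run of n consecutive tokens.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stop_words(tokenize("The movie was awesome with nice songs"))
bigrams = ngrams(tokenize("not happy with the movie"), 2)
```

Running this gives tokens ['movie', 'awesome', 'nice', 'songs'], matching the stop-words example above, and the first bigram of the second sentence is 'not happy', which is exactly the token a bi-gram classifier would learn from.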
To see examples of n-grams refer to the following table:

Sentence: The movie was awesome with nice songs
Uni-grams: ['The', 'movie', 'was', 'awesome', 'with', 'nice', 'songs']
Bi-grams: ['The movie', 'movie was', 'was awesome', 'awesome with', 'with nice', 'nice songs']
Tri-grams: ['The movie was', 'movie was awesome', 'was awesome with', 'awesome with nice', 'with nice songs']

For the purpose of this case study we will only be looking at uni-grams, to keep our example simple. By now we know how to extract words from text and remove the unwanted words, but how do we measure the importance of words, or the sentiment that originates from them? There are a few popular approaches for this, and we will now discuss two of them.

Term presence and term frequency

Term presence just means that if the term is present we mark the value as 1, or else 0. Later we build a matrix out of it where the rows represent the words and the columns represent each sentence. This matrix is later used for text analysis by feeding its content to a classifier. Term frequency, as the name suggests, depicts the count or number of occurrences of the word or token within the document. Let's refer to the example in the following table, where we find term frequency:

Sentence: The movie was awesome with nice songs and nice dialogues.
Tokens (uni-grams only for now): ['The', 'movie', 'was', 'awesome', 'with', 'nice', 'songs', 'and', 'nice', 'dialogues']
Term frequency: ['The = 1', 'movie = 1', 'was = 1', 'awesome = 1', 'with = 1', 'nice = 2', 'songs = 1', 'and = 1', 'dialogues = 1']

As seen in the preceding table, the word 'nice' appears twice in the sentence, and hence it will get more weight in determining the opinion expressed by the sentence.
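Counting term frequency as in the table above is a one-liner with Python's collections.Counter (a sketch for illustration, not the book's code):

```python
from collections import Counter

# Tokens from the example sentence above.
tokens = ['the', 'movie', 'was', 'awesome', 'with',
          'nice', 'songs', 'and', 'nice', 'dialogues']

# Counter maps each token to its occurrence count in the document.
term_frequency = Counter(tokens)
print(term_frequency['nice'])   # 'nice' occurs twice, so it carries more weight
```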
Plain term frequency is not a precise approach, for the following reasons:

- There could be redundant, irrelevant words, for example the, it, and they, that have a high frequency or count and might skew the training of the model.
- There could be important rare words that convey the sentiment of the document, yet their frequency is low, so their impact on the training of the model might be diminished.

For these reasons, the better approach of TF-IDF is chosen, as shown in the next sections.

TF-IDF

TF-IDF stands for Term Frequency and Inverse Document Frequency, and in simple terms it measures the importance of a term to a document. It works using two simple steps, as follows:

1. It counts the number of occurrences of a term in the document; the higher the count, the greater the importance of the term to the document.
2. Counting just the frequency of words in a document is not a precise way to find the importance of words, for the simple reason that there could be too many stop words with high counts, elevating their importance above that of genuinely informative words. To fix this, TF-IDF also checks for the occurrence of these words in other documents. If the words appear in other documents in large numbers too, they are likely grammatical words such as they, for, is, and so on, and TF-IDF decreases their importance or weight.

Let's try to understand TF-IDF using the following figure. As seen in the figure, doc-1, doc-2, and so on are the documents from which we extract the tokens or words, and from those words we calculate the TF-IDFs. Words that are stop words or regular words, such as for and is, have low TF-IDFs, while rare words, such as 'awesome movie', have higher TF-IDFs. TF-IDF is the product of Term Frequency and Inverse Document Frequency.
Both of them are explained here:

Term Frequency: This is nothing but the count of the occurrences of a word in the document. There are other ways of measuring this, but the simplistic approach is to just count the occurrences of the token:

    TF(t) = frequency count of token t in the document

Inverse Document Frequency: This is a measure of how much information the word provides. It scales up the weight of words that are rare and scales down the weight of frequently occurring words:

    IDF(t) = log(total number of documents / number of documents containing term t)

TF-IDF: TF-IDF is a simple multiplication of the Term Frequency and the Inverse Document Frequency:

    TF-IDF(t) = TF(t) * IDF(t)

This simple technique is very popular and is used in a lot of places for text analysis. Next, let's look at another simple approach, called bag of words, that is also used in text analytics.

Bag of words

As the name suggests, bag of words uses a simple approach whereby we first extract the words or tokens from the text and then push them into a bag (an imaginary set). The main point is that the words are stored in the bag without any particular order: the mere presence of a word in the bag is what matters, and the order of occurrence of a word in the sentence, as well as its grammatical context, carries no value. Since the bag of words gives no importance to the order of words, you can take the TF-IDFs of all the words in the bag, put them in a vector, and later train a classifier (naive Bayes or any other model) with it. Once trained, the model can be fed vectors of new data to predict their sentiment.

Summing it up, we have got you well versed with sentiment analysis techniques and the NLP concepts needed to apply them. If you want to implement machine learning algorithms to carry out predictive analytics and real-time streaming analytics, you can refer to the book Big Data Analytics with Java.

Ashwin Nair
24 Oct 2017
8 min read

Will Ethereum eclipse Bitcoin?

Unless you have been living under a rock, you have most likely heard about Bitcoin, the world's most popular cryptocurrency, which is growing by leaps and bounds. In fact, Bitcoin recently broke the threshold of $6,000 and is now priced at an all-time high. Bitcoin is not alone in this race: another cryptocurrency named Ethereum is hot on its heels. Despite being only three years old, Ethereum is quickly emerging as a popular choice, especially among enterprise users.

Ethereum's YTD price growth has been a whopping 3,000%. In terms of market cap, too, Ethereum has shown a significant increase: its share of the total cryptocurrency market rose from 5% at the beginning of the year to 30% YTD, and in absolute terms it stands today at around $28 billion. Bitcoin's market cap as a percentage of the market, on the other hand, has shrunk from 85% at the start of the year to 55%, and is valued at around $90 billion.

Bitcoin played a huge role in bringing Ethereum into existence. The co-creator and inventor of Ethereum, Vitalik Buterin, was only 19 when his father introduced him to Bitcoin and, by extension, to the fascinating world of cryptocurrency. In a span of three years, Vitalik had written several blogs on the topic and also co-founded Bitcoin Magazine in 2011. Though Bitcoin served as an excellent tool for money transactions, eliminating the need for banks, fees, or third parties, its scripting language had limitations. This led Vitalik, along with other developers, to found Ethereum: a platform that aimed to extend beyond Bitcoin's scope and make the internet decentralized.

How Ethereum differs from the reigning cryptocurrency - Bitcoin

Both Bitcoin and Ethereum are built on top of blockchain technology, allowing them to run a decentralized public network. However, Ethereum's capability extends beyond being a cryptocurrency, and it differs from Bitcoin substantially in terms of scope and potential.
Exploiting the full-spectrum blockchain platform

Bitcoin leverages blockchain's distributed ledger technology to perform secure peer-to-peer cash transactions, and thus disrupted traditional financial transaction instruments such as PayPal. Ethereum, meanwhile, aims to offer much more than a digital currency by helping developers build and deploy any kind of decentralized application on top of blockchain. The following are some Ethereum-based features and applications that make it superior to Bitcoin.

DApps

A decentralized app, or DApp, is a program that runs on the internet through a network but is not under the control of any single entity. A white paper on DApps highlights four conditions that an application needs to satisfy to be called a DApp:

- It must be completely open source
- Data and records of operation must be cryptographically stored
- It must utilize a cryptographic token
- It must generate tokens

The white paper also goes on to suggest that DApps are the future: "decentralized applications will someday surpass the world's largest software corporations in utility, user-base, and network valuation due to their superior incentivization structure, flexibility, transparency, resiliency, and distributed nature."

Smart Contracts and EVM

Another feature that Ethereum boasts over Bitcoin is the smart contract. A smart contract works like a traditional contract: you can use it to perform a task or transfer money in return for an asset or service, efficiently and without interference from a middleman. Though Bitcoin is fast, secure, and cost-saving, it is limited in the kinds of operations it can run. Ethereum solves this problem by allowing operations to work as contracts: they are converted to pieces of code and supervised by a network of computers. A tool that helps Ethereum developers build and experiment with different contracts is the Ethereum Virtual Machine (EVM).
The EVM acts as a testing environment for building blockchain operations and is isolated from the main network. It thus gives developers a perfect platform to build and test smart, robust contracts across different industries.

DAOs

One can also create Decentralized Autonomous Organizations (DAOs) using Ethereum. A DAO eliminates the need for human managerial involvement: the organization runs through smart contracts that convert its rules, core tasks, and structure into code monitored by a fault-tolerant network. An example of a DAO is Slock.it, a DAO version of Airbnb.

Performance

An important factor for a cryptocurrency transaction is the amount of time it takes to finalize, known as the block time. In terms of performance, the Bitcoin network takes 10 minutes to make a transaction, whereas Ethereum is much more efficient and boasts a block time of just 14-15 seconds.

Development

Ethereum's programming language, Solidity, is based on JavaScript. This is great for web developers who want to use their knowledge of JavaScript to build cool DApps and extend the Ethereum platform. Moreover, Ethereum is Turing complete, meaning it can compute anything that is computable, provided enough resources are available. Bitcoin, on the other hand, is based on C++, which comparatively is not a popular choice among the new generation of app developers.

Community and Vision

One can say Bitcoin works like a DAO: with no individuals involved in managing the cryptocurrency, it is completely decentralized and owned by the community. Satoshi Nakamoto, who prefers to stay behind the curtains, is the only name that comes up when it comes to associating an individual with Bitcoin; the community therefore lacks a figurehead when it comes to seeking future direction. Meanwhile, Vitalik Buterin is hugely popular among Ethereum enthusiasts and is very much involved in designing the future roadmap with the other co-founders.
Cryptocurrency Supply

Like Bitcoin, Ethereum has a digital asset, Ether, that fuels the network and the transactions performed on the platform. Bitcoin has a fixed supply cap of around 21 million coins; it is going to take more than 100 years to mine the last Bitcoin, after which Bitcoin will behave as a deflationary cryptocurrency. Ethereum, on the other hand, has no fixed supply cap but has restricted its annual supply to 18 million Ether. With no upper cap on the total number of Ether that can be mined, Ethereum behaves as an inflationary currency and may lose value over time. However, the Ethereum community is now planning to move from a proof-of-work to a proof-of-stake model, which should limit the number of Ether being mined and also offer benefits such as energy efficiency and security.

Some real-world applications using Ethereum

The growth of decentralized applications has been on the rise as people start to recognize the value offered by blockchain and decentralization: security, immutability, tamper-proofing, and much more. While Bitcoin uses blockchain purely as a list of transactions, Ethereum manages to transfer both value and information through its platform. This allows for immense possibilities when it comes to building DApps across a wide range of industries.

The financial domain is obviously where Ethereum is finding a lot of traction. Projects such as Branche, a decentralized consumer micro-credit and financial services platform, and Augur, a decentralized prediction market that has raised more than $5 million, are some prominent examples. But financial applications are only the tip of the iceberg when it comes to the possibilities Ethereum offers and the potential it holds for disrupting industries across various sectors. Some other sectors where Ethereum is making its presence felt follow. Firstblood is a decentralized eSports platform that has raised more than $5.5 million.
It allows players to test their skills and bet using Ethereum, while the tournaments are tracked on smart contracts and the blockchain. Alice.si is a charitable platform that lets donors invest in noble causes, knowing that they only pay for causes where the charity makes an impact. Chainy is an Ethereum-based authentication and verification system that permanently stores records on the blockchain using timestamping.

Flippening is happening!

If you haven't heard of Flippening, it's a term coined by cryptocurrency enthusiasts for the prospect of Ethereum beating Bitcoin to the number-one spot as the largest-capitalized blockchain. Comparing Ethereum to Bitcoin may not be entirely fair, as both serve different purposes. Bitcoin will continue to dominate cryptocurrency, but as more industries adopt Ethereum to build the smart contracts, DApps, or DAOs of their choice, its popularity is only going to grow, subsequently making Ether more valuable. Thus, the possibility of Ether displacing Bitcoin is strong. With the pace at which Ethereum is growing, and the potential it holds for unleashing blockchain's power to transform industries, it is definitely a question of when rather than if the Flippening will happen!

Amey Varangaonkar
20 Nov 2017
6 min read

How self-service analytics is changing modern-day businesses

To stay competitive in today's economic environment, organizations can no longer rely on just their IT team for all their data consumption needs. At the same time, the need for quick insights to make smarter and more accurate business decisions is now stronger than ever. As a result, there has been a sharp rise in a new kind of analytics in which the information seekers can themselves create and access a specific set of reports and dashboards, without IT intervention. This is popularly termed self-service analytics.

Gartner defines self-service analytics as:

"A form of business intelligence (BI) in which line-of-business professionals are enabled and encouraged to perform queries and generate reports on their own, with nominal IT support."

Expected to become a $10 billion market by 2022, self-service analytics is characterized by simple, intuitive and interactive BI tools that have basic analytic and reporting capabilities with a focus on easy data access. It empowers business users to access relevant data and extract insights from it without needing to be experts in statistical analysis or data mining. Today, many tools and platforms for self-service analytics are already on the market, with Tableau, Microsoft Power BI, IBM Watson, QlikView and Qlik Sense being some of the major ones. Not only have these empowered users to perform all kinds of analytics with accuracy, but their reasonable pricing, in-tool guidance and sheer ease of use have also made them very popular among business users.

Rise of the Citizen Data Scientist

The rise in popularity of self-service analytics has led to the coining of a media-favored term: the 'citizen data scientist'. But what does the term mean? Citizen data scientists are business users and other professionals who can perform less intensive data-related tasks, such as data exploration, visualization and reporting, on their own, using just self-service BI tools.
If Gartner's predictions are to be believed, there will be more citizen data scientists than traditional data scientists in 2019, performing a variety of analytics-related tasks.

How Self-service Analytics benefits businesses

Allowing the end users within a business to perform their own analysis has some important advantages compared to using traditional BI platforms:

- The time taken to arrive at crucial business insights is drastically reduced, because teams don't have to rely on the IT team to deliver specific reports and dashboards based on the organizational data.
- Quicker insights from self-service BI tools mean businesses can take decisions faster, with higher confidence, and deploy appropriate strategies to maximize business goals.
- Because of their relative ease of use, business users can get up to speed with self-service BI tools in no time and with very little training, compared to being trained on complex BI solutions. This means lower training costs and a democratization of BI analytics, which in turn reduces the workload on the IT team and allows them to focus on their own core tasks.
- Self-service analytics helps users manage data from disparate sources more efficiently, allowing organizations to be more agile in handling new business requirements.

Challenges in Self-service analytics

While self-service analytics platforms offer many benefits, they come with their own set of challenges too. Let's see some of them:

Defining a clear role for the IT team within the business, by addressing concerns such as:

- Identifying the right BI tool for the business: among the many tools out there, identifying which tool is the best fit is very important.
- Identifying which processes and business groups can make the best use of self-service BI, and who may require assistance from IT
- Setting up the right infrastructure and support system for data analysis and reporting
- Answering questions such as who will design complex models and perform high-level data analysis

Thus, rather than becoming secondary to the business, the role of the IT team becomes even more important when adopting a self-service business intelligence solution.

Defining a strict data governance policy: This is a critical task, as unauthorized access to organizational data can be detrimental to the business. Identifying the right 'power users', i.e., the users who need access to the data and the tools, deciding the level of access that needs to be given to them, and ensuring the integrity and security of the data are some of the key factors that need to be kept in mind. The IT team plays a major role in establishing strict data governance policies and ensuring the data is safe, secure and shared only with the right users for self-service analytics.

Asking the right kind of questions of the data: When users who aren't analysts get access to data and self-service tools, asking the right questions of the data becomes highly important in order to get useful, actionable insights from it. Incorrect analysis can result in wrong or insufficient findings, which might lead to poor decision-making. Regular training sessions and support systems can help a business overcome this challenge.

To read more about the limitations of self-service BI, check out this interesting article.

In Conclusion

IDC has predicted that spending on self-service BI tools will grow 2.5 times faster than spending on traditional IT-controlled BI tools by 2020. This is an indicator that many organizations worldwide, of all sizes, will increasingly see self-service analytics as a feasible and profitable way forward.
Today, mainstream adoption of self-service analytics still appears to be in its early stages, due to a general lack of awareness among businesses. Many organizations still depend on the IT team or an internal analytics team for all their data-driven decision-making tasks. As we have already seen, this comes with a lot of limitations - limitations that can easily be overcome by the adoption of a self-service culture in analytics, boosting the speed, ease of use and quality of the analytics. By shifting most of the reporting work to the power users, and by establishing the right data governance policies, businesses with a self-service BI strategy can grow a culture that fuels agile thinking and innovation, and is thus ready for success in the marketplace.

If you're interested in learning more about popular self-service BI tools, these are some of our premium products to help you get started:

- Learning Tableau 10
- Tableau 10 Business Intelligence Cookbook
- Learning IBM Watson Analytics
- QlikView 11 for Developers
- Microsoft Power BI Cookbook

Savia Lobo
08 Nov 2017
9 min read

8 Myths about RPA (Robotic Process Automation)

Many say we are on the cusp of the fourth industrial revolution, one that promises to blur the lines between the real, virtual and biological worlds. Among many trends, Robotic Process Automation (RPA) is one of the buzzwords surrounding the hype of the fourth industrial revolution. Although poised to be a $6.7 trillion industry by 2025, RPA is shrouded in just as much fear as it is brimming with potential. We have heard time and again how automation can improve productivity, efficiency, and effectiveness while conducting business in transformative ways. We have also heard how automation, and machine-driven automation in particular, can displace humans and thereby lead to a dystopian world. As humans, we make assumptions based on what we see and understand. But sometimes those assumptions become so ingrained that they evolve into myths which many start accepting as facts. Here is a closer look at some of the myths surrounding RPA.

Myth 1: RPA means robots will automate processes

The term robot evokes in our minds a picture of a metal humanoid with stiff joints that speaks in a monotone. RPA does mean robotic process automation, but the robot doing the automation is nothing like the ones we are used to seeing in the movies. These are software robots that perform routine processes within organizations. They are often referred to as virtual workers or a digital workforce, complete with their own identities and credentials. They essentially consist of algorithms programmed by RPA developers with the aim of automating mundane business processes. These processes are repetitive, highly structured, fall within a well-defined workflow, consist of a finite set of tasks or steps, and are often monotonous and labor-intensive.

Let us consider a real-world example here: automating the invoice-generation process. The RPA system will run through all the emails in the system and download the PDF files containing the details of the relevant transactions.
Then, it will fill a spreadsheet with the details and maintain all the records therein. Later, it will log on to the enterprise system and generate an appropriate invoice report for each entry in the spreadsheet. Once the invoices are created, the system will send a confirmation mail to the relevant stakeholders. Here, the RPA user only specifies the individual tasks that are to be automated, and the system takes care of the rest of the process. So, yes, while it is true that RPA involves robots automating processes, it is a myth that these robots are physical entities or that they can automate all processes.

Myth 2: RPA is useful only in industries that rely heavily on software

"Almost anything that a human can do on a PC, the robot can take over without the need for IT department support." - Richard Bell, former Procurement Director at Averda

RPA is software that can be injected into a business process. Traditional industries such as banking and finance, healthcare, and manufacturing, which have significant routine tasks that depend on software, can benefit from RPA; loan processing and patient data processing are some examples. RPA, however, cannot help with automating the assembly line in a manufacturing unit or with performing regular tests on patients. Even industries that maintain essential daily utilities such as cooking gas, electricity, and telephone services can put RPA to use for generating automated bills, invoices, meter readings, and so on. By adopting RPA, businesses, irrespective of the industry they belong to, can achieve significant cost savings, operational efficiency, and higher productivity. To leverage the benefits of RPA, rather than understanding the SDLC process, it is important that users have a clear understanding of business workflow processes and domain knowledge. Industry professionals can easily be trained in how to put RPA into practice.
The bottom line: RPA is not limited to industries that rely heavily on software to exist. But it is true that RPA can be used only in situations where some form of software is used to perform tasks manually.

Myth 3: RPA will replace humans in most frontline jobs

Many organizations employ a large workforce in frontline roles to do routine tasks such as data entry, process management, customer support, and IT support. But frontline jobs are just as diverse as the people performing them. Take sales reps, for example. They bring in new business through their expert understanding of the company's products and their potential customer base, coupled with the associated soft skills. Currently, they spend significant time on administrative tasks such as developing and finalizing business contracts, updating the CRM database, and making daily status reports. Imagine the spike in productivity if these tasks could be taken off the plates of sales reps so they could focus on cultivating relationships and converting leads. By replacing human effort in mundane tasks within frontline roles, RPA can help employees focus on higher value-yielding tasks.

In conclusion, RPA will not replace humans in most frontline jobs. It will, however, replace humans in a few roles that are very rule-based and narrow in scope, such as simple data entry or basic invoice processing. In most frontline roles, like sales or customer support, RPA is quite likely to change significantly, at least in some ways, how one sees one's job responsibilities. The adoption of RPA will also generate new job opportunities around the development, maintenance, and sale of RPA-based software.

Myth 4: Only large enterprises can afford to deploy RPA

The cost of implementing and maintaining RPA software, and of training employees to use it, can be quite high.
This can make it an unfavorable business proposition for SMBs with fairly simple organizational processes and cross-departmental considerations. On the other hand, large organizations with higher revenue-generation capacity, complex business processes, and a large army of workers can deploy an RPA system to automate high-volume tasks quite easily and recover the cost within a few months. It is obvious that large enterprises will benefit from RPA systems, due to the economies of scale offered and the faster recovery of the investments made. But SMBs (small to medium-sized businesses) can also benefit from RPA to automate their business processes. This is possible only if they look at RPA as a strategic investment whose cost will be recovered over a longer period of, say, 2-4 years.

Myth 5: RPA adoption should be owned and driven by the organization's IT department

The RPA team handling the automation process need not be from the IT department. The main role of the IT department is to provide the necessary resources for the software to function smoothly. An RPA reliability team trained in using RPA tools typically consists not of IT professionals but of business operations professionals. In simple terms, RPA is not owned by the IT department but by the whole business, and it is driven by the RPA team.

Myth 6: RPA is an AI virtual assistant specialized to do a narrow set of tasks

An RPA bot performs a narrow set of tasks based on the given data and instructions. It is a system of rule-based algorithms that can be used to capture, process and interpret streams of data, trigger appropriate responses, and communicate with other processes. However, it cannot learn on its own, which is a key trait of an AI system. Advanced AI concepts such as reinforcement learning and deep learning are yet to be incorporated into robotic process automation systems. Thus, an RPA bot is not an AI virtual assistant like Apple's Siri, for example.
That said, it is not impractical to think that in the future these systems will be able to think on their own, decide the best possible way to execute a business process, and learn from their own actions to improve the system.

Myth 7: To use RPA software, one needs to have basic programming skills

Surprisingly, this is not true. Associates who use the RPA system need not have any programming knowledge. They only need to understand how the software works on the front end and how they can assign tasks to the RPA worker for automation. RPA system developers, on the other hand, do require some programming skills, such as knowledge of scripting languages. Today, there are various platforms for developing RPA tools, such as UiPath, Blue Prism and more, which empower RPA developers to build these systems without any hassle, reducing their coding responsibilities even further.

Myth 8: RPA software is fully automated and does not require human supervision

This is a big myth. RPA is often misunderstood as a completely automated system. Humans are indeed required to program the RPA bots, to feed them tasks for automation, and to manage them. The automation lies in aggregating and performing various tasks which would otherwise require more than one human to complete. There is also the efficiency factor: RPA systems are fast, and almost completely avoid the faults in a system or process that are otherwise caused by human error. Having a digital workforce in place is far more profitable than recruiting a human workforce.

Conclusion

One of the most talked-about areas of technological innovation, RPA is clearly still in its early days and is surrounded by a lot of myths. However, there is little doubt that its adoption will take off rapidly as RPA systems become more scalable, more accurate and faster to deploy.
AI-, cognitive- and analytics-driven RPA will take it up a notch or two, and help businesses improve their processes even more by taking dull, repetitive tasks away from people. Hype can get ahead of reality, as we've seen quite a few times, but RPA is an area definitely worth keeping an eye on despite all the hype.
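To ground the discussion, the invoice-generation workflow described under Myth 1 could be sketched in plain Python as follows. Everything here is hypothetical: real RPA platforms such as UiPath or Blue Prism provide visual workflow designers and prebuilt email/PDF connectors rather than hand-written scripts, and the function names and the "customer,amount" attachment format below are illustrative only.

```python
import csv
import io

def parse_transaction(attachment_text):
    # A real bot would use a PDF parser on the downloaded attachment;
    # here we assume the text is already a "customer,amount" line.
    customer, amount = attachment_text.strip().split(",")
    return {"customer": customer, "amount": float(amount)}

def build_invoice_sheet(transactions):
    # Step 2 of the workflow: fill a spreadsheet (a CSV here)
    # with the extracted transaction details.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["customer", "amount"])
    writer.writeheader()
    writer.writerows(transactions)
    return buf.getvalue()

def run_invoice_bot(attachments):
    # The end-to-end flow: extract details, record them in the
    # spreadsheet, then generate one invoice per entry. A real bot
    # would finish by emailing a confirmation to stakeholders.
    txns = [parse_transaction(a) for a in attachments]
    sheet = build_invoice_sheet(txns)
    invoices = [f"INVOICE: {t['customer']} owes {t['amount']:.2f}"
                for t in txns]
    return sheet, invoices

sheet, invoices = run_invoice_bot(["acme,100", "globex,250.5"])
# invoices[0] -> "INVOICE: acme owes 100.00"
```

The point of the sketch is the shape of the work, not the code: each step is rule-based, repetitive, and fully specified in advance, which is exactly the profile of process that RPA targets.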
Aaron Lazar
24 Jan 2018
4 min read

10 To-dos for Industrial Internet Architects

Note: This is a guest post by Robert Stackowiak, a technology business strategist at the Microsoft Technology Center. Robert has co-authored the book Architecting the Industrial Internet with Shyam Nath, who is the director of technology integrations for Industrial IoT at GE Digital. You may also check out our interview with Shyam for expert insights into the world of IIoT, Big Data, Artificial Intelligence and more.

Just about every day, one can pick up a technology journal or view an online technology article about what is new in the Industrial Internet of Things (IIoT). These articles usually provide insight into IIoT solutions to business problems, or into how a specific technology component is evolving to provide a needed function. Various industry consortia, such as the Industrial Internet Consortium (IIC), provide extremely useful documentation defining the key aspects of IIoT architecture that an architect must consider. These broad reference architecture patterns have also begun to consistently include specific technologies and common components. The authors of Architecting the Industrial Internet felt the time was right for a practical guide for architects. The book provides guidance on how to define and apply an IIoT architecture in a typical project today by describing architecture patterns. In this article, we explore ten to-dos for Industrial Internet architects designing these solutions.

Just as technology components are showing up in common architecture patterns, their justification and use cases are also being discovered through repeatable processes. The sponsorship of, and requirements for, these projects are almost always driven by leaders in a company's lines of business. Techniques for uncovering these projects can be replicated as architects gain the needed discovery skills.
Industrial Internet Architects' To-dos:

1. Understand IIoT: Architects first seek to gain an understanding of what is different about the Industrial Internet, the evolution toward specific IIoT solutions, and how legacy technology footprints might fit into that architecture.
2. Understand IIoT project scope and requirements: Next, they research guidance from industry consortia and gather functional viewpoints. This helps them better understand the requirements their architecture must deliver on, and the scope of effort they will face.
3. Act as a bridge between business and technical requirements: They quickly come to realize that, since successful projects are driven by responding to business requirements, the architect must bridge the line-of-business and IT divide present in many companies. They are always on the lookout for requirements and means to justify these projects.
4. Narrow down viable IIoT solutions: Once the requirements are gathered and a potential project appears to be justifiable, requirements and functional viewpoints are aligned in preparation for defining a solution.
5. Evaluate IIoT architectures and solution delivery models: Time to market of a proposed Industrial Internet solution is often critical to business sponsors. Most architecture evaluations include consideration of applications, or pseudo-applications, that can be modified to deliver the needed solution in a timely manner.
6. Have a good grasp of IIoT analytics: The intelligence delivered by these solutions is usually linked to the timely analysis of data streams, so care is taken in defining Lambda architectures (or Lambda variations), including machine learning and data management components, and where analysis and response must occur.
7. Evaluate deployment options: Technology deployment options are explored, including the capabilities of proposed devices, networks, and cloud or on-premises backend infrastructures.
8. Assess IIoT security considerations: Security is top of mind today, and proper design includes not only securing the backend infrastructure but also extends to securing the networks and the edge devices themselves.
9. Conform to governance and compliance policies: The viability of an Industrial Internet solution can be determined by whether proper governance is put in place and whether compliance standards can be met.
10. Keep up with the IIoT landscape: While relying on current best practices, the architect must keep an eye on the future, evaluating emerging architecture patterns and solutions.

Author's Bio: Robert Stackowiak is a technology business strategist at the Microsoft Technology Center in Chicago, where he gathers business and technical requirements during client briefings and defines Internet of Things and analytics architecture solutions, including those that reside in the Microsoft Azure cloud. He joined Microsoft in 2016 after a 20-year stint at Oracle, where he was Executive Director of Big Data in North America. Robert has spoken at industry conferences around the world and co-authored many books on analytics and data management, including Big Data and the Internet of Things: Enterprise Architecture for A New Age (Apress), five editions of Oracle Essentials (O'Reilly Media), Oracle Big Data Handbook (Oracle Press), Achieving Extreme Performance with Oracle Exadata (Oracle Press), and Oracle Data Warehousing and Business Intelligence Solutions (Wiley). You can follow him on Twitter at @rstackow.

6 reasons to choose MySQL 8 for designing database solutions

Amey Varangaonkar
08 May 2018
4 min read
Whether you are a standalone developer or an enterprise consultant, you would obviously choose a database that provides good benefits and results when compared to other related products. MySQL 8 provides numerous advantages that make it a first choice in this competitive market, with a range of powerful features that make it a comprehensive database. Today we will go through the benefits of using MySQL as the preferred database solution: [box type="note" align="" class="" width=""]The following excerpt is taken from the book MySQL 8 Administrator’s Guide, co-authored by Chintan Mehta, Ankit Bhavsar, Hetal Oza and Subhash Shah. This book presents step-by-step techniques on managing, monitoring and securing the MySQL database without any hassle.[/box]

Security

The first thing that comes to mind is securing data, because nowadays data has become precious and can impact business continuity if legal obligations are not met; in fact, a breach can be so damaging that it can close down a business in no time. MySQL is among the most secure and reliable database management systems, used by many well-known enterprises such as Facebook, Twitter, and Wikipedia. It provides a good security layer that protects sensitive information from intruders. MySQL offers access control management so that granting and revoking required access from a user is easy. Roles can also be defined with a list of permissions that can be granted to or revoked from users. All user passwords are stored in an encrypted format using plugin-specific algorithms.

Scalability

Day by day, the mountain of data is growing because of the extensive use of technology in numerous ways, and as a result, load averages are going through the roof. In many cases, it is impossible to predict whether data will stay below a certain limit or whether the number of users will remain within bounds. A scalable database is therefore preferable so that, at any point, we can meet unexpected demands to scale.
MySQL is a rewarding database system for its scalability; it can scale both horizontally and vertically. In terms of data, spreading the database and the load of application queries across multiple MySQL servers is quite feasible, and it is easy to add horsepower to a MySQL cluster to handle heavy loads.

An open source relational database management system

MySQL is an open source database management system that makes debugging, upgrading, and enhancing functionality fast and easy. You can view the source, modify it as needed, and use it in your own way. You can also distribute an extended version of MySQL, but you will need a license for this.

High performance

MySQL gives high-speed transaction processing with optimal speed. It can cache results, which boosts read performance. Replication and clustering make the system scalable for greater concurrency and heavy workloads. Database indexes also accelerate the performance of SELECT statements on substantial amounts of data. To enhance performance further, MySQL 8 has included indexes in the performance schema to speed up data retrieval.

High availability

Today, in the world of competitive marketing, an organization's key concern is keeping its systems up and running. Any failure or downtime directly impacts business and revenue; hence, high availability is a factor that cannot be overlooked. MySQL is quite reliable and offers constant availability through cluster and replication configurations. Cluster servers instantly handle failures and manage failover to keep the system available almost all the time. If one server goes down, user requests are redirected to another node, which performs the requested operation.

Cross-platform capabilities

MySQL provides cross-platform flexibility and can run on various platforms such as Windows, Linux, Solaris, OS/2, and so on.
It has great API support for all major languages, which makes it very easy to integrate with languages such as PHP, C++, Perl, Python, Java, and so on. It is also part of the Linux, Apache, MySQL, PHP (LAMP) stack used worldwide for web applications. That's it, then! We discussed a few important reasons why MySQL is the most popular relational database in the world and widely adopted across many enterprises. If you want to learn more about MySQL's administrative features, make sure to check out the book MySQL 8 Administrator's Guide today!

- 12 most common MySQL errors you should be aware of
- Top 10 MySQL 8 performance benchmarking aspects to know
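The index-acceleration claim above is easy to demonstrate. The sketch below uses Python's built-in sqlite3 module in place of MySQL so it stays self-contained and runnable; the principle (a query planner switching from a full-table scan to an index search) is the same, though MySQL's own `EXPLAIN` output looks different. The table and data are invented for illustration.

```python
import sqlite3

# sqlite3 stands in for MySQL here; the table and data are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("customer_%d" % (i % 100), i * 1.5) for i in range(10000)],
)

query = "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'customer_42'"

# Without an index, the planner must scan every row of the table.
before_plan = conn.execute(query).fetchone()[-1]

# With an index on the filtered column, it switches to an index search.
conn.execute("CREATE INDEX idx_customer ON orders (customer)")
after_plan = conn.execute(query).fetchone()[-1]

print(before_plan)  # mentions a full-table SCAN
print(after_plan)   # mentions a SEARCH using idx_customer
```

The same experiment against a real MySQL server would show the optimizer's access type change from a full scan to a keyed lookup once the index exists.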

Introducing Intelligent Apps

Amarabha Banerjee
19 Oct 2017
6 min read
We are a species that has been obsessed with ‘intelligence’ since gaining consciousness. We have always been inventing ways to make our lives better through sheer imagination and the application of our intelligence. Now, it comes as no surprise that we want our modern-day creations to be smart as well - be it a web app or a mobile app. The first question that comes to mind, then, is: what makes an application ‘intelligent’? A simple answer for budding developers is that intelligent apps are apps that can take intuitive decisions or provide customized recommendations and experiences to their users, based on insights drawn from data collected from their interactions with humans. This brings up a whole set of new questions: how can intelligent apps be implemented, what are the challenges, what are the primary application areas of these so-called intelligent apps, and so on. Let’s start with the first question.

How can intelligence be infused into an app?

The answer has many layers, just like an app does. The monumental growth in data science and its underlying data infrastructure has allowed machines to process, segregate and analyze huge volumes of data in limited time. Now, it looks set to enable machines to glean meaningful patterns and insights from that very same data. One interesting example is predicting user behavior patterns: what movies, food or brands of clothing the user might be interested in, what songs they might like to listen to at different times of their day, and so on. These are, of course, on the simpler side of the spectrum of intelligent tasks that we would like our apps to perform. Many current apps from Amazon, Google, Apple, and others implement and perfect these tasks on a day-to-day basis. Complex tasks are a series of simple tasks performed in an intelligent manner.
One such complex task would be the ability to perform facial recognition and speech recognition and then use them to perform relevant daily tasks, be it at home or in the workplace. This is where we enter the realm of science fiction: your mobile app recognizes your voice command while you are driving back home and sends automated instructions to different home appliances - your microwave, AC, and your PC - so that your food is served hot when you reach home, your room is set at just the right temperature, and your PC has automatically opened the next project you would like to work on. All this happens while you enter your home keys-free, thanks to facial recognition software that can map your face and identify you with more than 90% accuracy, even in low lighting conditions. APIs like IBM Watson, AT&T Speech, the Google Speech API, the Microsoft Face API and some others provide developers with tools to incorporate features such as those listed above in their apps, to create smarter apps. It sounds almost magical! But is it that simple? This brings us to the next question.

What are some development challenges for an intelligent app?

The challenges are different for web and mobile apps.

Challenges for intelligent web apps

For web apps, choosing the right mix of algorithms and APIs that can turn your machine learning code into a working web app is the primary challenge. Plenty of web APIs, like IBM Watson and AT&T Speech, are available to do this. But not all APIs can perform all the complex tasks we discussed earlier. Suppose you want an app that successfully performs both speech recognition and facial recognition, and then also performs reinforcement learning by learning from your interactions with it. You will have to use multiple APIs to achieve this. Their integration into a single app then becomes a key challenge. Here is why: every API has its own data transfer protocols and backend integration requirements and challenges.
Thus, our backend requirements increase significantly, both in terms of data persistence and in terms of dynamic data availability and security. The fact that each of these smart apps needs a customized user interface design also poses a challenge to the front-end developer: the interface must be so fluid and adaptive that it supports the different preferences of different smart apps. Clearly, putting together a smart web app is no child’s play. That’s why, perhaps, smart voice-controlled apps like Alexa are still merely working as assistants and providing only predefined solutions to you. Their ability to execute complex voice-based tasks and commands is fairly low, let alone to perform any task that is not voice-command based.

Challenges for intelligent mobile apps

For intelligent mobile apps, the challenges are manifold. A key reason is network dependency for data transfer. Although the advent of 4G and 5G mobile networks has greatly improved mobile network speed, network availability and data transfer speeds still pose a major challenge. This is due to the high volumes of data that intelligent mobile apps require to perform efficiently. To circumvent this limitation, vendors like Google are trying to implement smarter APIs on the device’s local storage. But this approach requires a huge increase in the mobile chip’s computation capabilities - something that’s not currently available. Maybe that’s why Google has also hinted at jumping into the chip manufacturing business if its computation needs are not met. Apart from these issues, running multiple intelligent apps at the same time would also require a significant increase in the battery life of mobile devices. Finally comes the last question.

What are some key applications of intelligent apps?

We have explored some areas of application in the previous sections, keeping our focus on just web and mobile apps.
Broadly speaking, whatever makes our daily life easier is a potential application area for intelligent apps. From controlling the AC temperature automatically, to controlling the oven and microwave remotely, to using the vacuum cleaner (which, of course, has to have robotic AI capabilities), to driving the car - everything falls in the domain of intelligent apps. The real questions for us are: What can we achieve with our modern computation resources and data handling capabilities? How can mobile computation capabilities and chip architectures be improved drastically, so that smart apps can perform complex tasks faster and ease our daily workflows? Only the future holds the answers. We are rooting for the day when we become a smarter race by efficiently and effectively delegating less important yet intelligence-demanding tasks to smarter systems through intelligent web and mobile apps. The culmination of these apps, along with hardware-driven AI systems, could eventually lead to independent smart systems - a topic we will explore in the coming days.
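The behaviour-prediction idea mentioned earlier (guessing which movies or brands a user might like) can, in its very simplest form, be reduced to counting what a user interacts with most. The sketch below is a deliberately naive, standard-library-only illustration; the titles and the `genre` field are invented, and real intelligent apps would use collaborative filtering or learned models rather than raw frequency counts.

```python
from collections import Counter

def recommend(history, catalog, k=2):
    """Suggest unseen catalog items from the user's k most-watched genres.

    A toy frequency-based stand-in for behaviour prediction; production
    recommenders use collaborative filtering or learned models.
    """
    counts = Counter(item["genre"] for item in history)
    top_genres = {genre for genre, _ in counts.most_common(k)}
    seen = {item["title"] for item in history}
    return [c["title"] for c in catalog
            if c["genre"] in top_genres and c["title"] not in seen]

# Invented sample data, for illustration only.
history = [
    {"title": "Alien", "genre": "sci-fi"},
    {"title": "Blade Runner", "genre": "sci-fi"},
    {"title": "Heat", "genre": "crime"},
]
catalog = history + [
    {"title": "Arrival", "genre": "sci-fi"},
    {"title": "Notting Hill", "genre": "romance"},
]
suggestions = recommend(history, catalog)
print(suggestions)  # ['Arrival']
```

Everything beyond this baseline - context, time of day, similar users - is what separates a frequency counter from a genuinely intelligent app.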
DevOps might be the key to your Big Data project success

Ashwin Nair
11 Oct 2017
5 min read
So, you probably believe in the power of Big Data and the potential it has to change the world. Your company might have already invested in, or be planning to invest in, a big data project. That’s great! But what if I were to tell you that only 15% of businesses successfully deploy their Big Data projects to production? Surely that can’t be a good sign! Now, don’t just go freeing up your Big Data budget. Not yet.

Big Data’s Big Challenges

For all the hype around Big Data, research suggests that many organizations are failing to leverage its opportunities properly. A recent survey by NewVantage Partners, for example, explored the challenges facing organizations currently running their own Big Data projects or trying to adopt them. Here’s what they had to say: “In spite of the successes, executives still see lingering cultural impediments as a barrier to realizing the full value and full business adoption of Big Data in the corporate world. 52.5% of executives report that organizational impediments prevent realization of broad business adoption of Big Data initiatives. Impediments include lack of organizational alignment, business and/or technology resistance, and lack of middle management adoption as the most common factors. 18% cite lack of a coherent data strategy.” Clearly, even some of the most successful organizations are struggling to get a handle on Big Data. Interestingly, it’s not so much gaps in technology or even skills, but rather a lack of culture and organizational alignment that’s making life difficult. This isn’t actually that surprising. The problem of managing the effects of technological change goes far beyond Big Data - it’s impacting the modern workplace in just about every department, from how people work together to how you communicate with and sell to customers.

DevOps Distilled

It’s out of this scenario that we’ve seen the irresistible rise of DevOps.
DevOps, for the uninitiated, is an agile methodology that aims to improve the relationship between development and operations. It aims to ensure fluid collaboration between teams, with a focus on automating and streamlining monotonous and repetitive tasks within a given development lifecycle, thus reducing friction and saving time. We can perhaps begin to see, then, that this approach - usually applied in typical software development scenarios - might actually offer a solution to some of the problems faced in big data.

A typical Big Data project

Like a software development project, a Big Data project will have multiple teams working on it in isolation. For example, a big data architect will look into the project requirements and design a strategy and roadmap for implementation, while the data storage and admin team will be dedicated to setting up a data cluster and provisioning infrastructure. Finally, you’ll probably find data analysts who process, analyze and visualize data to gain insights. Depending on the scope and complexity of your project, it is possible that more teams are brought in - say, data scientists roped in to train and build custom machine learning models.

DevOps for Big Data: A match made in heaven

Clearly, there are a lot of moving parts in a typical Big Data project, with each role performing considerably complex tasks. By adopting DevOps, you’ll reduce the silos that exist between these roles, breaking down internal barriers and embedding Big Data within a cross-functional team. It’s also worth noting that this move doesn’t just give you an operational efficiency advantage - it also gives you much more control and oversight over strategy. By building a cross-functional team, rather than asking teams to collaborate across functions (which sounds good in theory but always proves challenging), there is a much more acute sense of a shared vision or goal.
Problems can be solved together, and discussions can take place constantly and effectively. With the operational problems minimized, everyone can focus on the interesting stuff. By bringing DevOps thinking into big data, you also set the foundation for what’s called continuous analytics. Continuous integration, fundamental to effective DevOps practice, means code is integrated into a shared repository after every task or change to ensure complete alignment. Continuous analytics applies the same principle to the data science lifecycle, ensuring a fully integrated approach to analytics in which as much as possible is automated through algorithms. This takes away the boring stuff, once again ensuring that everyone within the project team can focus on what’s important. We’ve come a long way from Big Data being a buzzword - today, it’s the new normal. If you’ve got a lot of data to work with, to analyze and to understand, you had better make sure you have the right environment set up to make the most of it. That means there’s no longer an excuse for Big Data projects to fail, and certainly no excuse not to get one up and running. If it takes DevOps to make Big Data work for businesses, then it’s a mindset worth cultivating and running with.
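The continuous-analytics idea described above can be made concrete with a toy pipeline: every incoming data batch automatically triggers validation and analysis, the way every commit triggers a CI build. This is a hypothetical, standard-library-only sketch (the function names and record format are invented), not a description of any particular tool.

```python
import statistics

def validate(batch):
    """Drop malformed records before they reach the analytics stage."""
    return [r for r in batch if isinstance(r.get("value"), (int, float))]

def analyse(batch):
    """The automated analytics step, run on every clean batch."""
    values = [r["value"] for r in batch]
    return {"count": len(values), "mean": statistics.mean(values)}

def on_new_batch(batch, results):
    """Triggered for each incoming batch, like a CI build on each commit."""
    clean = validate(batch)
    if clean:
        results.append(analyse(clean))

results = []
on_new_batch([{"value": 10}, {"value": 20}, {"value": "bad"}], results)
on_new_batch([{"value": 30}], results)
print(results)  # [{'count': 2, 'mean': 15}, {'count': 1, 'mean': 30}]
```

The point is the wiring, not the arithmetic: no one runs the analysis by hand, so each new batch flows to insight with zero friction between the "operations" and "analysis" steps.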

Of perfect strikes, tackles and touchdowns: how analytics is changing sports

Amey Varangaonkar
14 Nov 2017
7 min read
The rise of Big Data and analytics is drastically changing the landscape of many businesses - and the sports industry is one of them. In today’s age of cut-throat competition, data-based strategies are slowly taking the front seat in crucial decision making, helping teams gain that decisive edge over their competition. Sports analytics is slowly becoming the next big thing! In the past, many believed that the key to conquering the opponent in any professional sport was to make the player or the team better - be it making them stronger, faster, or more intelligent. ‘Analysis’ then was limited to mere ‘clipboard statistics’ and the intuition built by coaches on the basis of raw video footage of games. This is not the case anymore. From handling media contracts and merchandising to evaluating individual or team performance on matchday, analytics is slowly changing the landscape of sports.

The explosion of data in sports

The amount and quality of information available to decision-makers within sports organizations have increased exponentially over the last two decades. Several factors have contributed to this:

- Innovation in sports science over the last decade, which has been incredible, to say the least
- In-depth records maintained by trainers, coaches, medical staff, nutritionists and even the sales and marketing departments
- Improved processing power and the lower cost of storage, allowing large amounts of historical data to be maintained
- Of late, the adoption of motion capture technology and wearable devices, which has proved to be a real game-changer, allowing every movement on the field to be tracked and recorded

Today, many teams in a variety of sports - such as the Boston Red Sox and Houston Astros in Major League Baseball (MLB), the San Antonio Spurs in the NBA, and clubs like Arsenal, Manchester City and Liverpool FC in football (soccer) - are adopting analytics in different capacities.
Turning sports data into insights

Needless to say, all the crucial sports data being generated today needs equally good analytics techniques to extract the most value from it. This is where sports analytics comes into the picture. Sports analytics is the use of analytics on current as well as historical sport-related data to identify useful patterns, which can be used to gain a competitive advantage on the field of play. Several techniques and algorithms fall under the umbrella of sports analytics. Machine learning, among them, is a widely used set of techniques that sports analysts use to derive insights. It is a popular form of Artificial Intelligence where systems are trained using large datasets to give reliable predictions on unseen data. With the help of a variety of classification and recommendation algorithms, analysts are now able to identify patterns within the existing attributes of a player, and how these can best be optimized to improve the player’s performance. Using cross-validation techniques, analysts then ensure the models are not biased toward the training data and that their predictions generalize to unknown datasets. Analytics is being put to use by a lot of sports teams today, in many different ways. Here are some key use cases of sports analytics:

Pushing the limit: Optimizing player performance

Right from tracking an athlete’s heartbeats per minute to finding injury patterns, analytics can play a crucial role in understanding how an individual performs on the field. With the help of video, wearable and sensor data, it is possible to identify exactly when an athlete’s performance drops, and corrective steps can be taken accordingly. It is now possible to assess a player’s physiological and technical attributes and work on specific drills in training to push them to an optimal level. Developing search-powered data intelligence platforms seems to be the way forward.
The best example of this is Tellius, a search-based data intelligence tool that allows you to determine a player’s efficiency in terms of fitness and performance through search-powered analytics.

Smells like team spirit: Better team and athlete management

Analytics also helps coaches manage their teams better. For example, Adidas has developed a system called miCoach, which works by having the players use wearables during games and training sessions. The data obtained from the devices highlights the top performers and the ones who need rest. It is also possible to identify and improve patterns in a team’s playing style, developing a ‘system’ to improve efficiency in gameplay. For individual athletes, real-time stats such as speed, heart rate, and acceleration can help trainers plan training and conditioning sessions accordingly. Getting intelligent responses regarding player and team performances and real-time in-game tactics is something that will make the coaches’ and management’s lives a lot easier going forward.

All in the game: Improving game-day strategy

By analyzing real-time training data, it is possible to identify the fitter, in-form players to be picked for the game. Not just that: analyzing the opposition and picking the right strategy to beat them becomes easier once you have the relevant data insights with you. Different data visualization techniques can be used not just with historical data but also with real-time data while the game is in progress.

Splashing the cash: Boosting merchandising

What are fans buying once they’re inside the stadium? Is it the home team’s shirt, or is it their scarves and posters? What food are they eating in the stadium eateries? By analyzing all this data, retailers and club merchandise stores can stock the fan-favorite merchandise and other items in adequate quantities, so that they never run out of stock.
Analyzing sales via online portals and e-stores also helps teams identify the countries or areas where buyers live - a good indicator of where to concentrate sales and marketing efforts. Analytics also plays a key role in product endorsements and sponsorships. Determining which brands to endorse, identifying the best possible sponsor, the ideal duration of sponsorship and the sponsorship fee - these are some key decisions that can be taken by analyzing current trends along with historical data.

Challenges in sports analytics

Although the advantages offered by analytics are there for all to see, many sports teams have still not incorporated analytics into their day-to-day operations. Lack of awareness seems to be the biggest factor here: many teams underestimate, or still don’t understand, the power of analytics. Choosing the right Big Data and analytics tool is another challenge. With humongous amounts of data especially, the time investment needed to clean and format the data for effective analysis is considerable, and one many teams aren’t prepared to make. Another challenge is the rising demand for analysts against a sharp deficit in supply, which is driving salaries higher. Add to that the need for a thorough understanding of the sport to find effective insights from data, and it becomes even more difficult to get the right data experts.

What next for sports analytics?

Understanding data and how it can be used in sports - to improve performance and maximize profits - is now deemed by many teams to be the key differentiator between success and failure. And it’s not just success that teams are after - it’s sustained success, and analytics goes a long way in helping teams achieve that. Gone are the days when traditional ways of finding insights were enough.
Sports have evolved, and teams are now digging harder into data to get that slightest edge over the competition, which can prove to be massive in the long run. If you found the article to be insightful, make sure you check out our interview on sports analytics with ESPN Senior Stats Analyst Gaurav Sundararaman.
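As a small, concrete taste of the cross-validation technique mentioned earlier, here is a standard-library-only sketch of k-fold index splitting, the scheme analysts use to check that a model's predictions generalize beyond the data it was trained on. Libraries such as scikit-learn ship this ready-made; the helper below is a simplified, hypothetical version for illustration.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k roughly equal, non-overlapping folds.

    Each fold serves once as the held-out test set while the remaining
    folds form the training set.
    """
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

splits = k_fold_indices(10, 3)
for train, test in splits:
    print(len(train), len(test))  # 6 4, then 7 3, then 7 3
```

Training a player-performance model on each `train` set and scoring it on the corresponding `test` set gives k performance estimates, and averaging them is what gives analysts confidence the model is not simply memorizing last season's data.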