Tech Guides - Data

Hyperledger: The Enterprise-ready Blockchain

Savia Lobo
26 Oct 2017
6 min read
As one of the most widely discussed phenomena in the global media, Blockchain has grown from hype to mainstream reality. Leading industry players from finance, supply chain, and IoT are collaborating to make Blockchain available for commercial adoption. But while Blockchain is being projected as the future of digital transactions, it still suffers from two major limitations: carrying out private transactions and scalability. As such, there was a widely felt need for a Blockchain-based distributed ledger that overcomes these problems.

Enter Hyperledger

Founded by the Linux Foundation in 2015, Hyperledger aims to give enterprises a platform to build robust blockchain applications for their businesses and to create open-source, enterprise-grade frameworks for carrying out secure business transactions. It acts as a hub where leading industry players and software developers collaborate on building blockchain frameworks that can then be used to deploy blockchain applications for industry. With industry leaders such as IBM, Intel, Accenture, and SAP, among others, collaborating with the Hyperledger community, and with the recent addition of BTS, Oracle, and the Patientory Foundation, the community is gaining a lot of traction. No wonder Brian Behlendorf, Executive Director of Hyperledger, says, "Growth and interest in Hyperledger remain high in 2017."

There are a total of eight projects: five are frameworks (Sawtooth, Fabric, Burrow, Iroha, and Indy), and the other three are tools (Composer, Cello, and Explorer) supporting those frameworks. Each framework takes a different approach to building the desired blockchain application. Hyperledger Fabric, the community's first framework, was contributed by IBM. It hosts smart contracts using chaincode, an interface written in Go or Java that contains the business logic of the ledger. Hyperledger Sawtooth, developed by Intel, offers a modular blockchain architecture. It uses Proof of Elapsed Time (PoET), a consensus algorithm developed by Intel for high efficiency among distributed ledgers. Hyperledger Burrow, a joint proposal by Intel and Monax, is a permissioned smart contract machine. It executes smart contract code following the Ethereum specification with an engine, a strong audit trail, and a consensus mechanism. Of these frameworks, two (Indy and Iroha) are still in the incubation phase. The Hyperledger community is also building supporting tools, such as Composer, which has already been launched, and Cello and Explorer, which are awaiting release.

Note: Although a plethora of Hyperledger tools and frameworks are available, in the rest of this article we use Hyperledger Fabric, one of the most popular and trending frameworks, to demonstrate how Hyperledger is being used by businesses.

Why should businesses use Hyperledger?

When settling on a framework upon which blockchain apps can be built, several key aspects are worth considering. Among the most important are portability, security, reliability, interoperability, and user-friendliness. Hyperledger offers all of the above for building cross-platform, production-ready business applications. Let's take a simple example to see how Hyperledger works for businesses. Consider a restaurant business.

A restaurant owner buys vegetables from a wholesale shop at a much lower cost than in the market. The shopkeeper creates a network in which other buyers cannot see the price at which vegetables are sold to a given buyer. Similarly, the restaurant owner can view only his own transaction with the shopkeeper. For the vegetables to reach the restaurant, they must pass through numerous stages such as transport and delivery. The restaurant owner can track the delivery of his vegetables at each stage, and so can the shopkeeper. The transport and delivery organizations, however, cannot see the transaction details. In other words, the shopkeeper can establish a confidential network within a private network of other stakeholders. This type of network can be set up using Hyperledger Fabric.

Breaking the example down, here are some of the reasons to consider Hyperledger for your business networks:
With Hyperledger you get performance, scalability, and multiple levels of trust.
You get data on a need-to-know basis: only the parties in the network that need the data get to know about it.
Backed by heavyweights like Intel and IBM, Hyperledger strives to offer a strong standard for blockchain code, which in turn provides better functionality at higher speeds.

Furthermore, with the recent release of Fabric v1.0, businesses can create out-of-the-box blockchain solutions on its highly elastic and extensible architecture, made even easier by Hyperledger Composer. Composer helps businesses create smart contracts and blockchain applications without having to know the complex intricacies of the underlying blockchain network. Built with collaborative efforts from leading industry experts, it is a great fit for real-world enterprise use.

Although Ethereum is used by many businesses, here are some of the reasons why Hyperledger could be a better enterprise fit:
While Ethereum is a public blockchain, Hyperledger is a private blockchain. This means enterprises within the network know who is present on the peer nodes, unlike with Ethereum.
Hyperledger is a permissioned network: it can control who participates in the consensus mechanism of the blockchain network. Ethereum, on the other hand, is permissionless.
Hyperledger has no built-in cryptocurrency. Ethereum has a built-in cryptocurrency called Ether. Many applications don't need a cryptocurrency to function, and for them using Ethereum can be a disadvantage.
Hyperledger gives you the flexibility of choosing a programming language such as Java or Go for writing smart contracts. Ethereum uses Solidity, which is far less widely used.
Hyperledger is highly scalable, unlike traditional blockchains and Ethereum, with minimal performance losses.

"Since Hyperledger Fabric was designed to meet key requirements for permissioned blockchains with transaction privacy and configurable policies, we've been able to build solutions quickly and flexibly." - Mohan Venkataraman, CTO, IT People Corporation

Future of Hyperledger

The Hyperledger community is expanding rapidly, with many industries collaborating and contributing their capabilities to cross-industry blockchain applications. Hyperledger has found adoption within business networks in industries as varied as healthcare, finance, and supply chain, to build state-of-the-art blockchain applications that ensure privacy on decentralized, permissioned networks. It is shaping up to be a technology that can revolutionize how businesses handle access control within a consortium, backed by enhanced security measures. With continuous development of these frameworks, smarter, faster, and more secure business transactions will soon be a reality. We can also expect to see Hyperledger in the cloud, with IBM planning to extend blockchain technologies onto its cloud platform. Add to that the exciting prospect of blending Artificial Intelligence with Hyperledger, and transactions look more advanced, tamper-proof, and secure than ever before.
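Hyperledger Fabric enforces the need-to-know visibility from the restaurant example through channels and private data rather than application code. The snippet below is a deliberately simplified, conceptual Python sketch, not Fabric chaincode; the class and method names are invented purely to illustrate a ledger that reveals each transaction only to the parties named on it.

```python
# Conceptual sketch only: models "need-to-know" visibility on a shared ledger.
# This is NOT Hyperledger Fabric code; all names here are invented.

class PermissionedLedger:
    def __init__(self):
        self._transactions = []  # append-only list of transaction records

    def record(self, seller, buyer, item, price, observers=()):
        """Record a transaction visible only to the named parties."""
        self._transactions.append({
            "seller": seller, "buyer": buyer, "item": item,
            "price": price, "visible_to": {seller, buyer, *observers},
        })

    def view(self, participant):
        """Return only the transactions this participant is allowed to see."""
        return [
            {k: v for k, v in tx.items() if k != "visible_to"}
            for tx in self._transactions
            if participant in tx["visible_to"]
        ]

ledger = PermissionedLedger()
ledger.record("wholesaler", "restaurant", "vegetables", price=120)
ledger.record("wholesaler", "hotel", "vegetables", price=150)

print(ledger.view("restaurant"))  # sees only its own purchase
print(ledger.view("hotel"))       # cannot see the restaurant's price
```

In a real Fabric network, this kind of isolation is provided by channels and private data collections enforced by the peers themselves, not by logic written in the application layer.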

4 myths about Git and GitHub you should know about

Savia Lobo
07 Oct 2018
3 min read
With the aim of replacing BitKeeper, Linus Torvalds created Git in 2005 to support the development of the Linux kernel. Git isn't limited to code, however; any product or project with multiple contributors, release management, and versioning stands to gain an improved workflow through Git. Just as every solution or tool has its own positives and negatives, Git is surrounded by myths. Alex Magana and Joseph Mul, the authors of the Introduction to Git and GitHub course, discuss in this post some of the myths around Git and GitHub.

Git is GitHub

Because Git and GitHub are commonly used together as a version control toolkit, adopters of the two tools often misconceive them as interchangeable. Git is a tool that tracks changes to the files that constitute a project; it provides the utility to monitor changes and persist them. GitHub, on the other hand, is akin to a website hosting service, with the difference that the hosted content is a repository. The repository can then be accessed from this central point and the codebase shared.

Backups are equivalent to version control

This myth stems from a misunderstanding of what version control is, and by extension of what Git achieves when it's incorporated into the development workflow. Contrary to archives created according to a team's backup policy, Git tracks changes made to files and maintains snapshots of a repository at a given point in time.

Git is only suitable for teams

With the use of hosting services such as GitHub, the sharing and collaboration element may be perceived as the preserve of teams. Yet Git offers gains beyond source control. It lends itself to the delivery of a feature or product from the point of development to deployment; in other words, Git is a delivery tool. It can therefore be used to roll out functionality and manage changes to source code for teams and individuals alike.

To use Git effectively, you need to learn every command

Whether you work as an individual or in a team, the common commands you need in order to contribute to a repository cover initiating tracking of specific files, persisting changes made to tracked files, reverting changes made to files, and incorporating changes introduced by other developers working on the same project.

The four myths discussed by the authors clarify what Git and GitHub are and how they are used. If you found this post useful, do check out the course Introduction to Git and GitHub by Alex and Joseph.

GitHub addresses technical debt, now runs on Rails 5.2.1
GitLab 11.3 released with support for Maven repositories, protected environments and more
GitLab raises $100 million, Alphabet backs it to surpass Microsoft's GitHub

6 reasons why Google open-sourced TensorFlow

Kunal Parikh
13 Sep 2017
7 min read
On November 9, 2015, a storm loomed over the SF Bay Area, creating major outages. In Mountain View, California, Google engineers were busy creating a storm of their own. That day, Sundar Pichai announced to the world that TensorFlow, Google's machine learning system, was going open source. He said: "...today we're also open-sourcing TensorFlow. We hope this will let the machine learning community—everyone from academic researchers, to engineers, to hobbyists—exchange ideas much more quickly, through working code rather than just research papers."

The tech world may not have fully grasped the gravity of the announcement that day, but those in the know recognized it as a pivotal moment in Google's transformation into an AI-first company.

How did TensorFlow begin?

TensorFlow grew out of DistBelief, an earlier internal Google system. DistBelief powered a program called DeepDream, built so that scientists and engineers could visualize how deep neural networks process images. As fate would have it, the algorithm went viral and everyone started generating abstract, psychedelic art with it. Although people were having fun playing with the imagery, most were unaware of the technology that powered it: neural networks and deep learning, the very things TensorFlow was built for. TensorFlow is a machine learning platform for running a wide range of algorithms, from the aforementioned neural networks to other deep learning projects. With its flexibility, high performance, portability, and production-readiness, TensorFlow is changing the landscape of artificial intelligence and machine learning. Be it face recognition, music and art creation, or detecting clickbait headlines for blogs, the use cases are immense. With Google open sourcing TensorFlow, the platform that powers Google Search and other smart Google products is now accessible to everyone: researchers, scientists, machine learning experts, students, and others.

So why did Google open source TensorFlow? Yes, Google made a world of difference to the machine learning community at large by open sourcing TensorFlow. But what was in it for Google? As it turns out, a whole lot. Let's look at a few reasons.

Google is feeling the heat from rival deep learning frameworks

Major deep learning frameworks like Theano and Keras were already open source. Keeping a framework proprietary was becoming a strategic disadvantage, as most core deep learning users, i.e. scientists, engineers, and academics, prefer using open source software for their work. "Pure" researchers and aspiring PhDs are key groups that file major patents in the world of AI. By open sourcing TensorFlow, Google gave this community access to a platform it backs to power their research, which makes migrating the world's algorithms from other deep learning tools onto TensorFlow theoretically possible. AI as a trend is clearly here to stay, and Google wants a platform that leads this trend.

An open source TensorFlow can better support the Google Brain project

Behind all the PR, Google does not say much about its pet project, Google Brain. When Sundar Pichai talks of Google's transformation from Search to AI, this project is doing the work behind the scenes. Google Brain is headed by some of the best minds in the industry, such as Jeff Dean, Geoffrey Hinton, and Andrew Ng, among many others. They developed TensorFlow, and they may still have state-of-the-art features up their sleeves known only to them. After all, they have produced a wealth of stunning research in areas like parallel computing, machine intelligence, and natural language processing. With TensorFlow now open sourced, this team can accelerate the development of the platform and make significant inroads into the areas they are currently researching. This research can then potentially develop into future products for Google, allowing it to expand its AI and cloud clout, especially in the enterprise market.

Tapping into the collective wisdom of the academic intelligentsia

Most innovations and breakthroughs come from universities before they go mainstream and become major products in enterprises. AI, still making this transition, needs a lot of investment in research, and to work on difficult algorithms researchers need access to sophisticated ML frameworks. Selling TensorFlow to universities is the old-school way to solve the problem; that's why we no longer hear much about products like LabVIEW. Instead, by open sourcing TensorFlow, the team at Google now has the world's best minds working on difficult AI problems on their platform for free. As these researchers publish papers on AI using TensorFlow, the existing body of knowledge keeps growing, and Google gets access to bleeding-edge algorithms that are not yet available in the market. Its engineers can simply pick what they like and start developing commercially ready services.

Google wants to develop TensorFlow as a platform-as-a-service for AI application development

An advantage of open sourcing a tool is that it accelerates time to build and test through collaborative app development. This means most of the basic infrastructure and modules for building a variety of TensorFlow-based applications will already exist on the platform. TensorFlow developers can build and ship interesting modular products by mixing and matching code and providing a further layer of customization or abstraction. What Amazon did for storage with AWS, Google can do for AI with TensorFlow. It won't come as a surprise if Google comes up with its own integrated AI ecosystem, with TensorFlow on Google Cloud promising all the AI resources your company might need. Suppose you want voice-based search in your ecommerce mobile application. Instead of completely reinventing the wheel, you could buy TensorFlow-powered services provided by Google; with easy APIs, you get voice-based search and save substantial developer cost and time.

Open sourcing TensorFlow will help Google extend its talent pipeline in a competitive Silicon Valley jobs market

Hiring for AI development is competitive in Silicon Valley, as all major companies vie for attention from the same niche talent pool. With TensorFlow freely available, Google's HR team can reach out to a talent pool already well versed in the technology and save on training costs. Just look at the interest TensorFlow has generated on a forum like Stack Overflow: a growing number of users are asking about TensorFlow, and some of them will become power users whom the Google HR team can tap. A developer pool at this scale would never have been possible with a proprietary tool.

Replicating the success and learning from Android

Agreed, a direct comparison with Android is not possible. However, the size of the mobile market and Google's strategic mobile-first goal when it introduced Android bear a striking similarity to the nascent AI ecosystem we have today and Google's current AI-first rhetoric. In just a decade since its launch, Android now owns more than 85% of the smartphone OS market. Piggybacking on Android's success, Google now has control of mobile search (96.19%), services (Google Play), a strong connection with the mobile developer community, and even a viable entry into the mobile hardware market. Open sourcing Android did not stop Google from making money; it monetized through other channels like mobile search, mobile advertisements, Google Play, devices like the Nexus, and mobile payments. Google did not have all this infrastructure planned and ready before Android was open sourced; it innovated, improvised, and created along the way. In the future, we can expect Google to take key learnings from its Android growth story and apply them to TensorFlow's market expansion strategy. We can also expect supporting infrastructure and models for commercializing TensorFlow to emerge for enterprise developers.

The road to AI world domination for Google runs on the back of an open sourced TensorFlow platform. It looks not just exciting but full of exponential growth, crowdsourced innovation, and learnings drawn from other highly successful Google products and services. The storm that started two years ago is surely morphing into a hurricane. As Professor Michael Guerzhoy of the University of Toronto told Business Insider, "Ten years ago, it took me months to do something that for my students takes a few days with TensorFlow."
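To give a flavor of the "working code" Pichai mentions, here is a minimal, self-contained sketch of defining and training a tiny neural network with TensorFlow. It assumes a TensorFlow 2.x installation and uses the Keras API plus random placeholder data; the article predates this API, so treat it purely as an illustration rather than anything from the original announcement.

```python
# Minimal illustration: train a tiny neural network on random data.
# Assumes TensorFlow 2.x (pip install tensorflow); data is a placeholder.
import numpy as np
import tensorflow as tf

# Toy dataset: 256 samples, 10 features, binary labels.
x = np.random.rand(256, 10).astype("float32")
y = (x.sum(axis=1) > 5.0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(x, y, verbose=0))  # [loss, accuracy] on the toy data
```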

How is Artificial Intelligence changing the mobile developer role?

Bhagyashree R
15 Oct 2018
10 min read
Last year at Google I/O, Sundar Pichai, the CEO of Google, said: "We are moving from a mobile-first world to an AI-first world." Does this apply only to Google? Not really. In the recent past we have seen several advancements in Artificial Intelligence and, in parallel, a plethora of intelligent apps coming onto the market. These advancements are enabling developers to take their apps to the next level by integrating recommendation services, image recognition, speech recognition, voice translation, and many more capabilities. Artificial Intelligence is becoming a potent tool for mobile developers to experiment and innovate with. The AI components that are integral to mobile experiences, such as voice-based assistants and location-based services, increasingly require mobile developers to have a basic understanding of Artificial Intelligence to be effective. Of course, you don't have to be an AI expert to include intelligent components in your app, but you should understand something about what you're building into your app and why. After all, AI on mobile is not just about calling an API, is it? There's more to it, and in this article we will explore how Artificial Intelligence will shape the mobile developer role in the immediate future.

Read also: AI on mobile: How AI is taking over the mobile devices marketspace

What is changing in the mobile developer role?

Focus shifting to data

With Artificial Intelligence becoming more and more accessible, intelligent apps are becoming the new norm for businesses. Artificial Intelligence strengthens the relationship between brands and customers, inspiring developers to build smart apps that increase user retention. This also means developers have to shift their focus to data. They have to understand how the data will be collected, how it will be fed to the machines, and how often new data will be needed. When nearly 1 in 4 people abandon an app after its first use, as a mobile app developer you need to rethink how you drive in-app personalization and engagement.

Exploring a "humanized" way of user-app interaction

With assistants such as Siri and Google Assistant coming onto the market, "humanizing" the interaction between the user and the app is becoming mainstream. "Humanizing" is the process by which the app becomes relatable to the user, and the more effectively it is done, the more the end user will interact with the app. Users now want easy navigation and search, and Artificial Intelligence fits perfectly into this scenario. Advances in technologies like text-to-speech, speech-to-text, Natural Language Processing, and cloud services in general have contributed to the mass adoption of these types of interfaces.

Companies are increasingly expecting mobile developers to be comfortable working with AI functionalities

Artificial Intelligence is the future. Companies now expect their mobile developers to know how to handle the huge amount of data generated every day and how to use it. Here is an example of what Google wants its engineers to do: "We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day." This open-ended requirement list shows that it is the right time to learn and embrace Artificial Intelligence.

What skills do you need to build intelligent apps?

Ideally, data scientists are the ones who conceptualize mathematical models, and machine learning engineers are the ones who translate them into code and train the models. But when you are working in a resource-tight environment, for example a start-up, you will be responsible for the end-to-end job. It is not as scary as it sounds, because you have several resources to get you started.

Taking your first steps with machine learning as a service

Learning anything starts with motivating yourself. Diving directly into the maths and coding of machine learning might exhaust and bore you. That's why it's a good idea to know what the end goal of your learning process is and what types of solutions are possible using machine learning. There are many products you can try to get started quickly, such as Google Cloud AutoML (Beta), Firebase ML Kit (Beta), and the Fritz Mobile SDK, among others.

Read also: Machine Learning as a Service (MLaaS): How Google Cloud Platform, Microsoft Azure, and AWS are democratizing Artificial Intelligence

Getting your hands dirty

After this warm-up, the next step involves creating and training your own model. This is where you'll be introduced to TensorFlow Lite, which is going to be your best friend throughout your journey as a machine learning mobile developer. There are many other machine learning tools coming onto the market that you can make use of; these tools make building AI into mobile apps easier. For instance, you can use Dialogflow, a Natural Language Understanding (NLU) platform that makes it easy for developers to design and integrate conversational user interfaces into mobile apps, web applications, devices, and bots. You can then integrate it with Alexa, Cortana, Facebook Messenger, and the other platforms your users are on.

Read also: 7 Artificial Intelligence tools mobile developers need to know

For practice, you can work through an excellent Google codelab, TensorFlow For Poets, which guides you through creating and training a custom image classification model. Through this codelab you will learn the basics of data collection, model optimization, and the other key components involved in creating your own model. The codelab is divided into two parts: the first part covers creating and training the model, and the second part focuses on TensorFlow Lite, the mobile version of TensorFlow that allows you to run the same model on a mobile device.

Mathematics is the foundation of machine learning

Love it or hate it, machine learning and Artificial Intelligence are built on mathematical principles like calculus, linear algebra, probability, statistics, and optimization. You need to learn some essential foundational concepts and the notation used to express them. There are many reasons why learning mathematics for machine learning is important. It will help you select the right algorithm, which involves weighing accuracy, training time, model complexity, the number of parameters, and the number of features. Maths is also needed when choosing parameter settings and validation strategies, and when identifying underfitting and overfitting through the bias-variance tradeoff.

Read also: Bias-Variance tradeoff: How to choose between bias and variance for your machine learning model [Tutorial]
Read also: What is Statistical Analysis and why does it matter?

What are the key aspects of Artificial Intelligence for mobile to keep in mind?

Understanding the problem

Your number one priority should be the user problem you are trying to solve. Instead of randomly integrating a machine learning model into an application, developers should understand how the model applies to the particular application or use case. This matters because you might end up building a great machine learning model with an excellent accuracy rate, but if it does not solve any problem, it ends up being redundant. You must also understand that while many business problems require machine learning approaches, not all of them do; most business problems can be solved through simple analytics or a baseline approach.

Data is your best friend

Machine learning is dependent on data; the data you use, and how you use it, will define the success of your machine learning model. You can also make use of the thousands of open source datasets available online. Google recently launched a dataset search tool, Google Dataset Search, which makes it easier to find the right dataset for your problem. Typically there's no shortage of data; however, the abundance of data does not mean that it is clean, reliable, or usable as intended. Data cleanliness is a huge issue. For example, a typical company will have multiple customer records for a single individual, all of which differ slightly. If the data isn't clean, it isn't reliable. The bottom line: it's bad practice to just grab the data and use it without considering its origin.

Read also: Best Machine Learning Datasets for beginners

Decide which model to choose

A machine learning algorithm is trained, and the artifact created by the training process is called the machine learning model. An ML model is used to find patterns in data without the developer having to explicitly program those patterns. We cannot look through such huge amounts of data and understand the patterns ourselves; think of the model as a helper that looks through all those terabytes of data and extracts knowledge and insights from it. You have two choices here: either create your own model or use a pre-built one. While several pre-built models are available, your business-specific use cases may require specialized models to yield the desired results, and these off-the-shelf models may also need fine-tuning or modification to deliver the value the app is intended to provide.

Read also: 10 machine learning algorithms every engineer needs to know

Thinking about resource utilization is important

AI-powered apps, and apps in general, should be developed with resource utilization in mind. Though companies are working to improve mobile hardware, it is currently not on par with what we can accomplish with GPU clusters in the cloud. Therefore, developers need to consider how the models they intend to use will affect resources, including battery power and memory usage. In terms of computational resources, inferencing (making predictions) is less costly than training. Inferencing on the device means the model needs to be loaded into RAM and requires significant computation time on the GPU or CPU. In scenarios that involve continuous inferencing, such as audio and image data that can chew up bandwidth quickly, on-device inferencing is a good choice.

Learning never stops

Maintenance is important, and for that you need to establish a feedback loop and a process and culture of continuous evaluation and improvement. A change in consumer behavior or a market trend can negatively impact the model. Eventually, something will break or no longer work as intended, which is another reason why developers need to understand the basics of what they're adding to an app. You need some knowledge of how the Artificial Intelligence component you just put together is working, or how it could be made to run faster.

Wrapping up

Before falling for the Artificial Intelligence and machine learning hype, it's important to understand and analyze the problem you are trying to solve. You should examine whether applying machine learning can improve the quality of the service, and decide whether this improvement justifies the effort of deploying a machine learning model. If you just want a simple API endpoint and don't want to dedicate much time to deploying a model, cloud-based web services are the best option for you. Tools like ML Kit for Firebase look promising and seem like a good choice for startups or developers just starting out. TensorFlow Lite and Core ML are good options if you have mobile developers on your team or if you're willing to get your hands a little dirty. Artificial Intelligence is influencing the app development process by providing a data-driven approach to solving user problems. It wouldn't be surprising if, in the near future, Artificial Intelligence becomes a defining factor in app developers' expertise and creativity.

10 useful Google Cloud Artificial Intelligence services for your next machine learning project [Tutorial]
How Artificial Intelligence is going to transform the Data Center
How Serverless computing is making Artificial Intelligence development easier
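Since TensorFlow Lite features prominently above, here is a minimal sketch of the convert-then-infer flow it enables, assuming TensorFlow 2.x is installed. The tiny untrained model and random input are placeholders, not anything from the codelab; on an actual phone the same .tflite bytes would be loaded by the Android or iOS runtime instead of the Python interpreter.

```python
# Minimal sketch: convert a placeholder Keras model to TensorFlow Lite and
# run inference with the Lite interpreter. Assumes TensorFlow 2.x.
import numpy as np
import tensorflow as tf

# Stand-in model; in practice this would be your trained classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Convert to the compact .tflite format used on mobile devices.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Run the converted model with the TensorFlow Lite interpreter,
# mirroring the flow a mobile runtime would follow.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

sample = np.random.rand(1, 4).astype("float32")
interpreter.set_tensor(input_details[0]["index"], sample)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]["index"]))  # class probabilities
```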

How are Mobile apps transforming the healthcare industry?

Guest Contributor
15 Jan 2019
5 min read
Mobile app development has taken over and completely rewritten the healthcare industry. According to Healthcare Mobility Solutions reports, the mobile healthcare application market is expected to be worth more than $84 million by the year 2020. These mobile applications are not limited to use by patients; they are also used massively by doctors and nurses. As technology evolves, it simultaneously opens up possibilities for being used in multiple ways, and the journey of healthcare mobile app development has been similar: it originated from the latest trends in technology and has grown into an industry in itself. The technological trends that have helped build mobile apps for the healthcare industry are:

Blockchain

You probably know blockchain technology thanks to all the cryptocurrency rage of recent years. A blockchain is basically a peer-to-peer database that keeps a verified record of all transactions, or any other information that one needs to track and make accessible to a large community. The healthcare industry can use a technology that records the medical history of patients and stores it electronically, in an encrypted form that cannot be altered or hacked into. Blockchain succeeds where a lot of health applications fail: the secure retention of patient data.

The Internet of Things

The Internet of Things (IoT) is all about connectivity. It is a way of interconnecting electronic devices, software, applications, and so on to ensure easy access and management across platforms. The IoT will assist medical professionals in gaining access to valuable patient information so that doctors can monitor the progress of their patients. This makes treatment easier and more closely monitored, as doctors can access the patient's current profile anywhere and suggest treatment, medicine, and dosages.

Augmented Reality

From the video gaming industry, Augmented Reality has made its way to the medical sector. AR refers to the creation of an interactive experience of a real-world environment through the superimposition of computer-generated perceptual information. AR is increasingly used to develop mobile applications that doctors and surgeons can use as a training experience. It simulates a real-world experience of diagnosis and surgery and, by doing so, enhances the knowledge and practical skill that all doctors must possess. This form of training is not limited in scale and can therefore train a large number of medical practitioners simultaneously.

Big Data Analytics

Big Data has the potential to provide comprehensive statistical information, accessed and processed through sophisticated software. Big Data Analytics becomes extremely useful for managing a hospital's resources and records efficiently. Aside from this, it is used in the development of mobile applications that store all patient data, again eliminating the need for excessive paperwork. This allows medical professionals to focus more on attending to and treating patients, rather than managing databases.

These technological trends have led to the development of a diverse variety of mobile applications used for multiple purposes in the healthcare industry. Listed below are the benefits of mobile apps deploying these technological trends, for professionals and patients alike.

Telemedicine

Mobile applications can potentially play a crucial role in making medical services available to the masses. An example is an on-call physician on telemedicine duty: a mobile application allows the physician to be available for a patient consult without having to operate via a PC. This makes doctors more accessible and brings quality treatment to patients quickly.

Enhanced Patient Engagement

There are mobile applications that place all patient data, from past medical history to performance metrics, patient feedback, and changes in treatment patterns and schedules, at the push of a button for the medical professional to consider and make a decision on the go. Since all data is recorded in real time, doctors can change shifts without having to explain the condition of the patient to the next doctor in person; the mobile application has all the data the supervisors or nurses need.

Easy Access to Medical Facilities

A number of mobile applications allow patients to search for medical professionals in their area, read reviews and feedback from other patients, and then make an online appointment if they are satisfied with the information they find. Apart from this, they can also download and store their medical lab reports and order medicines online at affordable prices.

Easy Payment of Bills

As in every other sector, mobile applications in healthcare have made monetary transactions extremely easy. Patients or their family members no longer need to spend hours waiting in line to pay bills. They can instantly pick a payment plan and pay bills immediately, or add reminders to be notified when a bill is due.

It can therefore safely be said that the revolution the healthcare industry is undergoing has worked in favor of all the parties involved: medical professionals, patients, hospital management, and mobile app developers.

Author's Bio

Ritesh Patil is the co-founder of Mobisoft Infotech, which helps startups and enterprises with mobile technology. He's an avid blogger and writes on mobile application development. He has developed innovative mobile applications across fields such as finance, insurance, health, entertainment, productivity, social causes, and education, and has bagged numerous awards for his work. Social Media: Twitter, LinkedIn

Healthcare Analytics: Logistic Regression to Reduce Patient Readmissions
How IBM Watson is paving the road for Healthcare 3.0
7 Popular Applications of Artificial Intelligence in Healthcare

5 reasons to learn Generative Adversarial Networks (GANs) in 2018

Savia Lobo
12 Dec 2017
5 min read
Generative Adversarial Networks (GANs) are a prominent branch of machine learning research today. Deep neural networks require a lot of data to train on and perform poorly if the data provided is not sufficient. GANs can overcome this problem by generating new, realistic data, without resorting to tricks like data augmentation. As the application of GANs in the machine learning industry is still at an early stage, it is considered a highly desirable niche skill, and hands-on experience raises the bar in the job market: it can fetch you higher pay than your colleagues and can be the feature that sets your resume apart.

Source: Gartner's Hype Cycle 2017

GANs, along with CNNs and RNNs, are part of the deep neural network experience in demand in the industry. Here are five reasons why you should learn GANs today and how Kuntal Ganguly's book, Learning Generative Adversarial Networks, helps you do just that. Kuntal is a big data analytics engineer at Amazon Web Services. He has around 7 years of experience building large-scale, data-driven systems using big data frameworks and machine learning, and has designed, developed, and deployed several large-scale distributed applications. Kuntal is a seasoned author with books ranging across the data science spectrum, from machine learning and deep learning to Generative Adversarial Networks. The book shows how to implement GANs in your machine learning models in a quick and easy format, with plenty of real-world examples and hands-on tutorials.

1. Unsupervised learning now a cakewalk with GANs

A major challenge of unsupervised learning is the massive amount of unlabelled data one needs to work through as part of data preparation. In traditional neural networks, this labeling of data is both costly and time-consuming. Generative Adversarial Networks make a creative aspect of deep learning possible: the neural networks are capable of generating realistic images from real-world datasets (such as MNIST and CIFAR). GANs provide an easy way to train DL algorithms by slashing the amount of data required to train the neural network models, with no labeling of data required. The book uses a semi-supervised approach to solve the problem of unsupervised learning for classifying images; this can easily be carried over into a developer's own problem domain.

2. GANs help you change a horse into a zebra using image style transfer

https://www.youtube.com/watch?v=9reHvktowLY

Turning an apple into an orange is magic, and GANs can do this magic without casting a spell. In image-to-image style transfer, the styling of one image is applied to another. GANs can perform image-to-image translations across various domains (such as changing an apple to an orange or a horse to a zebra) using Cycle-Consistent Generative Adversarial Networks (CycleGANs). Detailed examples of how to turn the image of an apple into an orange using TensorFlow, and how to turn an image of a horse into a zebra using a GAN model, are given in the book.

3. GANs take your text as input and output an image

Generative Adversarial Networks can also be used for text-to-image synthesis, for example generating a photo-realistic image based on a caption. To do this, a dataset of images with their associated captions is given as training data. The dataset is first encoded using a hybrid neural network called a character-level convolutional recurrent neural network, which creates a joint representation of both in a multimodal space for both the generator and the discriminator. In the book, Kuntal showcases the technique of stacking multiple generative networks to generate realistic images from textual information using StackGANs. Further, the book goes on to explain the coupling of two generative networks to automatically discover relationships among various domains (such as the relationship between shoes and handbags, or actors and actresses) using DiscoGANs.

4. GANs + transfer learning = no more model generation from scratch

Source: Learning Generative Adversarial Networks

Data is the basis for training any machine learning model; scarcity of data can lead to a poorly trained model with a high chance of failure. Some real-life scenarios may not have sufficient data, hardware, or resources to train bigger networks to the desired accuracy. So, is training from scratch a must? Transfer learning is a well-known deep learning technique that adapts an existing trained model from a similar task to the task at hand. The book showcases transfer learning with hands-on examples, and goes on to combine transfer learning and GANs to generate high-resolution, realistic images with facial datasets. You will also see how to create artistic hallucinations on images beyond GANs.

5. GANs help you take machine learning models to production

Most machine learning tutorials, video courses, and books explain the training and evaluation of models. But how do we take a trained model to production, put it to use, and make it available to customers? In the book, the author works through an example, developing a facial correction system using the LFW dataset to automatically correct corrupted images with your trained GAN model. The book also covers several techniques for deploying machine learning or deep learning models in production, both in data centers and in the cloud, with microservice-based containerized environments. You will also learn how to run deep models in a serverless environment and with managed cloud services.

This article just scratches the surface of what is possible with GANs and why learning them will change how you think about deep neural networks. To know more, grab your copy of Kuntal Ganguly's book on GANs: Learning Generative Adversarial Networks.
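To make the generator/discriminator idea concrete, here is a minimal GAN training-step sketch written with the TensorFlow 2.x Keras API on random stand-in data. It is illustrative only and is not code from Kuntal Ganguly's book: the discriminator learns to separate real from generated images, while the generator learns to fool it.

```python
# Minimal GAN sketch for 28x28 (MNIST-sized) images, TensorFlow 2.x Keras API.
# Illustrative only; uses random stand-in data, not the book's code.
import tensorflow as tf
from tensorflow.keras import layers, models

LATENT_DIM = 100
BATCH = 32

# Generator: maps random noise to a fake 28x28 image.
generator = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(LATENT_DIM,)),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28)),
])

# Discriminator: scores an image as real (close to 1) or fake (close to 0).
discriminator = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)
bce = tf.keras.losses.BinaryCrossentropy()

def train_step(real_images):
    noise = tf.random.normal((BATCH, LATENT_DIM))

    # Train the discriminator: real images -> 1, generated images -> 0.
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        d_loss = (bce(tf.ones((BATCH, 1)), discriminator(real_images, training=True))
                  + bce(tf.zeros((BATCH, 1)), discriminator(fake, training=True)))
    d_opt.apply_gradients(zip(tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))

    # Train the generator: try to make the discriminator output 1 for fakes.
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        g_loss = bce(tf.ones((BATCH, 1)), discriminator(fake, training=True))
    g_opt.apply_gradients(zip(tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss

# One illustrative step on random images standing in for real MNIST digits.
print(train_step(tf.random.uniform((BATCH, 28, 28))))
```

In practice the step above would be run for many epochs over a real image dataset, and the generator and discriminator would be deeper convolutional networks, but the adversarial structure is the same.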

The rise of machine learning in the investment industry

Natasha Mathur
15 Feb 2019
13 min read
The investment industry has evolved dramatically over the last several decades and continues to do so amid increased competition, technological advances, and a challenging economic environment. In this article, we will review several key trends that have shaped the investment environment in general, and the context for algorithmic trading more specifically. This article is an excerpt from the book Hands-On Machine Learning for Algorithmic Trading, written by Stefan Jansen. The book explores the strategic perspective, conceptual understanding, and practical tools needed to add value by applying ML to the trading and investment process.

The trends that have propelled algorithmic trading and ML to current prominence include:
Changes in the market microstructure, such as the spread of electronic trading and the integration of markets across asset classes and geographies
The development of investment strategies framed in terms of risk-factor exposure, as opposed to asset classes
The revolutions in computing power, data generation and management, and analytic methods
The outperformance of the pioneers in algorithmic trading relative to human, discretionary investors

In addition, the financial crises of 2001 and 2008 have affected how investors approach diversification and risk management, and have given rise to low-cost passive investment vehicles in the form of exchange-traded funds (ETFs). Amid low yield and low volatility after the 2008 crisis, cost-conscious investors shifted $2 trillion from actively managed mutual funds into passively managed ETFs. Competitive pressure is also reflected in lower hedge fund fees, which dropped from the traditional 2% annual management fee and 20% take of profits to an average of 1.48% and 17.4%, respectively, in 2017. Let's have a look at how ML has come to play a strategic role in algorithmic trading.

Factor investing and smart beta funds

The return provided by an asset is a function of the uncertainty or risk associated with the financial investment. An equity investment implies, for example, assuming a company's business risk, and a bond investment implies assuming default risk. To the extent that specific risk characteristics predict returns, identifying and forecasting the behavior of these risk factors becomes a primary focus when designing an investment strategy. It yields valuable trading signals and is the key to superior active-management results. The industry's understanding of risk factors has evolved very substantially over time and has affected how ML is used for algorithmic trading.

Modern Portfolio Theory (MPT) introduced the distinction between idiosyncratic and systematic sources of risk for a given asset. Idiosyncratic risk can be eliminated through diversification, but systematic risk cannot. In the early 1960s, the Capital Asset Pricing Model (CAPM) identified a single factor driving all asset returns: the return on the market portfolio in excess of T-bills. The market portfolio consists of all tradable securities, weighted by their market value. The systematic exposure of an asset to the market is measured by beta, which captures the covariation of the asset's returns with those of the market portfolio. The recognition that the risk of an asset does not depend on the asset in isolation, but rather on how it moves relative to other assets and to the market as a whole, was a major conceptual breakthrough. In other words, assets do not earn a risk premium because of their specific, idiosyncratic characteristics, but because of their exposure to underlying factor risks.

However, a large body of academic literature and long investing experience have disproved the CAPM prediction that asset risk premiums depend only on their exposure to a single factor measured by the asset's beta. Instead, numerous additional risk factors have since been discovered. A factor is a quantifiable signal, attribute, or variable that has historically correlated with future stock returns and is expected to remain correlated in the future. These risk factors were labeled anomalies since they contradicted the Efficient Market Hypothesis (EMH), which held that market equilibrium would always price securities according to the CAPM, so that no other factors should have predictive power. The economic theory behind factors can be either rational, where factor risk premiums compensate for low returns during bad times, or behavioral, where agents fail to arbitrage away excess returns.

Well-known anomalies include the value, size, and momentum effects, which help predict returns while controlling for the CAPM market factor. The size effect, discovered by Banz (1981) and Reinganum (1981), rests on small firms systematically outperforming large firms. The value effect (Basu 1982) states that firms with low valuation metrics outperform: firms with low price multiples, such as the price-to-earnings or price-to-book ratios, perform better than their more expensive peers (as suggested by the inventors of value investing, Benjamin Graham and David Dodd, and popularized by Warren Buffett). The momentum effect, discovered in the late 1980s by, among others, Clifford Asness, the founding partner of AQR, states that stocks with good momentum, in terms of recent 6-12 month returns, have higher returns going forward than poor-momentum stocks with similar market risk. Researchers also found that value and momentum factors explain returns for stocks outside the US, as well as for other asset classes such as bonds, currencies, and commodities, alongside additional risk factors.

In fixed income, the value strategy is called riding the yield curve and is a form of the duration premium. In commodities, it is called the roll return, with a positive return for an upward-sloping futures curve and a negative return otherwise. In foreign exchange, the value strategy is called carry. There is also an illiquidity premium: securities that are more illiquid trade at low prices and have high average excess returns relative to their more liquid counterparts. Bonds with higher default risk tend to have higher returns on average, reflecting a credit risk premium. Since investors are willing to pay for insurance against high volatility when returns tend to crash, sellers of volatility protection in options markets tend to earn high returns.

Multifactor models define risks in broader and more diverse terms than just the market portfolio. In 1976, Stephen Ross proposed arbitrage pricing theory, which asserted that investors are compensated for multiple systematic sources of risk that cannot be diversified away. The three most important macro factors are growth, inflation, and volatility, in addition to productivity, demographic, and political risk. In 1992, Eugene Fama and Kenneth French combined the equity risk factors of size and value with a market factor into a single model that better explained cross-sectional stock returns. They later added a model that also included bond risk factors to simultaneously explain returns for both asset classes.
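As a quick, concrete illustration of two quantities described above, an asset's CAPM beta and a simple 12-month momentum signal, here is a short sketch assuming pandas and NumPy. The column names and random return data are placeholders and the code is illustrative only; it is not taken from the book.

```python
# Illustrative sketch (not from the book): estimate CAPM beta and a simple
# momentum signal from daily returns. Assumes pandas and NumPy; the data
# below is random placeholder data standing in for real return series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2015-01-01", periods=1000)
returns = pd.DataFrame({
    "market": rng.normal(0.0004, 0.010, len(dates)),
    "stock": rng.normal(0.0005, 0.015, len(dates)),
}, index=dates)

# CAPM beta: covariance of the stock with the market over market variance.
beta = returns["stock"].cov(returns["market"]) / returns["market"].var()
print(f"estimated beta: {beta:.2f}")

# Simple momentum signal: trailing 12-month return, skipping the most recent
# month (roughly 252 and 21 trading days, respectively).
prices = (1 + returns["stock"]).cumprod()
momentum = prices.shift(21) / prices.shift(252) - 1
print(momentum.dropna().tail())
```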
They later added a model that also included bond risk factors to simultaneously explain returns for both asset classes. A particularly attractive aspect of risk factors is their low or negative correlation. Value and momentum risk factors, for instance, are negatively correlated, reducing the risk and increasing risk-adjusted returns above and beyond the benefit implied by the risk factors. Furthermore, using leverage and long-short strategies, factor strategies can be combined into market-neutral approaches. The combination of long positions in securities exposed to positive risks with underweight or short positions in the securities exposed to negative risks allows for the collection of dynamic risk premiums. As a result, the factors that explained returns above and beyond the CAPM were incorporated into investment styles that tilt portfolios in favor of one or more factors, and assets began to migrate into factor-based portfolios. The 2008 financial crisis underlined how asset-class labels could be highly misleading and create a false sense of diversification when investors do not look at the underlying factor risks, as asset classes came crashing down together. Over the past several decades, quantitative factor investing has evolved from a simple approach based on two or three styles to multifactor smart or exotic beta products. Smart beta funds have crossed $1 trillion AUM in 2017, testifying to the popularity of the hybrid investment strategy that combines active and passive management. Smart beta funds take a passive strategy but modify it according to one or more factors, such as cheaper stocks or screening them according to dividend payouts, to generate better returns. This growth has coincided with increasing criticism of the high fees charged by traditional active managers as well as heightened scrutiny of their performance. The ongoing discovery and successful forecasting of risk factors that, either individually or in combination with other risk factors, significantly impact future asset returns across asset classes is a key driver of the surge in ML in the investment industry. Algorithmic pioneers outperform humans at scale The track record and growth of Assets Under Management (AUM) of firms that spearheaded algorithmic trading has played a key role in generating investor interest and subsequent industry efforts to replicate their success. Systematic funds differ from HFT in that trades may be held significantly longer while seeking to exploit arbitrage opportunities as opposed to advantages from sheer speed. Systematic strategies that mostly or exclusively rely on algorithmic decision-making were most famously introduced by mathematician James Simons who founded Renaissance Technologies in 1982 and built it into the premier quant firm. Its secretive Medallion Fund, which is closed to outsiders, has earned an estimated annualized return of 35% since 1982. DE Shaw, Citadel, and Two Sigma, three of the most prominent quantitative hedge funds that use systematic strategies based on algorithms, rose to the all-time top-20 performers for the first time in 2017 in terms of total dollars earned for investors, after fees, and since inception. DE Shaw, founded in 1988 with $47 billion AUM in 2018 joined the list at number 3. Citadel started in 1990 by Kenneth Griffin, manages $29 billion and ranks 5, and Two Sigma started only in 2001 by DE Shaw alumni John Overdeck and David Siegel, has grown from $8 billion AUM in 2011 to $52 billion in 2018. 
Bridgewater started in 1975 with over $150 billion AUM, continues to lead due to its Pure Alpha Fund that also incorporates systematic strategies. Similarly, on the Institutional Investors 2017 Hedge Fund 100 list, five of the top six firms rely largely or completely on computers and trading algorithms to make investment decisions—and all of them have been growing their assets in an otherwise challenging environment. Several quantitatively-focused firms climbed several ranks and in some cases grew their assets by double-digit percentages. Number 2-ranked Applied Quantitative Research (AQR) grew its hedge fund assets 48% in 2017 to $69.7 billion and managed $187.6  billion firm-wide. Among all hedge funds, ranked by compounded performance over the last three years, the quant-based funds run by Renaissance Technologies achieved ranks 6 and 24, Two Sigma rank 11, D.E. Shaw no 18 and 32, and Citadel ranks 30 and 37. Beyond the top performers, algorithmic strategies have worked well in the last several years. In the past five years, quant-focused hedge funds gained about 5.1% per year while the average hedge fund rose 4.3% per year in the same period. ML driven funds attract $1 trillion AUM The familiar three revolutions in computing power, data, and ML methods have made the adoption of systematic, data-driven strategies not only more compelling and cost-effective but a key source of competitive advantage. As a result, algorithmic approaches are not only finding wider application in the hedge-fund industry that pioneered these strategies but across a broader range of asset managers and even passively-managed vehicles such as ETFs. In particular, predictive analytics using machine learning and algorithmic automation play an increasingly prominent role in all steps of the investment process across asset classes, from idea-generation and research to strategy formulation and portfolio construction, trade execution, and risk management. Estimates of industry size vary because there is no objective definition of a quantitative or algorithmic fund, and many traditional hedge funds or even mutual funds and ETFs are introducing computer-driven strategies or integrating them into a discretionary environment in a human-plus-machine approach. Morgan Stanley estimated in 2017 that algorithmic strategies have grown at 15% per year over the past six years and control about $1.5 trillion between hedge funds, mutual funds, and smart beta ETFs. Other reports suggest the quantitative hedge fund industry was about to exceed $1 trillion AUM, nearly doubling its size since 2010 amid outflows from traditional hedge funds. In contrast, total hedge fund industry capital hit $3.21 trillion according to the latest global Hedge Fund Research report. The market research firm Preqin estimates that almost 1,500 hedge funds make a majority of their trades with help from computer models. Quantitative hedge funds are now responsible for 27% of all US stock trades by investors, up from 14% in 2013. But many use data scientists—or quants—which, in turn, use machines to build large statistical models (WSJ). In recent years, however, funds have moved toward true ML, where artificially-intelligent systems can analyze large amounts of data at speed and improve themselves through such analyses. Recent examples include Rebellion Research, Sentient, and Aidyia, which rely on evolutionary algorithms and deep learning to devise fully-automatic Artificial Intelligence (AI)-driven investment platforms. 
From the core hedge fund industry, the adoption of algorithmic strategies has spread to mutual funds and even passively managed exchange-traded funds in the form of smart beta funds, and to discretionary funds in the form of quantamental approaches.

The emergence of quantamental funds

Two distinct approaches have evolved in active investment management: systematic (or quant) and discretionary investing. Systematic approaches rely on algorithms for a repeatable and data-driven approach to identify investment opportunities across many securities; in contrast, a discretionary approach involves an in-depth analysis of a smaller number of securities. These two approaches are becoming more similar as fundamental managers take more data-science-driven approaches. Even fundamental traders now arm themselves with quantitative techniques, accounting for $55 billion of systematic assets, according to Barclays. Agnostic to specific companies, quantitative funds trade patterns and dynamics across a wide swath of securities. Quants now account for about 17% of total hedge fund assets, data compiled by Barclays shows. Point72 Asset Management, with $12 billion in assets, has been shifting about half of its portfolio managers to a man-plus-machine approach. Point72 is also investing tens of millions of dollars into a group that analyzes large amounts of alternative data and passes the results on to traders.

Investments in strategic capabilities

Rising investments in related capabilities, namely technology, data and, most importantly, skilled humans, highlight how significant algorithmic trading using ML has become for competitive advantage, especially in light of the rising popularity of passive, indexed investment vehicles, such as ETFs, since the 2008 financial crisis. Morgan Stanley noted that only 23% of its quant clients say they are not using or considering using ML, down from 44% in 2016. Guggenheim Partners LLC built what it calls a supercomputing cluster for $1 million at the Lawrence Berkeley National Laboratory in California to help crunch numbers for Guggenheim's quant investment funds. Electricity for the computers costs another $1 million a year. AQR is a quantitative investment group that relies on academic research to identify and systematically trade factors that have, over time, proven to beat the broader market. The firm used to eschew the purely computer-powered strategies of quant peers such as Renaissance Technologies or DE Shaw. More recently, however, AQR has begun to seek profitable patterns in markets using ML to parse through novel datasets, such as satellite pictures of shadows cast by oil wells and tankers. The leading firm BlackRock, with over $5 trillion in AUM, also bets on algorithms to beat discretionary fund managers by heavily investing in SAE, a systematic trading firm it acquired during the financial crisis. Franklin Templeton bought Random Forest Capital, a debt-focused, data-led investment company, for an undisclosed amount, hoping that its technology can support the wider asset manager. We looked at how ML plays a role in different industry trends around algorithmic trading. If you want to learn more about the design and execution of algorithmic trading strategies, and use cases of ML in algorithmic trading, be sure to check out the book 'Hands-On Machine Learning for Algorithmic Trading'. Read next: Using machine learning for phishing domain detection [Tutorial] Anatomy of an automated machine learning algorithm (AutoML) 10 machine learning algorithms every engineer needs to know

How Blockchain can level up IoT Security

Savia Lobo
29 Aug 2017
4 min read
IoT comprises a horde of sensors, vehicles, and other devices with embedded electronics that can communicate over the Internet. These IoT-enabled devices generate tons of data every second. And with IoT Edge Analytics, these devices are getting much smarter - they can start or stop a request without any human intervention. 25 billion "things" will be connected to the internet by 2020. - Gartner Research With so much data being generated by these devices, the question on everyone's mind is: will all this data be reliable and secure?

When Brains meet Brawn: Blockchain for IoT

Blockchain, an open distributed ledger, is highly secure and difficult for anyone connected over the network to manipulate or corrupt. It was initially designed for cryptocurrency-based financial transactions. Bitcoin is a famous example which has Blockchain as its underlying technology. Blockchain has come a long way since then and can now be used to store anything of value. So why not store data in it? That data will be secure, just like every digital asset in a Blockchain is. Blockchain, decentralized and secure, is an ideal structure to form the underlying foundation for IoT data solutions. Current IoT devices and their data rely on a client-server architecture. All devices are identified, authenticated, and connected via cloud servers, which are capable of storing ample amounts of data. But this requires huge infrastructure, which is expensive. Blockchain not only provides an economical alternative but, because it works in a decentralized fashion, also eliminates single points of failure, creating a more secure and resilient network for IoT devices. This makes IoT more secure and reliable. Customers can therefore relax knowing their information is in safe hands. Today, Blockchain's capabilities extend beyond processing financial transactions: it can now track billions of connected devices, process transactions, and even coordinate between devices - a good fit for the IoT industry.

Why Blockchain is perfect for IoT

Inherently weak security features make IoT devices suspect. Blockchain, on the other hand, with its tamper-proof ledger, is hard to manipulate for malicious activities - making it the right infrastructure for IoT solutions.

Enhancing security through decentralization

Blockchain makes it hard for intruders to intervene as it spans a network of secure blocks. A change at a single location, therefore, does not affect the other blocks. The data or any value remains encrypted and is only visible to the person who has encrypted it using a private key. The cryptographic algorithms used in Blockchain technology ensure that IoT data remains private, whether for an individual organization or for the organizations connected in a network.

Simplicity through autonomous, third-party-free transactions

Blockchain technology is already a star in the finance sector thanks to the adoption of smart contracts, Bitcoin, and other cryptocurrencies. Apart from providing a secure medium for financial transactions, it eliminates the need for third-party brokers such as banks to provide a guarantee over peer-to-peer payment services. With Blockchain, IoT data can be treated in a similar manner, wherein smart contracts can be made between devices to exchange messages and data. This type of autonomy is possible because each node in the blockchain network can verify the validity of a transaction without relying on a centralized authority.
Blockchain-backed IoT solutions will thus enable trustworthy message sharing. Business partners can easily access and exchange confidential information within the IoT without a centralized management or regulatory authority. This means quicker transactions, lower costs, and fewer opportunities for malicious intent such as data espionage.

Blockchain's immutability for predicting IoT security vulnerabilities

Blockchains maintain a history of all transactions made by the smart devices connected within a particular network. This is possible because once you enter data in a Blockchain, it lives there forever in its immutable ledger. The possibilities for IoT solutions that leverage Blockchain's immutability are limitless. Some obvious use cases are more robust credit scores and preventive healthcare solutions that use data accumulated through wearables. For all the above reasons, we see significant Blockchain adoption by IoT-based businesses in the near future.
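To make the tamper-evidence argument concrete, here is a minimal, illustrative sketch of a hash-chained ledger in Python. It is not a production blockchain (there is no consensus protocol, networking, or signing), and the block fields and the verify helper are assumptions chosen purely to show why altering one record breaks every later hash.

```python
import hashlib
import json
import time

def block_hash(block):
    # Hash the block's contents (excluding its own hash) deterministically.
    payload = json.dumps({k: v for k, v in block.items() if k != "hash"}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def add_block(chain, device_id, reading):
    # Each new block commits to the previous block's hash.
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "timestamp": time.time(),
             "device_id": device_id, "reading": reading, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block

def verify(chain):
    # Any tampering with an earlier block invalidates the stored hashes downstream.
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
add_block(chain, device_id="sensor-42", reading={"temp_c": 21.7})
add_block(chain, device_id="sensor-42", reading={"temp_c": 22.1})
print(verify(chain))                    # True
chain[0]["reading"]["temp_c"] = 99.9    # tamper with an old sensor reading
print(verify(chain))                    # False
```

In a real IoT deployment the chain would be replicated across many nodes and extended only by consensus, which is what removes the single point of failure discussed above.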

Why an algorithm will never win a Pulitzer

Richard Gall
21 Jan 2016
6 min read
In 2012, a year which feels a lot like the very early years of the era of data, Wired published this article on Narrative Science, an organization based in Chicago that uses Machine Learning algorithms to write news articles. Its founder and CEO, Kris Hammond, is a man whose enthusiasm for algorithmic possibilities is unparalleled. When asked whether an algorithm would win a Pulitzer in the next 20 years, he goes further, claiming that it could happen in the next 5 years. Hammond's excitement at what his organization is doing is not unwarranted. But his optimism certainly is. Unless 2017 is a particularly poor year for journalism and literary nonfiction, a Pulitzer for one of Narrative Science's algorithms looks unlikely to say the least. But there are a couple of problems with Hammond's enthusiasm. He fails to recognise the limitations of algorithms: the fact that the job of even the most intricate and complex Deep Learning algorithm is very specific, and is quite literally determined by the people who create it. "We are humanising the machine," he's quoted as saying in a Guardian interview from June 2015. "Based on general ideas of what is important and a close understanding of who the audience is, we are giving it the tools to know how to tell us stories." It's important to notice here how he talks - it's all about what 'we're' doing. The algorithms that are central to Narrative Science's mission are things that are created by people, by data scientists. It's easy to read what's going on as a simple case of the machines taking over. True, perhaps there is cause for concern among writers when he suggests that in 25 years 90% of news stories will be created by algorithms, but in actual fact there's just a simple shift in where labour is focused.

It's time to rethink algorithms

We need to rethink how we view and talk about data science, Machine Learning, and algorithms. We see, for example, algorithms as impersonal, blandly futuristic things. Although they might be crucial to our personalized online experiences, they are regarded as the hypermodern equivalent of the inauthentic handshake of a door-to-door salesman. Similarly, at the other end, the process of creating them is viewed as a feat of engineering: maths and statistics nerds tackling the complex interplay of statistics and machinery. Instead, we should think of algorithms as something creative, things that organize and present the world in a specific way, like a well-designed building. If an algorithm did indeed win a Pulitzer, wouldn't it really be the team behind it that deserves it? When Hammond talks, for example, about "general ideas of what is important and a close understanding of who the audience is", he is referring very much to a creative process. Sure, it's the algorithm that learns this, but it nevertheless requires the insight of a scientist, an analyst, to consider these factors, and to consider how their algorithm will interact with the irritating complexity and unpredictability of reality. Machine Learning projects, then, are as much about designing algorithms as they are about programming them. There's a certain architecture, a politics, that informs them. It's all about prioritization and organization, and those two things aren't just obvious; they're certainly not things which can be identified and quantified. They are instead things that inform the way we quantify, the way we label. The very real fingerprints of human imagination, and indeed fallibility, are in the algorithms we experience every single day.
Algorithms are made by people

Perhaps we've all fallen for Hammond's enthusiasm. It's easy to see algorithms as the key to the future, and forget that really they're just things that are made by people. Indeed, it might well be that they're so successful that we forget they've been made by anyone - it's usually only when algorithms don't work that the human aspect emerges. The data team have done their job when no one realises they are there. An obvious example: you can see it when Spotify recommends some bizarre songs that you would never even consider listening to. The problem here isn't simply a technical one; it's about how different tracks or artists are tagged and grouped, and how they are made to fit within a particular dataset. It's an issue of context: to build a great Machine Learning system you need to be alive to the stories and ideas that permeate the world in which your algorithm operates - if you, as the data scientist, lack this awareness, so will your Machine Learning project. But there have been more problematic and disturbing incidents, such as when Flickr auto-tagged people of color in pictures as apes, due to the way a visual recognition algorithm had been trained. In this case, the issue is a lack of sensitivity about the way in which an algorithm may work - the things it might run up against when it's faced with the messiness of the real world, with its conflicts, its identities, ideas, and stories. The story of Solid Gold Bomb, too, is a reminder of the unintended consequences of algorithms. It's a reminder of the fact that we can be lazy with algorithms; instead of being designed with thought and care, they become a surrogate for it. What's more is that they always give us a get-out clause; we can blame the machine if something goes wrong. If this all sounds like I'm simply down on algorithms, that I'm a technological pessimist, you're wrong. What I'm trying to say is that it's humans that are really in control. If an algorithm won a Pulitzer, what would that imply? It would mean the machines have won. It would mean we're no longer the ones doing the thinking, solving problems, finding new ones.

Data scientists are designers

As the economy becomes reliant on technological innovation, it's easy to remove ourselves, to underplay the creative thinking that drives what we do. That's what Hammond is doing in his frenzied excitement about his company: he's forgetting that it's him and his team that are finding their way through today's stories. It might be easier to see creativity at work when we cast our eyes towards game development and web design, but data scientists are designers and creators too. We're often so keen to stress the technical aspects of these sorts of roles that we forget this important aspect of the data scientist skillset.

Data science for non-techies: How I got started (Part 1)

Amey Varangaonkar
20 Jul 2018
7 min read
As a category manager, I manage the data science portfolio of product ideas for Packt Publishing, a leading tech publisher. In simple terms, I place informed bets on where to invest, what topics to publish on, and so on. While I have a decent idea of where the industry is heading and what data professionals are looking forward to learning and why, it is high time I walked in their shoes, for a couple of reasons. Basically, I want to understand the reason behind Data Science being the 'Sexiest job of the 21st century', and whether the role is really worth all the fame and fortune. In the process, I also want to explore the underlying difficulties, challenges, and obstacles that every data scientist has had to endure at some point in his/her journey, or maybe still does. The cherry on top is that I get to use the skills I develop to supercharge my success in my current role, which is primarily insight-driven. This is the first of a series of posts on how I got started with Data Science. Today, I'm sharing my experience with devising a learning path and then gathering appropriate learning resources.

Devising a learning path

To understand the concepts of data science, I had to research a lot. There are tons and tons of resources out there, many of which are very good. Once you separate the good from the rest, it can be quite intimidating to pick the options that suit you best. Some of the primary questions that clouded my mind were: What should be my programming language of choice? R or Python? Or something else? What tools and frameworks do I need to learn? What about the statistics and mathematical aspects of machine learning? How essential are they? Two videos really helped me find the answers to the questions above: If you don't want to spend a lot of your time mastering the art of data science, there's a beautiful video on how to become a data scientist in six months. What are the questions asked in a data science interview? What are the in-demand skills that you need to master in order to get a data science job? This video on 5 Tips For Getting a Data Science Job is really helpful. After a lot of research that included reading countless articles and blogs, and discussions with experts, here is my learning plan:

Learn Python

Per the recently conducted Stack Overflow Developer Survey 2018, Python stood out as the most-wanted programming language, meaning the developers who do not use it yet want to learn it the most. As one of the most widely used general-purpose programming languages, Python finds large applications when it comes to data science. Naturally, you get attracted to the best option available, and Python was the one for me. The major reasons why I chose to learn Python over the other programming languages:

Very easy to learn: Python is one of the easiest programming languages to learn. Not only is the syntax clean and easy to understand, even the most complex of data science tasks can be done in a few lines of Python code.
Efficient libraries for Data Science: Python has a vast array of libraries suited for various data science tasks, from scraping data to visualizing and manipulating it. NumPy, SciPy, pandas, matplotlib, and Seaborn are some of the libraries worth mentioning here.
Python has terrific libraries for machine learning: Learning a framework or a library which makes machine learning easier to perform is very important. Python has libraries such as scikit-learn and TensorFlow that make machine learning easier and a fun-to-do activity.
To make the most of these libraries, it is important to understand the fundamentals of Python. My colleague and good friend Aaron has put out a list of the top 7 Python programming books, which helped as a brilliant starting point to understand the different resources out there to learn Python. The one book that stood out for me was Learn Python Programming - Second Edition: this is a very good book to start Python programming from scratch. There is also a neat skill-map present on Mapt, where you can progressively build up your knowledge of Python, right from the absolute basics to the most complex concepts. Another handy resource to learn the A-Z of Python is Complete Python Masterclass. This is a slightly long course, but it will take you from the absolute fundamentals to the most advanced aspects of Python programming. Task Status: Ongoing

Learn the fundamentals of data manipulation

After learning the fundamentals of Python programming, the plan is to head straight to the Python-based libraries for data manipulation, analysis, and visualization. Some of the major ones are what we already discussed above, and the plan is to learn them in the following order:

NumPy - used primarily for numerical computing
pandas - one of the most popular Python packages for data manipulation and analysis
matplotlib - the go-to Python library for data visualization, rivaling the likes of R's ggplot2
Seaborn - a data visualization library that runs on top of matplotlib, used for creating visually appealing charts, plots, and histograms

Some very good resources to learn about all these libraries: Python Data Analysis, and Python for Data Science and Machine Learning - a very good course with detailed coverage of the machine learning concepts; something to learn later. The aim is to learn these libraries up to a fairly intermediate level, and to be able to manipulate, analyze, and visualize any kind of data, including missing, unstructured, and time-series data (a small sketch of what this looks like in practice follows at the end of this plan).

Understand the fundamentals of statistics, linear algebra and probability

In order to take a step further and enter the foray of machine learning, the general consensus is to first understand the maths and statistics behind the concepts of machine learning. Implementing them in Python is relatively easy once you get the math right, and that is what I plan to do. I shortlisted some very good resources for this as well: Statistics for Machine Learning, and the Stanford University Machine Learning course at Coursera. Task Status: Ongoing

Learn Machine Learning (sounds odd, I know)

After understanding the math behind machine learning, the next step is to learn how to perform predictive modeling using popular machine learning algorithms such as linear regression, logistic regression, clustering, and more. Using real-world datasets, the plan is to learn the art of building state-of-the-art machine learning models using Python's very own scikit-learn library, as well as the popular TensorFlow package. To learn how to do this, the courses I mentioned above should come in handy: Stanford University - Machine Learning Course at Coursera, Python for Data Science and Machine Learning, and Python Machine Learning, Second Edition. Task Status: To be started

[box type="shadow" align="" class="" width=""]During the course of this journey, websites like Stack Overflow and Stack Exchange will be my best friends, along with popular resources such as YouTube.[/box] As I start this journey, I plan to share my experiences and knowledge with you all.
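As promised above, here is a minimal, illustrative pandas and matplotlib sketch of the load-clean-summarize-plot loop that the data manipulation step is aiming for. The tiny inline dataset and its column names are made up purely for illustration; a real workflow would start from pd.read_csv or a similar loader.

```python
import pandas as pd
import matplotlib.pyplot as plt

# A tiny, made-up dataset standing in for a real CSV file.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "visitors": [1200, 1350, None, 1610],   # one missing value on purpose
    "signups": [80, 95, 88, 120],
})

# Basic cleaning: fill the missing visitor count with the column mean.
df["visitors"] = df["visitors"].fillna(df["visitors"].mean())

# A simple derived metric plus summary statistics.
df["conversion_rate"] = df["signups"] / df["visitors"]
print(df.describe())

# A quick visualization of the derived metric.
df.plot(x="month", y="conversion_rate", kind="bar", legend=False)
plt.ylabel("Conversion rate")
plt.tight_layout()
plt.show()
```

Even this toy example touches missing-value handling, a derived metric, summary statistics, and a chart, which is roughly what the "fairly intermediate level" target above refers to.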
Do you think the learning path looks good? Is there anything else that I should include in my learning path? I would really love to hear your comments, suggestions, and experiences. Stay tuned for the next post, where I seek answers to questions such as 'How much of Python should I learn in order to be comfortable with Data Science?', 'How much time should I devote per day or week to learn the concepts in Data Science?', and much more. Read more: Why is data science important? 9 Data Science Myths Debunked 30 common data science terms explained

Here's how you can handle the bias variance trade-off in your ML models

Savia Lobo
22 Jan 2018
8 min read
Many organizations rely on machine learning techniques in their day-to-day workflow to cut down on the time required to do a job. The reason why these techniques are robust is that they undergo various tests in order to carry out correct predictions about any data fed into them. During this phase, certain errors are also generated, which can lead to an inconsistent ML model. Two common errors that we are going to look at in this article are bias and variance, and how a trade-off can be achieved between the two in order to generate a successful ML model. Let's first have a look at what creates these kinds of errors. Machine learning techniques, or more precisely supervised learning techniques, involve training, often the most important stage in the ML workflow. The machine learning model is trained using the training data. How is this training data prepared? This is done by using a dataset for which the output of the algorithm is known. During the training stage, the algorithm analyzes the training data that is fed in and produces patterns which are captured within an inferred function. This inferred function, which is derived after analysis of the training dataset, is the model that will be further used to map new examples. An ideal model generated from this training data should be able to generalize well. This means it should learn from the training data and correctly predict or classify data within any new problem instance. In general, the more complex the model is, the better it classifies the training data. However, if the model is too complex, it will pick up random features, i.e. noise, in the training data; this is the case of overfitting, and the model is said to overfit. On the other hand, if the model is not complex enough, or misses out on important dynamics present within the data, then it is a case of underfitting. Both overfitting and underfitting are basically errors in the ML models or algorithms. Also, it is generally impossible to minimize both these errors at the same time, and this leads to a condition called the bias-variance trade-off. Before getting into how to achieve the trade-off, let's first understand how bias and variance errors occur.

The Bias and Variance Error

Let's understand each error with the help of an example. Suppose you have 3 training datasets, say T1, T2, and T3, and you pass these datasets through a supervised learning algorithm. The algorithm generates three different models, say M1, M2, and M3, one from each of the training datasets. Now let's say you have a new input A. The whole idea is to apply each model to this new input A. Here, there are two types of errors that can occur. If the output generated by each model on the input A is different (B1, B2, B3), the algorithm is said to have a high variance error. On the other hand, if the output from all three models is the same (B) but incorrect, the algorithm is said to have a high bias error. High variance also means that the algorithm produces a model that is too specific to the training data, which is a typical case of overfitting. On the other hand, high bias means that the algorithm has not picked up the defining patterns from the dataset; this is a case of underfitting. Some examples of high-bias ML algorithms are Linear Regression, Linear Discriminant Analysis, and Logistic Regression. Examples of high-variance ML algorithms are Decision Trees, k-Nearest Neighbors, and Support Vector Machines.

How to achieve a Bias-Variance Trade-off?
For any supervised algorithm, having a high bias error usually means it has a low variance error, and vice versa. To be more specific, parametric or linear ML algorithms often have high bias but low variance; non-parametric or non-linear algorithms, on the other hand, often have low bias but high variance. The goal of any ML model is to obtain a low-variance and low-bias state, which is often a difficult task due to the parametrization of machine learning algorithms. So how can we achieve a trade-off between the two? Following are some ways to achieve the bias-variance trade-off:

By minimizing the total error: The optimum location for any model is the level of complexity at which the increase in bias is equivalent to the reduction in variance. Practically, there is no analytical method to find this optimal level. One should use an accurate measure of prediction error, explore different levels of model complexity, and then choose the complexity level that minimizes the overall error. Generally, resampling-based measures such as cross-validation should be preferred over theoretical measures such as Akaike's Information Criterion. Source: http://scott.fortmann-roe.com/docs/BiasVariance.html (The irreducible error is the noise that cannot be reduced by algorithms, but it can be reduced with better data cleaning.)

Using Bagging and Resampling techniques: These can be used to reduce the variance in model predictions. In bagging (Bootstrap Aggregating), several replicas of the original dataset are created using random selection with replacement. One modeling algorithm that makes use of bagging is Random Forests. In the Random Forest algorithm, the bias of the full model is equivalent to the bias of a single decision tree, which itself has high variance. By creating many of these trees, in effect a "forest", and then averaging them, the variance of the final model can be greatly reduced over that of a single tree.

Adjusting minor values in algorithms: Both the k-nearest neighbors and Support Vector Machine (SVM) algorithms have low bias and high variance. But the trade-offs in both these cases can be changed. In the k-nearest neighbors algorithm, the value of k can be increased, which would simultaneously increase the number of neighbors that contribute to the prediction. This in turn would increase the bias of the model. In the SVM algorithm, the trade-off can be changed by an increase in the C parameter, which would influence the violations of the margin allowed in the training data. This will increase the bias but decrease the variance.

Using a proper machine learning workflow: This means you have to ensure proper training by:

Maintaining separate training and test sets - splitting the dataset into training (50%), testing (25%), and validation sets (25%). The training set is used to build the model, the test set is used to check the accuracy of the model, and the validation set is used to evaluate the performance of your model's hyperparameters.
Optimizing your model by using systematic cross-validation - a cross-validation technique is a must to fine-tune the model parameters, especially for unknown instances. In supervised machine learning, validation or cross-validation is used to find out the predictive accuracy of various models of varying complexity, in order to find the best model. For instance, one can use the k-fold cross-validation method. Here, the dataset is divided into k folds. For each fold, train the algorithm on the remaining k-1 folds iteratively, using the held-out fold (also called the 'holdout fold') as the test set.
Repeat this process until each fold has acted as the test set. The average of the k recorded errors is called the cross-validation error and can serve as the performance metric for the model.
Trying out appropriate algorithms - before relying on any model, we need to first ensure that the model works best for our assumptions. One can keep in mind the No Free Lunch theorem, which states that no single model works best for every problem; for instance, averaged across all possible problems, a random search performs as well as any heuristic optimization algorithm.
Tuning the hyperparameters that can give an impactful performance - any machine learning model requires different hyperparameters such as constraints, weights, or learning rates for generalizing different data patterns. Tuning these hyperparameters is necessary so that the model can optimally solve machine learning problems. Grid search and randomized search are two such methods practiced for hyperparameter tuning.

So, we have listed some of the ways in which you can achieve a trade-off between the two. Bias and variance are related to each other: if you increase one, the other decreases, and vice versa. With a trade-off, there is an optimal balance between bias and variance which gives us a model that is neither underfit nor overfit. And finally, the ultimate goal of any supervised machine learning algorithm lies in isolating the signal from the dataset while making sure that it eliminates the noise.
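To see the trade-off in code, here is a minimal scikit-learn sketch (assuming scikit-learn is installed) that varies k in a k-nearest neighbors classifier and measures cross-validated accuracy. Small k gives a flexible, low-bias, high-variance model; large k increases bias but reduces variance, and the cross-validation score typically peaks somewhere in between. The dataset and the list of k values are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Evaluate increasing k: low k = flexible (high variance), high k = smooth (high bias).
for k in [1, 3, 5, 11, 25, 51, 101]:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
    print(f"k={k:>3}  mean CV accuracy={scores.mean():.3f}  (+/- {scores.std():.3f})")
```

Picking the k with the best mean cross-validation score is exactly the "minimizing the total error" recipe described above; scikit-learn's GridSearchCV automates the same loop.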

How do Data Structures and Data Models differ?

Amey Varangaonkar
21 Dec 2017
7 min read
[box type="note" align="" class="" width=""]The following article is an excerpt taken from the book Statistics for Data Science, authored by James D. Miller. The book presents interesting techniques through which you can leverage the power of statistics for data manipulation and analysis.[/box] In this article, we will be zooming the spotlight on data structures and data models, and also understanding the difference between both. Data structures Data developers will agree that whenever one is working with large amounts of data, the organization of that data is imperative. If that data is not organized effectively, it will be very difficult to perform any task on that data, or at least be able to perform the task in an efficient manner. If the data is organized effectively, then practically any operation can be performed easily on that data. A data or database developer will then organize the data into what is known as data structures. Following image is a simple binary tree, where the data is organized efficiently by structuring it: A data structure can be defined as a method of organizing large amounts of data more efficiently so that any operation on that data becomes easy. Data structures are created in such a way as to implement one or more particular abstract data type (ADT), which in turn will stipulate what operations can be performed on the data structure, as well as the computational complexity of those operations. [box type="info" align="" class="" width=""]In the field of statistics, an ADT is a model for data types where a data type is defined by its behavior from the point of view (POV) of users of that data, explicitly showing the possible values, the possible operations on data of this type, and the behavior of all of these operations.[/box] Database design is then the process of using the defined data structures to produce a detailed data model, which will become the database. This data model must contain all of the required logical and physical design selections, as well as the physical storage parameters needed to produce a design in a Data Definition Language (DDL), which can then be used to create an actual database. [box type="info" align="" class="" width=""]There are varying degrees of the data model, for example, a fully attributed data model would also contain detailed attributes for each entity in the model.[/box] So, is a data structure a data model? No, a data structure is used to create a data model. Is this data model the same as data models used in statistics? Let's see in the next section. Data models You will find that statistical data models are at the heart of statistical analytics. In the simplest terms, a statistical data model is defined as the following: A representation of a state, process, or system that we want to understand and reason about In the scope of the previous definition, the data or database developer might agree that in theory or in concept, one could use the same terms to define a financial reporting database, as it is designed to contain business transactions and is arranged in data structures that allow business analysts to efficiently review the data, so that they can understand or reason about particular interests they may have concerning the business. Data scientists develop statistical data models so that they can draw inferences from them and, more importantly, make predictions about a topic of concern. 
Data developers develop databases so that they can similarly draw inferences from them and, more importantly, make predictions about a topic of concern (although perhaps in some organizations, databases are more focused on past and current events (transactions) than on forward-thinking ones (predictions)). Statistical data models come in a multitude of different formats and flavours (as do databases). These models can be equations linking quantities that we can observe or measure; they can also be, simply, sets of rules. Databases can be designed or formatted to simplify the entering of online transactions (say, in an order entry system) or for financial reporting when the accounting department must generate a balance sheet, income statement, or profit and loss statement for shareholders. [box type="info" align="" class="" width=""]I found this example of a simple statistical data model: Newton's Second Law of Motion, which states that the net sum of force acting on an object causes the object to accelerate in the direction of the force applied, and at a rate proportional to the resulting magnitude of the force and inversely proportional to the object's mass.[/box]

What's the difference?

Where or how does the reader find the difference between a data structure or database and a statistical model? At a high level, as we speculated in previous sections, one can conclude that a data structure/database is practically the same thing as a statistical data model, as shown in the following image. When we take the time to drill deeper into the topic, however, you should consider the following key points:

Although both the data structure/database and the statistical model could be said to represent a set of assumptions, the statistical model typically will be found to be much more keenly focused on a particular set of assumptions concerning the generation of some sample data, and similar data from a larger population, while the data structure/database more often than not will be more broadly based.
A statistical model is often in a rather idealized form, while the data structure/database may be less perfect in the pursuit of a specific assumption.
Both a data structure/database and a statistical model are built around relationships between variables.
The data structure/database relationship may focus on answering certain questions, such as: What are the total orders for specific customers? What are the total orders for a specific customer who has purchased from a certain salesperson? Which customer has placed the most orders?
Statistical model relationships are usually very simple, and focused on proving certain questions: Females are shorter than males by a fixed amount. Body mass is proportional to height. The probability that any given person will partake in a certain sport is a function of age, sex, and socioeconomic status.
Data structures/databases are all about the act of summarizing data based on relationships between variables.

Relationships

The relationships between variables in a statistical model may be much more complicated, and not at all straightforward to recognize and understand. An illustration of this is awareness of effect statistics. An effect statistic is one that shows or displays a difference in a value that is associated with a difference in one or more other variables.
Can you imagine the SQL query statements you'd use to establish a relationship between two database variables based upon one or more effect statistics? On this point, you may find that a data structure/database usually aims to characterize relationships between variables, while with statistical models, the data scientist looks to fit the model to prove a point or make a statement about the population in the model. That is, a data scientist endeavors to make a statement about the accuracy of an estimate of the effect statistic(s) describing the model! One more note of interest is that both a data structure/database and a statistical model can be seen as tools or vehicles that aim to generalize a population; a database uses SQL to aggregate or summarize data, and a statistical model summarizes its data using effect statistics. The above argument presented the notion that data structures/databases and statistical data models are, in many ways, very similar. If you found this excerpt to be useful, check out the book Statistics for Data Science, which demonstrates different statistical techniques for implementing various data science tasks such as pre-processing, mining, and analysis.
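To make the contrast concrete, here is a small, illustrative Python sketch; the data and column names are invented. The first half summarizes data the way a database query would, while the second fits a trivial statistical model and reports an effect statistic, the estimated slope.

```python
import numpy as np
import pandas as pd

# Invented sample data: order amounts for a handful of customers of different ages.
orders = pd.DataFrame({
    "customer": ["A", "A", "B", "B", "C", "C", "D"],
    "age":      [25, 25, 34, 34, 41, 41, 52],
    "amount":   [20.0, 35.0, 40.0, 55.0, 60.0, 72.0, 90.0],
})

# Database-style summarization, equivalent to
# SELECT customer, SUM(amount) FROM orders GROUP BY customer;
totals = orders.groupby("customer")["amount"].sum()
print(totals)

# Statistical-model-style question: how does order amount change with age?
# The fitted slope is an effect statistic describing the relationship.
slope, intercept = np.polyfit(orders["age"], orders["amount"], deg=1)
print(f"Estimated effect: each extra year of age adds about {slope:.2f} to an order")
```

The aggregation answers "what happened", while the effect statistic makes a claim about a relationship in the wider population, which is the distinction the excerpt draws.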

Polyglot persistence: what is it and why does it matter?

Richard Gall
21 Jul 2018
3 min read
Polyglot persistence is a way of storing data. It's an approach that acknowledges that often there is no one-size-fits-all solution to data storage. From the types of data you're trying to store to your application architecture, polyglot persistence is a hybrid solution to data management. Think of polyglot programming: if polyglot programming is about using a variety of languages according to the context in which you're working, polyglot persistence is applying that principle to database architecture. For example, storing transactional data in Hadoop files is possible, but makes little sense. On the other hand, processing petabytes of Internet logs using a Relational Database Management System (RDBMS) would also be ill-advised. These tools were designed to tackle specific types of tasks; even though they can be co-opted to solve other problems, the cost of adapting the tools to do so would be enormous. It is the virtual equivalent of trying to fit a square peg in a round hole.

Polyglot persistence: an example

For example, consider a company that sells musical instruments and accessories online (and in a network of shops). At a high level, there are a number of problems that the company needs to solve to be successful:

Attract customers to its stores (both virtual and physical).
Present them with relevant products (you would not try to sell a drum kit to a pianist, would you?!).
Once they decide to buy, process the payment and organize shipping.

To solve these problems, the company might choose from a number of available technologies that were designed to solve them:

Store all the products in a document-based database such as MongoDB, Cassandra, DynamoDB, or DocumentDB. There are multiple advantages of document databases: flexible schema, sharding (breaking bigger databases into a set of smaller, more manageable ones), high availability, and replication, among others.
Model the recommendations using a graph-based database (such as Neo4j, Tinkerpop/Gremlin, or GraphFrames for Spark): such databases reflect the factual and abstract relationships between customers and their preferences. Mining such a graph is invaluable and can produce a more tailored offering for a customer.
For searching, a company might use a search-tailored solution such as Apache Solr or ElasticSearch. Such a solution provides fast, indexed text-searching capabilities.
Once a product is sold, the transaction normally has a well-structured schema (such as product name, price, and so on). To store such data (and later process and report on it), relational databases are best suited.

With polyglot persistence, a company always chooses the right tool for the right job instead of trying to coerce a single technology into solving all of its problems. Read next: How to optimize Hbase for the Cloud [Tutorial] The trouble with Smart Contracts Indexing, Replicating, and Sharding in MongoDB [Tutorial]
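Returning to the example above, here is a rough sketch of what the "right store for the right job" idea can look like in code: the flexible product document goes to MongoDB, while the well-structured sale goes to a relational table. It assumes a local MongoDB instance and uses Python's built-in sqlite3 as a stand-in for the relational database; the database names, fields, and connection string are illustrative only.

```python
import sqlite3
from pymongo import MongoClient

# Document store: flexible product catalogue (schema can vary per product).
mongo = MongoClient("mongodb://localhost:27017")        # assumed local instance
catalogue = mongo["shop"]["products"]
product_id = catalogue.insert_one({
    "name": "Stratocaster",
    "category": "electric guitar",
    "specs": {"strings": 6, "colour": "sunburst"},      # nested, schema-free details
}).inserted_id

# Relational store: well-structured, transactional sales records.
db = sqlite3.connect("sales.db")
db.execute("""CREATE TABLE IF NOT EXISTS sales (
                  id INTEGER PRIMARY KEY AUTOINCREMENT,
                  product_ref TEXT NOT NULL,
                  price REAL NOT NULL,
                  sold_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
db.execute("INSERT INTO sales (product_ref, price) VALUES (?, ?)",
           (str(product_id), 799.00))
db.commit()
db.close()
```

A graph database for recommendations and a search index for product search would be added in the same spirit: each query pattern gets the engine that was built for it.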

5 cool ways Transfer Learning is being used today

Savia Lobo
15 Nov 2017
7 min read
Machine learning has gained a lot of traction over the years because of the predictive solutions that it provides, including the development of intelligent and reliable models. However, training the models is a laborious task because it takes time to curate the labeled data for the model and then to get the model ready. The time involved in training and labeling can be reduced by using the novel approach of Transfer Learning - a smarter and more effective form of machine learning, where you can take the learnings from one scenario and apply them to a different but related problem.

How exactly does Transfer Learning work?

Transfer learning reduces the effort needed to build a model from scratch by taking the fundamental logic or base algorithms within one domain and applying them to another. For instance, in the real world, the balancing logic learned while riding a bicycle can be transferred to learning to ride other two-wheeled vehicles. Similarly, in the case of machine learning, transfer learning can be used to transfer the algorithmic logic from one ML model to another. Let's look into some of the possible use cases of transfer learning.

[dropcap]1[/dropcap] Real-world Simulations

Digital simulation is better than creating a physical prototype for real-world implementations. Training a robot in real-world surroundings is both time- and cost-consuming. To minimize this, robots can now be trained using simulation, and the knowledge acquired can then be transferred onto a real-world robot. This is done using progressive networks, which are ideal for simulation-to-real-world transfer of policies in robot control domains. These networks consist of essential features for learning numerous tasks in sequence while enabling transfer, and are resistant to catastrophic forgetting - a tendency of Artificial Neural Networks (ANNs) to completely forget previously learned information upon learning new information. Another application of simulation can be seen in training self-driving cars, which are trained using simulations through video games. Udacity has open sourced its self-driving car simulator, which allows training self-driving cars through GTA 5 and many other video games. However, not all features of a simulation are replicated successfully when they are brought into the real world, as the interactions in the real world are more complex.

[dropcap]2[/dropcap] Gaming

The adoption of Artificial Intelligence has taken gaming to an altogether new level. DeepMind's neural network program AlphaGo is a testament to this, as it successfully defeated a professional Go player. AlphaGo is a master at Go but fails when tasked to play other games, because its algorithm is tailored to play Go. So, the disadvantage of using ANNs in gaming is that they cannot master all games the way a human brain does. To do this, AlphaGo would have to totally forget Go and adapt itself to the new algorithms and techniques of the new game. With transfer learning, the tactics learned in one game can be reapplied to play another game. An example of how transfer learning is implemented in gaming can be seen in MadRTS, a commercial Real-Time Strategy game developed to carry out military simulations. MadRTS uses CARL (CAse-based Reinforcement Learner), a multi-tiered architecture which combines Case-Based Reasoning (CBR) and Reinforcement Learning (RL). CBR provides an approach to tackle unseen but related problems based on past experiences within each level of the game.
RL algorithms, on the other hand, allow the model to carry out good approximations to a situation based on the agent's experience in its environment, modeled as a Markov Decision Process. These CBR/RL transfer learning agents are evaluated in order to perform effective learning on the tasks given in MadRTS, and should be able to learn better across tasks by transferring experience.

[dropcap]3[/dropcap] Image Classification

Neural networks are experts at recognizing objects within an image, as they are trained on huge datasets of labeled images, which is time-consuming. Transfer learning helps here by reducing the time needed to train the model: the model is pre-trained using ImageNet, which contains millions of images from different categories. Let's assume that a convolutional neural network, for instance a VGG-16 ConvNet, has to be trained to recognize images within a dataset. Firstly, it is pre-trained using ImageNet. Then, it is trained layer-wise, starting by replacing the final layer with a softmax layer and training it until the training saturates. Further, the other dense layers are trained progressively. By the end of the training, the ConvNet model has learned to detect images from the dataset provided. In cases where the dataset is not similar to the pre-trained model's data, one can fine-tune the weights in the higher layers of the ConvNet through backpropagation. The dense layers contain the logic for detecting the image; thus, tuning the higher layers won't affect the base logic. The convolutional neural networks can be trained in Keras, with TensorFlow as a backend. An example of image classification can be seen in the field of medical imaging, where a convolutional model trained on ImageNet is used to solve a kidney detection problem in ultrasound images.

[dropcap]4[/dropcap] Zero Shot translation

Zero-shot translation is an extended part of supervised learning, where the goal of the model is to learn to predict novel values from values that are not present in the training dataset. The prominent working example of zero-shot translation can be seen in Google's Neural Machine Translation model (GNMT), which allows for effective cross-lingual translations. Prior to zero-shot implementations, two discrete languages had to be translated using a pivot language. For instance, to translate Korean to Japanese, Korean had to be first translated into English and then English into Japanese. Here, English is the pivot language that acts as a medium to translate Korean to Japanese. This resulted in a translated language that was full of distortions created by the first language pair. Zero-shot translation removes the need for a pivot language. It uses the available training data to learn the translational knowledge needed to translate a new language pair. Another instance of zero-shot translation can be seen in Image2Emoji, which combines visuals and texts to predict unseen emoji icons in a zero-shot approach.

[dropcap]5[/dropcap] Sentiment Classification

Businesses can know their customers better by implementing sentiment analysis, which helps them to understand the emotions and polarity (negative or positive) underlying feedback and product reviews. Analyzing sentiments for a new text corpus is difficult to build up, as training the models to detect different emotions is difficult. A solution to this is transfer learning.
This involves training the models on one domain, twitter feeds for instance, and fine-tuning them to another domain on which you wish to perform sentiment analysis, say movie reviews. Here, deep learning models are trained on the twitter feeds by carrying out sentiment analysis of the text corpus and detecting the polarity of each statement. Once the model is trained to understand emotions through the polarity of the twitter feeds, its underlying language model and learned representation are transferred onto the model assigned the task of analyzing sentiments within movie reviews. Here, an RNN model trained with logistic regression techniques carries out sentiment analysis on the twitter feeds. The word embeddings and the recurrent weights learned from the source domain (twitter feeds) are reused in the target domain (movie reviews) to classify sentiments within the latter domain.

Conclusion

Transfer learning has brought in a new wave of learning in machines by reusing algorithms and applied logic, thus speeding up their learning process. This directly results in a reduction in the capital investment and the time invested to train a model. This is why many organizations are looking forward to replicating such learning in their machine learning models. Transfer learning has also been carried out successfully in the fields of image processing, simulation, gaming, and so on. How transfer learning affects the learning curve of machines in other sectors in the future is worth watching out for.
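To make the fine-tuning workflow from the image-classification use case above concrete, here is a minimal Keras sketch. It is an illustrative outline rather than the exact setup referenced in the article: the number of classes, image size, dense layer width, and the commented-out training call are placeholders you would adapt to your own labeled dataset.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

NUM_CLASSES = 5  # placeholder: number of categories in your target dataset

# Load VGG-16 pre-trained on ImageNet, without its original classification head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the convolutional base so only the new head is trained at first.
for layer in base.layers:
    layer.trainable = False

# New dense layers and a softmax output replace the original final layer.
x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)
outputs = Dense(NUM_CLASSES, activation="softmax")(x)
model = Model(inputs=base.input, outputs=outputs)

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # supply your own labeled data
# Optionally unfreeze the top convolutional blocks afterwards and fine-tune them
# with a small learning rate, as described in the image-classification section.
```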

What is Automated Machine Learning (AutoML)?

Wilson D'souza
17 Oct 2017
6 min read
Are you a proud machine learning engineer who hates that the job tests your limits as a human being? Do you dread the long hours of data experimentation and data modeling that leave you high and dry? Automated Machine Learning, or AutoML, can put that smile back on your face. A self-replicating AI algorithm, AutoML is the latest tool being applied in the real world today, and AI market leaders such as Google have made significant investments in researching this field further. AutoML has seen a steep rise in research and new tools over the last couple of years, but its recent mention during Google I/O 2017 piqued the interest of the entire developer community. What is AutoML all about, and what makes it so interesting?

Evolution of automated machine learning

Before we try to understand AutoML, let's look at what triggered the need for automated machine learning. Until now, building machine learning models that work in the real world has been a domain ruled by researchers, scientists, and machine learning experts. The process of manually designing a machine learning model involves several complex and time-consuming steps such as:

Pre-processing data
Selecting an appropriate ML architecture
Optimizing hyperparameters
Constructing models
Evaluating the suitability of models

Add to this the several layers of neural networks required for an efficient ML architecture: an n-layer neural network could result in n^n potential networks. This level of complexity can be overwhelming for the millions of developers who are keen on embracing machine learning. AutoML tries to solve this problem of complexity and makes machine learning accessible to a large group of developers by automating routine but complex tasks such as the design of neural networks. Since this cuts down development time significantly and takes care of several complex tasks involved in building machine learning models, AutoML is expected to play a crucial role in bringing machine learning to the mainstream.

Approaches to automating model generation

With a growing body of research, AutoML aims to automate the following tasks in the field of machine learning: model selection, parameter tuning, meta-learning, and ensemble construction. It does this by using a wide range of algorithms and approaches, such as:

Bayesian Optimization: One of the fundamental approaches for automating model generation is to use Bayesian methods for hyperparameter tuning. By modeling the uncertainty of parameter performance, different variations of the model can be explored, which offers an optimal solution.
Meta-learning and Ensemble Construction: To further increase AutoML efficiency, meta-learning techniques are used to find and pick optimal hyperparameter settings. These techniques can be further coupled with auto-ensemble construction techniques to create an effective ensemble model from a collection of models that undergo optimization. Using these techniques, a high level of accuracy can be achieved throughout the process of automated model generation.
Genetic Programming: Certain tools like TPOT also make use of a variation of genetic programming (tree-based pipeline optimization) to automatically design and optimize ML models that offer highly accurate results for a given set of data. This approach makes use of operators at various stages of the data pipeline, which are assembled together in the form of a tree-based pipeline. These are then further optimized, and newer pipelines are auto-generated using genetic programming.
If these weren’t enough, Google in its recent posts disclosed that they are using reinforcement learning approach to give a further push to develop efficient AutoML techniques. What are some tools in this area? Although it’s still early days, we can already see some frameworks emerging to automate the generation of your machine learning models.   Auto-sklearn: Auto-sklearn, the tool which won the ChaLearn AutoML Challenge, provides a wrapper around the popular Python library scikit-learn to automate machine learning. This is a great addition to the ever-growing ecosystem of Python data science tools. Built on top of Bayesian optimization, it takes away the hassle of algorithm selection, parameter tuning, and ensemble construction while building machine learning pipelines. With auto-sklearn, developers can create rapid iterations and refinements to their machine learning models, thereby saving a significant amount of development time. The tool is still in its early stages of development, so expect a few hiccups while using it. DataRobot: DataRobot offers a machine learning automation platform to all levels of data scientists aimed at significantly reducing the time to build and deploy predictive models. Since it’s a cloud platform it offers great power and speed throughout the process of automating the model generation process. In addition to automating the development of predictive models, it offers other useful features such as a web-based interface, compatibility with several leading tools such as Hadoop and Spark, scalability, and rapid deployment. It’s one of those few machine learning automation platforms which are ready for industry use. TPOT: TPOT is yet another Python tool meant for automated machine learning. It uses a genetic programming approach to iterate and optimize machine learning models. As in the case of auto-sklearn, TPOT is also built on top of scikit-learn. It has a growing interest level on GitHub with 2400 stars and has observed a 100% rise in the past one year alone. Its goals, however, are quite similar to those of Auto-sklearn: feature construction, feature selection, model selection, and parameter optimization. With these goals in mind, TPOT aims at building efficient machine learning systems in lesser time and with better accuracy. Will automated machine learning replace developers? AutoML as a concept is still in its infancy. But as market leaders like Google, Facebook, and others research more in this field, AutoML will keep evolving at a brisk pace. Assuming that AutoML would replace humans in the field of data science, however, is a far-fetched thought and nowhere near reality. Here is why. AutoML as a technique is meant to make the neural network design process efficient rather than replace humans and researchers in the field of building neural networks. The primary goal of AutoML is to help experienced data scientists be more efficient at their work i.e., enhance productivity by a huge margin and to reduce the steep learning curve for the many developers who are keen on designing ML models - i.e., make ML more accessible. With the advancements in this field, it’s exciting times for developers to embrace machine learning and start building intelligent applications. We see automated machine learning as a game changer with the power to truly democratize the building of AI apps. With automated machine learning, you don’t have to be a data scientist to develop an elegant AI app!
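To give a feel for how little code these tools ask of you, here is a small sketch based on TPOT's typical scikit-learn-style interface. Treat the parameter values as illustrative defaults rather than recommendations, and check the TPOT documentation for the current API before relying on it.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Let genetic programming search over preprocessing, model, and hyperparameter pipelines.
automl = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
automl.fit(X_train, y_train)

print(automl.score(X_test, y_test))   # accuracy of the best pipeline found
automl.export("best_pipeline.py")     # emit the winning pipeline as plain scikit-learn code
```

The exported script is ordinary scikit-learn code, which keeps the data scientist in the loop: the tool proposes a pipeline, and a human still reviews, tunes, and deploys it.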