Tech Guides - Data

Hyperledger: The Enterprise-ready Blockchain

Savia Lobo
26 Oct 2017
6 min read
As one of the most widely discussed phenomena in the global media, Blockchain has grown from hype to mainstream reality. Leading industry players from finance, supply chain, and IoT are collaborating to make Blockchain available for commercial adoption. But while Blockchain is being projected as the future of digital transactions, it still suffers from two major limitations: carrying out private transactions and scalability. As such, there was a widely felt need for a Blockchain-based distributed ledger that overcomes these problems.

Enter Hyperledger

Founded by the Linux Foundation in 2015, Hyperledger aims to give enterprises a platform to build robust blockchain applications for their businesses and to create open-source, enterprise-grade frameworks for carrying out secure business transactions. It acts as a hub where leading industry players and software developers collaborate on building blockchain frameworks that can then be used to deploy blockchain applications for industry. With industry leaders such as IBM, Intel, Accenture, and SAP, among others, collaborating with the Hyperledger community, and with the recent addition of BTS, Oracle, and the Patientory Foundation, the community is gaining a lot of traction. No wonder Brian Behlendorf, Executive Director of Hyperledger, says, "Growth and interest in Hyperledger remain high in 2017."

There are a total of eight projects: five are frameworks (Sawtooth, Fabric, Burrow, Iroha, and Indy), and the other three are tools (Composer, Cello, and Explorer) supporting those frameworks. Each framework takes a different approach to building the desired blockchain application. Hyperledger Fabric, the community's first framework, was contributed by IBM. It hosts smart contracts using chaincode, an interface written in Go or Java that contains the business logic of the ledger. Hyperledger Sawtooth, developed by Intel, offers a modular blockchain architecture. It uses Proof of Elapsed Time (PoET), a consensus algorithm developed by Intel for high efficiency among distributed ledgers. Hyperledger Burrow, a joint proposal by Intel and Monax, is a permissioned smart contract machine. It executes smart contract code following the Ethereum specification with an engine, a strong audit trail, and a consensus mechanism. Of these frameworks, two (Indy and Iroha) are still in the incubation phase. The Hyperledger community is also building supporting tools, such as Composer, which has already been launched, and Cello and Explorer, which are awaiting release.

Note: Although a plethora of Hyperledger tools and frameworks are available, in the rest of this article we use Hyperledger Fabric, one of the most popular and trending frameworks, to demonstrate how Hyperledger is being used by businesses.

Why should businesses use Hyperledger?

When settling on a framework upon which blockchain apps can be built, several key aspects are worth considering. Among the most important are portability, security, reliability, interoperability, and user-friendliness. Hyperledger offers all of the above for building cross-platform, production-ready business applications. Let's take a simple example to see how Hyperledger works for businesses. Consider a restaurant business.

A restaurant owner buys vegetables from a wholesale shop at a much lower cost than in the market. The shopkeeper creates a network in which other buyers cannot see the price at which vegetables are sold to a given buyer. Similarly, the restaurant owner can view only his own transaction with the shopkeeper. For the vegetables to reach the restaurant, they must pass through numerous stages such as transport and delivery. The restaurant owner can track the delivery of his vegetables at each stage, and so can the shopkeeper. The transport and delivery organizations, however, cannot see the transaction details. In other words, the shopkeeper can establish a confidential network within a private network of other stakeholders. This type of network can be set up using Hyperledger Fabric.

Breaking the example down, here are some of the reasons to consider Hyperledger for your business networks:
With Hyperledger you get performance, scalability, and multiple levels of trust.
You get data on a need-to-know basis: only the parties in the network that need the data get to know about it.
Backed by heavyweights like Intel and IBM, Hyperledger strives to offer a strong standard for blockchain code, which in turn provides better functionality at higher speeds.

Furthermore, with the recent release of Fabric v1.0, businesses can create out-of-the-box blockchain solutions on its highly elastic and extensible architecture, made even easier by Hyperledger Composer. Composer helps businesses create smart contracts and blockchain applications without having to know the complex intricacies of the underlying blockchain network. Built with collaborative efforts from leading industry experts, it is a great fit for real-world enterprise use.

Although Ethereum is used by many businesses, here are some of the reasons why Hyperledger could be a better enterprise fit:
While Ethereum is a public blockchain, Hyperledger is a private blockchain. This means enterprises within the network know who is present on the peer nodes, unlike with Ethereum.
Hyperledger is a permissioned network: it can control who participates in the consensus mechanism of the blockchain network. Ethereum, on the other hand, is permissionless.
Hyperledger has no built-in cryptocurrency. Ethereum has a built-in cryptocurrency called Ether. Many applications don't need a cryptocurrency to function, and for them using Ethereum can be a disadvantage.
Hyperledger gives you the flexibility of choosing a programming language such as Java or Go for writing smart contracts. Ethereum uses Solidity, which is far less widely used.
Hyperledger is highly scalable, unlike traditional blockchains and Ethereum, with minimal performance losses.

"Since Hyperledger Fabric was designed to meet key requirements for permissioned blockchains with transaction privacy and configurable policies, we've been able to build solutions quickly and flexibly." - Mohan Venkataraman, CTO, IT People Corporation

Future of Hyperledger

The Hyperledger community is expanding rapidly, with many industries collaborating and contributing their capabilities to cross-industry blockchain applications. Hyperledger has found adoption within business networks in industries as varied as healthcare, finance, and supply chain, to build state-of-the-art blockchain applications that ensure privacy on decentralized, permissioned networks. It is shaping up to be a technology that can revolutionize how businesses handle access control within a consortium, backed by enhanced security measures. With continuous development of these frameworks, smarter, faster, and more secure business transactions will soon be a reality. We can also expect to see Hyperledger in the cloud, with IBM planning to extend blockchain technologies onto its cloud platform. Add to that the exciting prospect of blending Artificial Intelligence with Hyperledger, and transactions look more advanced, tamper-proof, and secure than ever before.
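Hyperledger Fabric enforces the need-to-know visibility from the restaurant example through channels and private data rather than application code. The snippet below is a deliberately simplified, conceptual Python sketch, not Fabric chaincode; the class and method names are invented purely to illustrate a ledger that reveals each transaction only to the parties named on it.

```python
# Conceptual sketch only: models "need-to-know" visibility on a shared ledger.
# This is NOT Hyperledger Fabric code; all names here are invented.

class PermissionedLedger:
    def __init__(self):
        self._transactions = []  # append-only list of transaction records

    def record(self, seller, buyer, item, price, observers=()):
        """Record a transaction visible only to the named parties."""
        self._transactions.append({
            "seller": seller, "buyer": buyer, "item": item,
            "price": price, "visible_to": {seller, buyer, *observers},
        })

    def view(self, participant):
        """Return only the transactions this participant is allowed to see."""
        return [
            {k: v for k, v in tx.items() if k != "visible_to"}
            for tx in self._transactions
            if participant in tx["visible_to"]
        ]

ledger = PermissionedLedger()
ledger.record("wholesaler", "restaurant", "vegetables", price=120)
ledger.record("wholesaler", "hotel", "vegetables", price=150)

print(ledger.view("restaurant"))  # sees only its own purchase
print(ledger.view("hotel"))       # cannot see the restaurant's price
```

In a real Fabric network, this kind of isolation is provided by channels and private data collections enforced by the peers themselves, not by logic written in the application layer.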

4 myths about Git and GitHub you should know about

Savia Lobo
07 Oct 2018
3 min read
With the aim of replacing BitKeeper, Linus Torvalds created Git in 2005 to support the development of the Linux kernel. Git isn't limited to code, however; any product or project with multiple contributors, release management, and versioning stands to gain an improved workflow through Git. Just as every solution or tool has its own positives and negatives, Git is surrounded by myths. Alex Magana and Joseph Mul, the authors of the Introduction to Git and GitHub course, discuss in this post some of the myths around Git and GitHub.

Git is GitHub

Because Git and GitHub are commonly used together as a version control toolkit, adopters of the two tools often misconceive them as interchangeable. Git is a tool that tracks changes to the files that constitute a project; it provides the utility to monitor changes and persist them. GitHub, on the other hand, is akin to a website hosting service, with the difference that the hosted content is a repository. The repository can then be accessed from this central point and the codebase shared.

Backups are equivalent to version control

This myth stems from a misunderstanding of what version control is, and by extension of what Git achieves when it's incorporated into the development workflow. Contrary to archives created according to a team's backup policy, Git tracks changes made to files and maintains snapshots of a repository at a given point in time.

Git is only suitable for teams

With the use of hosting services such as GitHub, the sharing and collaboration element may be perceived as the preserve of teams. Yet Git offers gains beyond source control. It lends itself to the delivery of a feature or product from the point of development to deployment; in other words, Git is a delivery tool. It can therefore be used to roll out functionality and manage changes to source code for teams and individuals alike.

To use Git effectively, you need to learn every command

Whether you work as an individual or in a team, the common commands you need in order to contribute to a repository cover initiating tracking of specific files, persisting changes made to tracked files, reverting changes made to files, and incorporating changes introduced by other developers working on the same project.

The four myths discussed by the authors clarify what Git and GitHub are and how they are used. If you found this post useful, do check out the course Introduction to Git and GitHub by Alex and Joseph.

GitHub addresses technical debt, now runs on Rails 5.2.1
GitLab 11.3 released with support for Maven repositories, protected environments and more
GitLab raises $100 million, Alphabet backs it to surpass Microsoft's GitHub

6 reasons why Google open-sourced TensorFlow

Kunal Parikh
13 Sep 2017
7 min read
On November 9, 2015, a storm loomed over the SF Bay Area, creating major outages. In Mountain View, California, Google engineers were busy creating a storm of their own. That day, Sundar Pichai announced to the world that TensorFlow, Google's machine learning system, was going open source. He said: "...today we're also open-sourcing TensorFlow. We hope this will let the machine learning community—everyone from academic researchers, to engineers, to hobbyists—exchange ideas much more quickly, through working code rather than just research papers."

The tech world may not have fully grasped the gravity of the announcement that day, but those in the know recognized it as a pivotal moment in Google's transformation into an AI-first company.

How did TensorFlow begin?

TensorFlow grew out of DistBelief, an earlier internal Google system. DistBelief powered a program called DeepDream, built so that scientists and engineers could visualize how deep neural networks process images. As fate would have it, the algorithm went viral and everyone started generating abstract, psychedelic art with it. Although people were having fun playing with the imagery, most were unaware of the technology that powered it: neural networks and deep learning, the very things TensorFlow was built for. TensorFlow is a machine learning platform for running a wide range of algorithms, from the aforementioned neural networks to other deep learning projects. With its flexibility, high performance, portability, and production-readiness, TensorFlow is changing the landscape of artificial intelligence and machine learning. Be it face recognition, music and art creation, or detecting clickbait headlines for blogs, the use cases are immense. With Google open sourcing TensorFlow, the platform that powers Google Search and other smart Google products is now accessible to everyone: researchers, scientists, machine learning experts, students, and others.

So why did Google open source TensorFlow? Yes, Google made a world of difference to the machine learning community at large by open sourcing TensorFlow. But what was in it for Google? As it turns out, a whole lot. Let's look at a few reasons.

Google is feeling the heat from rival deep learning frameworks

Major deep learning frameworks like Theano and Keras were already open source. Keeping a framework proprietary was becoming a strategic disadvantage, as most core deep learning users, i.e. scientists, engineers, and academics, prefer using open source software for their work. "Pure" researchers and aspiring PhDs are key groups that file major patents in the world of AI. By open sourcing TensorFlow, Google gave this community access to a platform it backs to power their research, which makes migrating the world's algorithms from other deep learning tools onto TensorFlow theoretically possible. AI as a trend is clearly here to stay, and Google wants a platform that leads this trend.

An open source TensorFlow can better support the Google Brain project

Behind all the PR, Google does not say much about its pet project, Google Brain. When Sundar Pichai talks of Google's transformation from Search to AI, this project is doing the work behind the scenes. Google Brain is headed by some of the best minds in the industry, such as Jeff Dean, Geoffrey Hinton, and Andrew Ng, among many others. They developed TensorFlow, and they may still have state-of-the-art features up their sleeves known only to them. After all, they have produced a wealth of stunning research in areas like parallel computing, machine intelligence, and natural language processing. With TensorFlow now open sourced, this team can accelerate the development of the platform and make significant inroads into the areas they are currently researching. This research can then potentially develop into future products for Google, allowing it to expand its AI and cloud clout, especially in the enterprise market.

Tapping into the collective wisdom of the academic intelligentsia

Most innovations and breakthroughs come from universities before they go mainstream and become major products in enterprises. AI, still making this transition, needs a lot of investment in research, and to work on difficult algorithms researchers need access to sophisticated ML frameworks. Selling TensorFlow to universities is the old-school way to solve the problem; that's why we no longer hear much about products like LabVIEW. Instead, by open sourcing TensorFlow, the team at Google now has the world's best minds working on difficult AI problems on their platform for free. As these researchers publish papers on AI using TensorFlow, the existing body of knowledge keeps growing, and Google gets access to bleeding-edge algorithms that are not yet available in the market. Its engineers can simply pick what they like and start developing commercially ready services.

Google wants to develop TensorFlow as a platform-as-a-service for AI application development

An advantage of open sourcing a tool is that it accelerates time to build and test through collaborative app development. This means most of the basic infrastructure and modules for building a variety of TensorFlow-based applications will already exist on the platform. TensorFlow developers can build and ship interesting modular products by mixing and matching code and providing a further layer of customization or abstraction. What Amazon did for storage with AWS, Google can do for AI with TensorFlow. It won't come as a surprise if Google comes up with its own integrated AI ecosystem, with TensorFlow on Google Cloud promising all the AI resources your company might need. Suppose you want voice-based search in your ecommerce mobile application. Instead of completely reinventing the wheel, you could buy TensorFlow-powered services provided by Google; with easy APIs, you get voice-based search and save substantial developer cost and time.

Open sourcing TensorFlow will help Google extend its talent pipeline in a competitive Silicon Valley jobs market

Hiring for AI development is competitive in Silicon Valley, as all major companies vie for attention from the same niche talent pool. With TensorFlow freely available, Google's HR team can reach out to a talent pool already well versed in the technology and save on training costs. Just look at the interest TensorFlow has generated on a forum like Stack Overflow: a growing number of users are asking about TensorFlow, and some of them will become power users whom the Google HR team can tap. A developer pool at this scale would never have been possible with a proprietary tool.

Replicating the success and learning from Android

Agreed, a direct comparison with Android is not possible. However, the size of the mobile market and Google's strategic mobile-first goal when it introduced Android bear a striking similarity to the nascent AI ecosystem we have today and Google's current AI-first rhetoric. In just a decade since its launch, Android now owns more than 85% of the smartphone OS market. Piggybacking on Android's success, Google now has control of mobile search (96.19%), services (Google Play), a strong connection with the mobile developer community, and even a viable entry into the mobile hardware market. Open sourcing Android did not stop Google from making money; it monetized through other channels like mobile search, mobile advertisements, Google Play, devices like the Nexus, and mobile payments. Google did not have all this infrastructure planned and ready before Android was open sourced; it innovated, improvised, and created along the way. In the future, we can expect Google to take key learnings from its Android growth story and apply them to TensorFlow's market expansion strategy. We can also expect supporting infrastructure and models for commercializing TensorFlow to emerge for enterprise developers.

The road to AI world domination for Google runs on the back of an open sourced TensorFlow platform. It looks not just exciting but full of exponential growth, crowdsourced innovation, and learnings drawn from other highly successful Google products and services. The storm that started two years ago is surely morphing into a hurricane. As Professor Michael Guerzhoy of the University of Toronto told Business Insider, "Ten years ago, it took me months to do something that for my students takes a few days with TensorFlow."
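To give a flavor of the "working code" Pichai mentions, here is a minimal, self-contained sketch of defining and training a tiny neural network with TensorFlow. It assumes a TensorFlow 2.x installation and uses the Keras API plus random placeholder data; the article predates this API, so treat it purely as an illustration rather than anything from the original announcement.

```python
# Minimal illustration: train a tiny neural network on random data.
# Assumes TensorFlow 2.x (pip install tensorflow); data is a placeholder.
import numpy as np
import tensorflow as tf

# Toy dataset: 256 samples, 10 features, binary labels.
x = np.random.rand(256, 10).astype("float32")
y = (x.sum(axis=1) > 5.0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(x, y, verbose=0))  # [loss, accuracy] on the toy data
```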

How is Artificial Intelligence changing the mobile developer role?

Bhagyashree R
15 Oct 2018
10 min read
Last year at Google I/O, Sundar Pichai, the CEO of Google, said: "We are moving from a mobile-first world to an AI-first world." Does this apply only to Google? Not really. In the recent past we have seen several advancements in Artificial Intelligence and, in parallel, a plethora of intelligent apps coming onto the market. These advancements are enabling developers to take their apps to the next level by integrating recommendation services, image recognition, speech recognition, voice translation, and many more capabilities. Artificial Intelligence is becoming a potent tool for mobile developers to experiment and innovate with. The AI components that are integral to mobile experiences, such as voice-based assistants and location-based services, increasingly require mobile developers to have a basic understanding of Artificial Intelligence to be effective. Of course, you don't have to be an AI expert to include intelligent components in your app, but you should understand something about what you're building into your app and why. After all, AI on mobile is not just about calling an API, is it? There's more to it, and in this article we will explore how Artificial Intelligence will shape the mobile developer role in the immediate future.

Read also: AI on mobile: How AI is taking over the mobile devices marketspace

What is changing in the mobile developer role?

Focus shifting to data

With Artificial Intelligence becoming more and more accessible, intelligent apps are becoming the new norm for businesses. Artificial Intelligence strengthens the relationship between brands and customers, inspiring developers to build smart apps that increase user retention. This also means developers have to shift their focus to data. They have to understand how the data will be collected, how it will be fed to the machines, and how often new data will be needed. When nearly 1 in 4 people abandon an app after its first use, as a mobile app developer you need to rethink how you drive in-app personalization and engagement.

Exploring a "humanized" way of user-app interaction

With assistants such as Siri and Google Assistant coming onto the market, "humanizing" the interaction between the user and the app is becoming mainstream. "Humanizing" is the process by which the app becomes relatable to the user, and the more effectively it is done, the more the end user will interact with the app. Users now want easy navigation and search, and Artificial Intelligence fits perfectly into this scenario. Advances in technologies like text-to-speech, speech-to-text, Natural Language Processing, and cloud services in general have contributed to the mass adoption of these types of interfaces.

Companies are increasingly expecting mobile developers to be comfortable working with AI functionalities

Artificial Intelligence is the future. Companies now expect their mobile developers to know how to handle the huge amount of data generated every day and how to use it. Here is an example of what Google wants its engineers to do: "We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day." This open-ended requirement list shows that it is the right time to learn and embrace Artificial Intelligence.

What skills do you need to build intelligent apps?

Ideally, data scientists are the ones who conceptualize mathematical models, and machine learning engineers are the ones who translate them into code and train the models. But when you are working in a resource-tight environment, for example a start-up, you will be responsible for the end-to-end job. It is not as scary as it sounds, because you have several resources to get you started.

Taking your first steps with machine learning as a service

Learning anything starts with motivating yourself. Diving directly into the maths and coding of machine learning might exhaust and bore you. That's why it's a good idea to know what the end goal of your learning process is and what types of solutions are possible using machine learning. There are many products you can try to get started quickly, such as Google Cloud AutoML (Beta), Firebase ML Kit (Beta), and the Fritz Mobile SDK, among others.

Read also: Machine Learning as a Service (MLaaS): How Google Cloud Platform, Microsoft Azure, and AWS are democratizing Artificial Intelligence

Getting your hands dirty

After this warm-up, the next step involves creating and training your own model. This is where you'll be introduced to TensorFlow Lite, which is going to be your best friend throughout your journey as a machine learning mobile developer. There are many other machine learning tools coming onto the market that you can make use of; these tools make building AI into mobile apps easier. For instance, you can use Dialogflow, a Natural Language Understanding (NLU) platform that makes it easy for developers to design and integrate conversational user interfaces into mobile apps, web applications, devices, and bots. You can then integrate it with Alexa, Cortana, Facebook Messenger, and the other platforms your users are on.

Read also: 7 Artificial Intelligence tools mobile developers need to know

For practice, you can work through an excellent Google codelab, TensorFlow For Poets, which guides you through creating and training a custom image classification model. Through this codelab you will learn the basics of data collection, model optimization, and the other key components involved in creating your own model. The codelab is divided into two parts: the first part covers creating and training the model, and the second part focuses on TensorFlow Lite, the mobile version of TensorFlow that allows you to run the same model on a mobile device.

Mathematics is the foundation of machine learning

Love it or hate it, machine learning and Artificial Intelligence are built on mathematical principles like calculus, linear algebra, probability, statistics, and optimization. You need to learn some essential foundational concepts and the notation used to express them. There are many reasons why learning mathematics for machine learning is important. It will help you select the right algorithm, which involves weighing accuracy, training time, model complexity, the number of parameters, and the number of features. Maths is also needed when choosing parameter settings and validation strategies, and when identifying underfitting and overfitting through the bias-variance tradeoff.

Read also: Bias-Variance tradeoff: How to choose between bias and variance for your machine learning model [Tutorial]
Read also: What is Statistical Analysis and why does it matter?

What are the key aspects of Artificial Intelligence for mobile to keep in mind?

Understanding the problem

Your number one priority should be the user problem you are trying to solve. Instead of randomly integrating a machine learning model into an application, developers should understand how the model applies to the particular application or use case. This matters because you might end up building a great machine learning model with an excellent accuracy rate, but if it does not solve any problem, it ends up being redundant. You must also understand that while many business problems require machine learning approaches, not all of them do; most business problems can be solved through simple analytics or a baseline approach.

Data is your best friend

Machine learning is dependent on data; the data you use, and how you use it, will define the success of your machine learning model. You can also make use of the thousands of open source datasets available online. Google recently launched a dataset search tool, Google Dataset Search, which makes it easier to find the right dataset for your problem. Typically there's no shortage of data; however, the abundance of data does not mean that it is clean, reliable, or usable as intended. Data cleanliness is a huge issue. For example, a typical company will have multiple customer records for a single individual, all of which differ slightly. If the data isn't clean, it isn't reliable. The bottom line: it's bad practice to just grab the data and use it without considering its origin.

Read also: Best Machine Learning Datasets for beginners

Decide which model to choose

A machine learning algorithm is trained, and the artifact created by the training process is called the machine learning model. An ML model is used to find patterns in data without the developer having to explicitly program those patterns. We cannot look through such huge amounts of data and understand the patterns ourselves; think of the model as a helper that looks through all those terabytes of data and extracts knowledge and insights from it. You have two choices here: either create your own model or use a pre-built one. While several pre-built models are available, your business-specific use cases may require specialized models to yield the desired results, and these off-the-shelf models may also need fine-tuning or modification to deliver the value the app is intended to provide.

Read also: 10 machine learning algorithms every engineer needs to know

Thinking about resource utilization is important

AI-powered apps, and apps in general, should be developed with resource utilization in mind. Though companies are working to improve mobile hardware, it is currently not on par with what we can accomplish with GPU clusters in the cloud. Therefore, developers need to consider how the models they intend to use will affect resources, including battery power and memory usage. In terms of computational resources, inferencing (making predictions) is less costly than training. Inferencing on the device means the model needs to be loaded into RAM and requires significant computation time on the GPU or CPU. In scenarios that involve continuous inferencing, such as audio and image data that can chew up bandwidth quickly, on-device inferencing is a good choice.

Learning never stops

Maintenance is important, and for that you need to establish a feedback loop and a process and culture of continuous evaluation and improvement. A change in consumer behavior or a market trend can negatively impact the model. Eventually, something will break or no longer work as intended, which is another reason why developers need to understand the basics of what they're adding to an app. You need some knowledge of how the Artificial Intelligence component you just put together is working, or how it could be made to run faster.

Wrapping up

Before falling for the Artificial Intelligence and machine learning hype, it's important to understand and analyze the problem you are trying to solve. You should examine whether applying machine learning can improve the quality of the service, and decide whether this improvement justifies the effort of deploying a machine learning model. If you just want a simple API endpoint and don't want to dedicate much time to deploying a model, cloud-based web services are the best option for you. Tools like ML Kit for Firebase look promising and seem like a good choice for startups or developers just starting out. TensorFlow Lite and Core ML are good options if you have mobile developers on your team or if you're willing to get your hands a little dirty. Artificial Intelligence is influencing the app development process by providing a data-driven approach to solving user problems. It wouldn't be surprising if, in the near future, Artificial Intelligence becomes a defining factor in app developers' expertise and creativity.

10 useful Google Cloud Artificial Intelligence services for your next machine learning project [Tutorial]
How Artificial Intelligence is going to transform the Data Center
How Serverless computing is making Artificial Intelligence development easier
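Since TensorFlow Lite features prominently above, here is a minimal sketch of the convert-then-infer flow it enables, assuming TensorFlow 2.x is installed. The tiny untrained model and random input are placeholders, not anything from the codelab; on an actual phone the same .tflite bytes would be loaded by the Android or iOS runtime instead of the Python interpreter.

```python
# Minimal sketch: convert a placeholder Keras model to TensorFlow Lite and
# run inference with the Lite interpreter. Assumes TensorFlow 2.x.
import numpy as np
import tensorflow as tf

# Stand-in model; in practice this would be your trained classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Convert to the compact .tflite format used on mobile devices.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Run the converted model with the TensorFlow Lite interpreter,
# mirroring the flow a mobile runtime would follow.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

sample = np.random.rand(1, 4).astype("float32")
interpreter.set_tensor(input_details[0]["index"], sample)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]["index"]))  # class probabilities
```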

How are Mobile apps transforming the healthcare industry?

Guest Contributor
15 Jan 2019
5 min read
Mobile app development has taken over and completely rewritten the healthcare industry. According to Healthcare Mobility Solutions reports, the mobile healthcare application market is expected to be worth more than $84 million by the year 2020. These mobile applications are not limited to use by patients; they are also used massively by doctors and nurses. As technology evolves, it simultaneously opens up possibilities for being used in multiple ways, and the journey of healthcare mobile app development has been similar: it originated from the latest trends in technology and has grown into an industry in itself. The technological trends that have helped build mobile apps for the healthcare industry are:

Blockchain

You probably know blockchain technology thanks to all the cryptocurrency rage of recent years. A blockchain is basically a peer-to-peer database that keeps a verified record of all transactions, or any other information that one needs to track and make accessible to a large community. The healthcare industry can use a technology that records the medical history of patients and stores it electronically, in an encrypted form that cannot be altered or hacked into. Blockchain succeeds where a lot of health applications fail: the secure retention of patient data.

The Internet of Things

The Internet of Things (IoT) is all about connectivity. It is a way of interconnecting electronic devices, software, applications, and so on to ensure easy access and management across platforms. The IoT will assist medical professionals in gaining access to valuable patient information so that doctors can monitor the progress of their patients. This makes treatment easier and more closely monitored, as doctors can access the patient's current profile anywhere and suggest treatment, medicine, and dosages.

Augmented Reality

From the video gaming industry, Augmented Reality has made its way to the medical sector. AR refers to the creation of an interactive experience of a real-world environment through the superimposition of computer-generated perceptual information. AR is increasingly used to develop mobile applications that doctors and surgeons can use as a training experience. It simulates a real-world experience of diagnosis and surgery and, by doing so, enhances the knowledge and practical skill that all doctors must possess. This form of training is not limited in scale and can therefore train a large number of medical practitioners simultaneously.

Big Data Analytics

Big Data has the potential to provide comprehensive statistical information, accessed and processed through sophisticated software. Big Data Analytics becomes extremely useful for managing a hospital's resources and records efficiently. Aside from this, it is used in the development of mobile applications that store all patient data, again eliminating the need for excessive paperwork. This allows medical professionals to focus more on attending to and treating patients, rather than managing databases.

These technological trends have led to the development of a diverse variety of mobile applications used for multiple purposes in the healthcare industry. Listed below are the benefits of mobile apps deploying these technological trends, for professionals and patients alike.

Telemedicine

Mobile applications can potentially play a crucial role in making medical services available to the masses. An example is an on-call physician on telemedicine duty: a mobile application allows the physician to be available for a patient consult without having to operate via a PC. This makes doctors more accessible and brings quality treatment to patients quickly.

Enhanced Patient Engagement

There are mobile applications that place all patient data, from past medical history to performance metrics, patient feedback, and changes in treatment patterns and schedules, at the push of a button for the medical professional to consider and make a decision on the go. Since all data is recorded in real time, doctors can change shifts without having to explain the condition of the patient to the next doctor in person; the mobile application has all the data the supervisors or nurses need.

Easy Access to Medical Facilities

A number of mobile applications allow patients to search for medical professionals in their area, read reviews and feedback from other patients, and then make an online appointment if they are satisfied with the information they find. Apart from this, they can also download and store their medical lab reports and order medicines online at affordable prices.

Easy Payment of Bills

As in every other sector, mobile applications in healthcare have made monetary transactions extremely easy. Patients or their family members no longer need to spend hours waiting in line to pay bills. They can instantly pick a payment plan and pay bills immediately, or add reminders to be notified when a bill is due.

It can therefore safely be said that the revolution the healthcare industry is undergoing has worked in favor of all the parties involved: medical professionals, patients, hospital management, and mobile app developers.

Author's Bio

Ritesh Patil is the co-founder of Mobisoft Infotech, which helps startups and enterprises with mobile technology. He's an avid blogger and writes on mobile application development. He has developed innovative mobile applications across fields such as finance, insurance, health, entertainment, productivity, social causes, and education, and has bagged numerous awards for his work. Social Media: Twitter, LinkedIn

Healthcare Analytics: Logistic Regression to Reduce Patient Readmissions
How IBM Watson is paving the road for Healthcare 3.0
7 Popular Applications of Artificial Intelligence in Healthcare

5 reasons to learn Generative Adversarial Networks (GANs) in 2018

Savia Lobo
12 Dec 2017
5 min read
Generative Adversarial Networks (GANs) are a prominent branch of machine learning research today. Deep neural networks require a lot of data to train on and perform poorly if the data provided is not sufficient. GANs can overcome this problem by generating new, realistic data, without resorting to tricks like data augmentation. As the application of GANs in the machine learning industry is still at an early stage, it is considered a highly desirable niche skill, and hands-on experience raises the bar in the job market: it can fetch you higher pay than your colleagues and can be the feature that sets your resume apart.

Source: Gartner's Hype Cycle 2017

GANs, along with CNNs and RNNs, are part of the deep neural network experience in demand in the industry. Here are five reasons why you should learn GANs today and how Kuntal Ganguly's book, Learning Generative Adversarial Networks, helps you do just that. Kuntal is a big data analytics engineer at Amazon Web Services. He has around 7 years of experience building large-scale, data-driven systems using big data frameworks and machine learning, and has designed, developed, and deployed several large-scale distributed applications. Kuntal is a seasoned author with books ranging across the data science spectrum, from machine learning and deep learning to Generative Adversarial Networks. The book shows how to implement GANs in your machine learning models in a quick and easy format, with plenty of real-world examples and hands-on tutorials.

1. Unsupervised learning now a cakewalk with GANs

A major challenge of unsupervised learning is the massive amount of unlabelled data one needs to work through as part of data preparation. In traditional neural networks, this labeling of data is both costly and time-consuming. Generative Adversarial Networks make a creative aspect of deep learning possible: the neural networks are capable of generating realistic images from real-world datasets (such as MNIST and CIFAR). GANs provide an easy way to train DL algorithms by slashing the amount of data required to train the neural network models, with no labeling of data required. The book uses a semi-supervised approach to solve the problem of unsupervised learning for classifying images; this can easily be carried over into a developer's own problem domain.

2. GANs help you change a horse into a zebra using image style transfer

https://www.youtube.com/watch?v=9reHvktowLY

Turning an apple into an orange is magic, and GANs can do this magic without casting a spell. In image-to-image style transfer, the styling of one image is applied to another. GANs can perform image-to-image translations across various domains (such as changing an apple to an orange or a horse to a zebra) using Cycle-Consistent Generative Adversarial Networks (CycleGANs). Detailed examples of how to turn the image of an apple into an orange using TensorFlow, and how to turn an image of a horse into a zebra using a GAN model, are given in the book.

3. GANs take your text as input and output an image

Generative Adversarial Networks can also be used for text-to-image synthesis, for example generating a photo-realistic image based on a caption. To do this, a dataset of images with their associated captions is given as training data. The dataset is first encoded using a hybrid neural network called a character-level convolutional recurrent neural network, which creates a joint representation of both in a multimodal space for both the generator and the discriminator. In the book, Kuntal showcases the technique of stacking multiple generative networks to generate realistic images from textual information using StackGANs. Further, the book goes on to explain the coupling of two generative networks to automatically discover relationships among various domains (such as the relationship between shoes and handbags, or actors and actresses) using DiscoGANs.

4. GANs + transfer learning = no more model generation from scratch

Source: Learning Generative Adversarial Networks

Data is the basis for training any machine learning model; scarcity of data can lead to a poorly trained model with a high chance of failure. Some real-life scenarios may not have sufficient data, hardware, or resources to train bigger networks to the desired accuracy. So, is training from scratch a must? Transfer learning is a well-known deep learning technique that adapts an existing trained model from a similar task to the task at hand. The book showcases transfer learning with hands-on examples, and goes on to combine transfer learning and GANs to generate high-resolution, realistic images with facial datasets. You will also see how to create artistic hallucinations on images beyond GANs.

5. GANs help you take machine learning models to production

Most machine learning tutorials, video courses, and books explain the training and evaluation of models. But how do we take a trained model to production, put it to use, and make it available to customers? In the book, the author works through an example, developing a facial correction system using the LFW dataset to automatically correct corrupted images with your trained GAN model. The book also covers several techniques for deploying machine learning or deep learning models in production, both in data centers and in the cloud, with microservice-based containerized environments. You will also learn how to run deep models in a serverless environment and with managed cloud services.

This article just scratches the surface of what is possible with GANs and why learning them will change how you think about deep neural networks. To know more, grab your copy of Kuntal Ganguly's book on GANs: Learning Generative Adversarial Networks.
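To make the generator/discriminator idea concrete, here is a minimal GAN training-step sketch written with the TensorFlow 2.x Keras API on random stand-in data. It is illustrative only and is not code from Kuntal Ganguly's book: the discriminator learns to separate real from generated images, while the generator learns to fool it.

```python
# Minimal GAN sketch for 28x28 (MNIST-sized) images, TensorFlow 2.x Keras API.
# Illustrative only; uses random stand-in data, not the book's code.
import tensorflow as tf
from tensorflow.keras import layers, models

LATENT_DIM = 100
BATCH = 32

# Generator: maps random noise to a fake 28x28 image.
generator = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(LATENT_DIM,)),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28)),
])

# Discriminator: scores an image as real (close to 1) or fake (close to 0).
discriminator = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)
bce = tf.keras.losses.BinaryCrossentropy()

def train_step(real_images):
    noise = tf.random.normal((BATCH, LATENT_DIM))

    # Train the discriminator: real images -> 1, generated images -> 0.
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        d_loss = (bce(tf.ones((BATCH, 1)), discriminator(real_images, training=True))
                  + bce(tf.zeros((BATCH, 1)), discriminator(fake, training=True)))
    d_opt.apply_gradients(zip(tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))

    # Train the generator: try to make the discriminator output 1 for fakes.
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        g_loss = bce(tf.ones((BATCH, 1)), discriminator(fake, training=True))
    g_opt.apply_gradients(zip(tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss

# One illustrative step on random images standing in for real MNIST digits.
print(train_step(tf.random.uniform((BATCH, 28, 28))))
```

In practice the step above would be run for many epochs over a real image dataset, and the generator and discriminator would be deeper convolutional networks, but the adversarial structure is the same.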

The rise of machine learning in the investment industry

Natasha Mathur
15 Feb 2019
13 min read
The investment industry has evolved dramatically over the last several decades and continues to do so amid increased competition, technological advances, and a challenging economic environment. In this article, we will review several key trends that have shaped the investment environment in general, and the context for algorithmic trading more specifically. This article is an excerpt from the book Hands-On Machine Learning for Algorithmic Trading, written by Stefan Jansen. The book explores the strategic perspective, conceptual understanding, and practical tools needed to add value by applying ML to the trading and investment process.

The trends that have propelled algorithmic trading and ML to current prominence include:
Changes in the market microstructure, such as the spread of electronic trading and the integration of markets across asset classes and geographies
The development of investment strategies framed in terms of risk-factor exposure, as opposed to asset classes
The revolutions in computing power, data generation and management, and analytic methods
The outperformance of the pioneers in algorithmic trading relative to human, discretionary investors

In addition, the financial crises of 2001 and 2008 have affected how investors approach diversification and risk management, and have given rise to low-cost passive investment vehicles in the form of exchange-traded funds (ETFs). Amid low yield and low volatility after the 2008 crisis, cost-conscious investors shifted $2 trillion from actively managed mutual funds into passively managed ETFs. Competitive pressure is also reflected in lower hedge fund fees, which dropped from the traditional 2% annual management fee and 20% take of profits to an average of 1.48% and 17.4%, respectively, in 2017. Let's have a look at how ML has come to play a strategic role in algorithmic trading.

Factor investing and smart beta funds

The return provided by an asset is a function of the uncertainty or risk associated with the financial investment. An equity investment implies, for example, assuming a company's business risk, and a bond investment implies assuming default risk. To the extent that specific risk characteristics predict returns, identifying and forecasting the behavior of these risk factors becomes a primary focus when designing an investment strategy. It yields valuable trading signals and is the key to superior active-management results. The industry's understanding of risk factors has evolved very substantially over time and has affected how ML is used for algorithmic trading.

Modern Portfolio Theory (MPT) introduced the distinction between idiosyncratic and systematic sources of risk for a given asset. Idiosyncratic risk can be eliminated through diversification, but systematic risk cannot. In the early 1960s, the Capital Asset Pricing Model (CAPM) identified a single factor driving all asset returns: the return on the market portfolio in excess of T-bills. The market portfolio consists of all tradable securities, weighted by their market value. The systematic exposure of an asset to the market is measured by beta, which captures the covariation of the asset's returns with those of the market portfolio. The recognition that the risk of an asset does not depend on the asset in isolation, but rather on how it moves relative to other assets and to the market as a whole, was a major conceptual breakthrough. In other words, assets do not earn a risk premium because of their specific, idiosyncratic characteristics, but because of their exposure to underlying factor risks.

However, a large body of academic literature and long investing experience have disproved the CAPM prediction that asset risk premiums depend only on their exposure to a single factor measured by the asset's beta. Instead, numerous additional risk factors have since been discovered. A factor is a quantifiable signal, attribute, or variable that has historically correlated with future stock returns and is expected to remain correlated in the future. These risk factors were labeled anomalies since they contradicted the Efficient Market Hypothesis (EMH), which held that market equilibrium would always price securities according to the CAPM, so that no other factors should have predictive power. The economic theory behind factors can be either rational, where factor risk premiums compensate for low returns during bad times, or behavioral, where agents fail to arbitrage away excess returns.

Well-known anomalies include the value, size, and momentum effects, which help predict returns while controlling for the CAPM market factor. The size effect, discovered by Banz (1981) and Reinganum (1981), rests on small firms systematically outperforming large firms. The value effect (Basu 1982) states that firms with low valuation metrics outperform: firms with low price multiples, such as the price-to-earnings or price-to-book ratios, perform better than their more expensive peers (as suggested by the inventors of value investing, Benjamin Graham and David Dodd, and popularized by Warren Buffett). The momentum effect, discovered in the late 1980s by, among others, Clifford Asness, the founding partner of AQR, states that stocks with good momentum, in terms of recent 6-12 month returns, have higher returns going forward than poor-momentum stocks with similar market risk. Researchers also found that value and momentum factors explain returns for stocks outside the US, as well as for other asset classes such as bonds, currencies, and commodities, alongside additional risk factors.

In fixed income, the value strategy is called riding the yield curve and is a form of the duration premium. In commodities, it is called the roll return, with a positive return for an upward-sloping futures curve and a negative return otherwise. In foreign exchange, the value strategy is called carry. There is also an illiquidity premium: securities that are more illiquid trade at low prices and have high average excess returns relative to their more liquid counterparts. Bonds with higher default risk tend to have higher returns on average, reflecting a credit risk premium. Since investors are willing to pay for insurance against high volatility when returns tend to crash, sellers of volatility protection in options markets tend to earn high returns.

Multifactor models define risks in broader and more diverse terms than just the market portfolio. In 1976, Stephen Ross proposed arbitrage pricing theory, which asserted that investors are compensated for multiple systematic sources of risk that cannot be diversified away. The three most important macro factors are growth, inflation, and volatility, in addition to productivity, demographic, and political risk. In 1992, Eugene Fama and Kenneth French combined the equity risk factors of size and value with a market factor into a single model that better explained cross-sectional stock returns. They later added a model that also included bond risk factors to simultaneously explain returns for both asset classes.
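As a quick, concrete illustration of two quantities described above, an asset's CAPM beta and a simple 12-month momentum signal, here is a short sketch assuming pandas and NumPy. The column names and random return data are placeholders and the code is illustrative only; it is not taken from the book.

```python
# Illustrative sketch (not from the book): estimate CAPM beta and a simple
# momentum signal from daily returns. Assumes pandas and NumPy; the data
# below is random placeholder data standing in for real return series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2015-01-01", periods=1000)
returns = pd.DataFrame({
    "market": rng.normal(0.0004, 0.010, len(dates)),
    "stock": rng.normal(0.0005, 0.015, len(dates)),
}, index=dates)

# CAPM beta: covariance of the stock with the market over market variance.
beta = returns["stock"].cov(returns["market"]) / returns["market"].var()
print(f"estimated beta: {beta:.2f}")

# Simple momentum signal: trailing 12-month return, skipping the most recent
# month (roughly 252 and 21 trading days, respectively).
prices = (1 + returns["stock"]).cumprod()
momentum = prices.shift(21) / prices.shift(252) - 1
print(momentum.dropna().tail())
```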
They later added a model that also included bond risk factors to simultaneously explain returns for both asset classes. A particularly attractive aspect of risk factors is their low or negative correlation. Value and momentum risk factors, for instance, are negatively correlated, reducing the risk and increasing risk-adjusted returns above and beyond the benefit implied by the risk factors. Furthermore, using leverage and long-short strategies, factor strategies can be combined into market-neutral approaches. The combination of long positions in securities exposed to positive risks with underweight or short positions in the securities exposed to negative risks allows for the collection of dynamic risk premiums. As a result, the factors that explained returns above and beyond the CAPM were incorporated into investment styles that tilt portfolios in favor of one or more factors, and assets began to migrate into factor-based portfolios. The 2008 financial crisis underlined how asset-class labels could be highly misleading and create a false sense of diversification when investors do not look at the underlying factor risks, as asset classes came crashing down together. Over the past several decades, quantitative factor investing has evolved from a simple approach based on two or three styles to multifactor smart or exotic beta products. Smart beta funds have crossed $1 trillion AUM in 2017, testifying to the popularity of the hybrid investment strategy that combines active and passive management. Smart beta funds take a passive strategy but modify it according to one or more factors, such as cheaper stocks or screening them according to dividend payouts, to generate better returns. This growth has coincided with increasing criticism of the high fees charged by traditional active managers as well as heightened scrutiny of their performance. The ongoing discovery and successful forecasting of risk factors that, either individually or in combination with other risk factors, significantly impact future asset returns across asset classes is a key driver of the surge in ML in the investment industry. Algorithmic pioneers outperform humans at scale The track record and growth of Assets Under Management (AUM) of firms that spearheaded algorithmic trading has played a key role in generating investor interest and subsequent industry efforts to replicate their success. Systematic funds differ from HFT in that trades may be held significantly longer while seeking to exploit arbitrage opportunities as opposed to advantages from sheer speed. Systematic strategies that mostly or exclusively rely on algorithmic decision-making were most famously introduced by mathematician James Simons who founded Renaissance Technologies in 1982 and built it into the premier quant firm. Its secretive Medallion Fund, which is closed to outsiders, has earned an estimated annualized return of 35% since 1982. DE Shaw, Citadel, and Two Sigma, three of the most prominent quantitative hedge funds that use systematic strategies based on algorithms, rose to the all-time top-20 performers for the first time in 2017 in terms of total dollars earned for investors, after fees, and since inception. DE Shaw, founded in 1988 with $47 billion AUM in 2018 joined the list at number 3. Citadel started in 1990 by Kenneth Griffin, manages $29 billion and ranks 5, and Two Sigma started only in 2001 by DE Shaw alumni John Overdeck and David Siegel, has grown from $8 billion AUM in 2011 to $52 billion in 2018. 
Bridgewater started in 1975 with over $150 billion AUM, continues to lead due to its Pure Alpha Fund that also incorporates systematic strategies. Similarly, on the Institutional Investors 2017 Hedge Fund 100 list, five of the top six firms rely largely or completely on computers and trading algorithms to make investment decisions—and all of them have been growing their assets in an otherwise challenging environment. Several quantitatively-focused firms climbed several ranks and in some cases grew their assets by double-digit percentages. Number 2-ranked Applied Quantitative Research (AQR) grew its hedge fund assets 48% in 2017 to $69.7 billion and managed $187.6  billion firm-wide. Among all hedge funds, ranked by compounded performance over the last three years, the quant-based funds run by Renaissance Technologies achieved ranks 6 and 24, Two Sigma rank 11, D.E. Shaw no 18 and 32, and Citadel ranks 30 and 37. Beyond the top performers, algorithmic strategies have worked well in the last several years. In the past five years, quant-focused hedge funds gained about 5.1% per year while the average hedge fund rose 4.3% per year in the same period. ML driven funds attract $1 trillion AUM The familiar three revolutions in computing power, data, and ML methods have made the adoption of systematic, data-driven strategies not only more compelling and cost-effective but a key source of competitive advantage. As a result, algorithmic approaches are not only finding wider application in the hedge-fund industry that pioneered these strategies but across a broader range of asset managers and even passively-managed vehicles such as ETFs. In particular, predictive analytics using machine learning and algorithmic automation play an increasingly prominent role in all steps of the investment process across asset classes, from idea-generation and research to strategy formulation and portfolio construction, trade execution, and risk management. Estimates of industry size vary because there is no objective definition of a quantitative or algorithmic fund, and many traditional hedge funds or even mutual funds and ETFs are introducing computer-driven strategies or integrating them into a discretionary environment in a human-plus-machine approach. Morgan Stanley estimated in 2017 that algorithmic strategies have grown at 15% per year over the past six years and control about $1.5 trillion between hedge funds, mutual funds, and smart beta ETFs. Other reports suggest the quantitative hedge fund industry was about to exceed $1 trillion AUM, nearly doubling its size since 2010 amid outflows from traditional hedge funds. In contrast, total hedge fund industry capital hit $3.21 trillion according to the latest global Hedge Fund Research report. The market research firm Preqin estimates that almost 1,500 hedge funds make a majority of their trades with help from computer models. Quantitative hedge funds are now responsible for 27% of all US stock trades by investors, up from 14% in 2013. But many use data scientists—or quants—which, in turn, use machines to build large statistical models (WSJ). In recent years, however, funds have moved toward true ML, where artificially-intelligent systems can analyze large amounts of data at speed and improve themselves through such analyses. Recent examples include Rebellion Research, Sentient, and Aidyia, which rely on evolutionary algorithms and deep learning to devise fully-automatic Artificial Intelligence (AI)-driven investment platforms. 
From the core hedge fund industry, the adoption of algorithmic strategies has spread to mutual funds and even passively managed exchange-traded funds in the form of smart beta funds, and to discretionary funds in the form of quantamental approaches.

The emergence of quantamental funds

Two distinct approaches have evolved in active investment management: systematic (or quant) and discretionary investing. Systematic approaches rely on algorithms for a repeatable and data-driven approach to identify investment opportunities across many securities; in contrast, a discretionary approach involves an in-depth analysis of a smaller number of securities. These two approaches are becoming more similar as fundamental managers take more data-science-driven approaches. Even fundamental traders now arm themselves with quantitative techniques, accounting for $55 billion of systematic assets, according to Barclays. Agnostic to specific companies, quantitative funds trade patterns and dynamics across a wide swath of securities. Quants now account for about 17% of total hedge fund assets, data compiled by Barclays shows. Point72 Asset Management, with $12 billion in assets, has been shifting about half of its portfolio managers to a man-plus-machine approach. Point72 is also investing tens of millions of dollars into a group that analyzes large amounts of alternative data and passes the results on to traders.

Investments in strategic capabilities

Rising investments in related capabilities, namely technology, data and, most importantly, skilled humans, highlight how significant algorithmic trading using ML has become for competitive advantage, especially in light of the rising popularity of passive, indexed investment vehicles, such as ETFs, since the 2008 financial crisis. Morgan Stanley noted that only 23% of its quant clients say they are not using or considering using ML, down from 44% in 2016. Guggenheim Partners LLC built what it calls a supercomputing cluster for $1 million at the Lawrence Berkeley National Laboratory in California to help crunch numbers for Guggenheim's quant investment funds. Electricity for the computers costs another $1 million a year. AQR is a quantitative investment group that relies on academic research to identify and systematically trade factors that have, over time, proven to beat the broader market. The firm used to eschew the purely computer-powered strategies of quant peers such as Renaissance Technologies or DE Shaw. More recently, however, AQR has begun to seek profitable patterns in markets using ML to parse through novel datasets, such as satellite pictures of shadows cast by oil wells and tankers. The leading firm BlackRock, with over $5 trillion in AUM, also bets on algorithms to beat discretionary fund managers by heavily investing in SAE, a systematic trading firm it acquired during the financial crisis. Franklin Templeton bought Random Forest Capital, a debt-focused, data-led investment company, for an undisclosed amount, hoping that its technology can support the wider asset manager. We looked at how ML plays a role in different industry trends around algorithmic trading. If you want to learn more about the design and execution of algorithmic trading strategies, and use cases of ML in algorithmic trading, be sure to check out the book 'Hands-On Machine Learning for Algorithmic Trading'. Read next: Using machine learning for phishing domain detection [Tutorial] Anatomy of an automated machine learning algorithm (AutoML) 10 machine learning algorithms every engineer needs to know

How Blockchain can level up IoT Security

Savia Lobo
29 Aug 2017
4 min read
IoT comprises a horde of sensors, vehicles, and other devices with embedded electronics that can communicate over the Internet. These IoT-enabled devices generate tons of data every second. And with IoT Edge Analytics, these devices are getting much smarter - they can start or stop a request without any human intervention. 25 billion "things" will be connected to the internet by 2020. - Gartner Research With so much data being generated by these devices, the question on everyone's mind is: will all this data be reliable and secure?

When Brains meet Brawn: Blockchain for IoT

Blockchain, an open distributed ledger, is highly secure and difficult for anyone connected over the network to manipulate or corrupt. It was initially designed for cryptocurrency-based financial transactions. Bitcoin is a famous example which has Blockchain as its underlying technology. Blockchain has come a long way since then and can now be used to store anything of value. So why not store data in it? That data will be secure, just like every digital asset in a Blockchain is. Blockchain, decentralized and secure, is an ideal structure to form the underlying foundation for IoT data solutions. Current IoT devices and their data rely on a client-server architecture. All devices are identified, authenticated, and connected via cloud servers, which are capable of storing ample amounts of data. But this requires huge infrastructure, which is expensive. Blockchain not only provides an economical alternative but, because it works in a decentralized fashion, also eliminates single points of failure, creating a more secure and resilient network for IoT devices. This makes IoT more secure and reliable. Customers can therefore relax knowing their information is in safe hands. Today, Blockchain's capabilities extend beyond processing financial transactions: it can now track billions of connected devices, process transactions, and even coordinate between devices - a good fit for the IoT industry.

Why Blockchain is perfect for IoT

Inherently weak security features make IoT devices suspect. Blockchain, on the other hand, with its tamper-proof ledger, is hard to manipulate for malicious activities - making it the right infrastructure for IoT solutions.

Enhancing security through decentralization

Blockchain makes it hard for intruders to intervene as it spans a network of secure blocks. A change at a single location, therefore, does not affect the other blocks. The data or any value remains encrypted and is only visible to the person who has encrypted it using a private key. The cryptographic algorithms used in Blockchain technology ensure that IoT data remains private, whether for an individual organization or for the organizations connected in a network.

Simplicity through autonomous, third-party-free transactions

Blockchain technology is already a star in the finance sector thanks to the adoption of smart contracts, Bitcoin, and other cryptocurrencies. Apart from providing a secure medium for financial transactions, it eliminates the need for third-party brokers such as banks to provide a guarantee over peer-to-peer payment services. With Blockchain, IoT data can be treated in a similar manner, wherein smart contracts can be made between devices to exchange messages and data. This type of autonomy is possible because each node in the blockchain network can verify the validity of a transaction without relying on a centralized authority.
Blockchain-backed IoT solutions will thus enable trustworthy message sharing. Business partners can easily access and exchange confidential information within the IoT without a centralized management or regulatory authority. This means quicker transactions, lower costs, and fewer opportunities for malicious intent such as data espionage.

Blockchain's immutability for predicting IoT security vulnerabilities

Blockchains maintain a history of all transactions made by the smart devices connected within a particular network. This is possible because once you enter data in a Blockchain, it lives there forever in its immutable ledger. The possibilities for IoT solutions that leverage Blockchain's immutability are limitless. Some obvious use cases are more robust credit scores and preventive healthcare solutions that use data accumulated through wearables. For all the above reasons, we see significant Blockchain adoption by IoT-based businesses in the near future.
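To make the tamper-evidence argument concrete, here is a minimal, illustrative sketch of a hash-chained ledger in Python. It is not a production blockchain (there is no consensus protocol, networking, or signing), and the block fields and the verify helper are assumptions chosen purely to show why altering one record breaks every later hash.

```python
import hashlib
import json
import time

def block_hash(block):
    # Hash the block's contents (excluding its own hash) deterministically.
    payload = json.dumps({k: v for k, v in block.items() if k != "hash"}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def add_block(chain, device_id, reading):
    # Each new block commits to the previous block's hash.
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "timestamp": time.time(),
             "device_id": device_id, "reading": reading, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block

def verify(chain):
    # Any tampering with an earlier block invalidates the stored hashes downstream.
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
add_block(chain, device_id="sensor-42", reading={"temp_c": 21.7})
add_block(chain, device_id="sensor-42", reading={"temp_c": 22.1})
print(verify(chain))                    # True
chain[0]["reading"]["temp_c"] = 99.9    # tamper with an old sensor reading
print(verify(chain))                    # False
```

In a real IoT deployment the chain would be replicated across many nodes and extended only by consensus, which is what removes the single point of failure discussed above.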

Why an algorithm will never win a Pulitzer

Richard Gall
21 Jan 2016
6 min read
In 2012, a year which feels a lot like the very early years of the era of data, Wired published this article on Narrative Science, an organization based in Chicago that uses Machine Learning algorithms to write news articles. Its founder and CEO, Kris Hammond, is a man whose enthusiasm for algorithmic possibilities is unparalleled. When asked whether an algorithm would win a Pulitzer in the next 20 years, he goes further, claiming that it could happen in the next 5 years. Hammond's excitement at what his organization is doing is not unwarranted. But his optimism certainly is. Unless 2017 is a particularly poor year for journalism and literary nonfiction, a Pulitzer for one of Narrative Science's algorithms looks unlikely to say the least. But there are a couple of problems with Hammond's enthusiasm. He fails to recognise the limitations of algorithms: the fact that the job of even the most intricate and complex Deep Learning algorithm is very specific, and is quite literally determined by the people who create it. "We are humanising the machine," he's quoted as saying in a Guardian interview from June 2015. "Based on general ideas of what is important and a close understanding of who the audience is, we are giving it the tools to know how to tell us stories." It's important to notice here how he talks - it's all about what 'we're' doing. The algorithms that are central to Narrative Science's mission are things that are created by people, by data scientists. It's easy to read what's going on as a simple case of the machines taking over. True, perhaps there is cause for concern among writers when he suggests that in 25 years 90% of news stories will be created by algorithms, but in actual fact there's just a simple shift in where labour is focused.

It's time to rethink algorithms

We need to rethink how we view and talk about data science, Machine Learning, and algorithms. We see, for example, algorithms as impersonal, blandly futuristic things. Although they might be crucial to our personalized online experiences, they are regarded as the hypermodern equivalent of the inauthentic handshake of a door-to-door salesman. Similarly, at the other end, the process of creating them is viewed as a feat of engineering: maths and statistics nerds tackling the complex interplay of statistics and machinery. Instead, we should think of algorithms as something creative, things that organize and present the world in a specific way, like a well-designed building. If an algorithm did indeed win a Pulitzer, wouldn't it really be the team behind it that deserves it? When Hammond talks, for example, about "general ideas of what is important and a close understanding of who the audience is", he is referring very much to a creative process. Sure, it's the algorithm that learns this, but it nevertheless requires the insight of a scientist, an analyst, to consider these factors, and to consider how their algorithm will interact with the irritating complexity and unpredictability of reality. Machine Learning projects, then, are as much about designing algorithms as they are about programming them. There's a certain architecture, a politics, that informs them. It's all about prioritization and organization, and those two things aren't just obvious; they're certainly not things which can be identified and quantified. They are instead things that inform the way we quantify, the way we label. The very real fingerprints of human imagination, and indeed fallibility, are in the algorithms we experience every single day.
Algorithms are made by people

Perhaps we've all fallen for Hammond's enthusiasm. It's easy to see algorithms as the key to the future, and forget that really they're just things that are made by people. Indeed, it might well be that they're so successful that we forget they've been made by anyone - it's usually only when algorithms don't work that the human aspect emerges. The data team have done their job when no one realises they are there. An obvious example: you can see it when Spotify recommends some bizarre songs that you would never even consider listening to. The problem here isn't simply a technical one; it's about how different tracks or artists are tagged and grouped, and how they are made to fit within a particular dataset. It's an issue of context: to build a great Machine Learning system you need to be alive to the stories and ideas that permeate the world in which your algorithm operates - if you, as the data scientist, lack this awareness, so will your Machine Learning project. But there have been more problematic and disturbing incidents, such as when Flickr auto-tagged people of color in pictures as apes, due to the way a visual recognition algorithm had been trained. In this case, the issue is a lack of sensitivity about the way in which an algorithm may work - the things it might run up against when it's faced with the messiness of the real world, with its conflicts, its identities, ideas, and stories. The story of Solid Gold Bomb, too, is a reminder of the unintended consequences of algorithms. It's a reminder of the fact that we can be lazy with algorithms; instead of being designed with thought and care, they become a surrogate for it. What's more is that they always give us a get-out clause; we can blame the machine if something goes wrong. If this all sounds like I'm simply down on algorithms, that I'm a technological pessimist, you're wrong. What I'm trying to say is that it's humans that are really in control. If an algorithm won a Pulitzer, what would that imply? It would mean the machines have won. It would mean we're no longer the ones doing the thinking, solving problems, finding new ones.

Data scientists are designers

As the economy becomes reliant on technological innovation, it's easy to remove ourselves, to underplay the creative thinking that drives what we do. That's what Hammond is doing in his frenzied excitement about his company: he's forgetting that it's him and his team that are finding their way through today's stories. It might be easier to see creativity at work when we cast our eyes towards game development and web design, but data scientists are designers and creators too. We're often so keen to stress the technical aspects of these sorts of roles that we forget this important aspect of the data scientist skillset.

Data science for non-techies: How I got started (Part 1)

Amey Varangaonkar
20 Jul 2018
7 min read
As a category manager, I manage the data science portfolio of product ideas for Packt Publishing, a leading tech publisher. In simple terms, I place informed bets on where to invest, what topics to publish on, and so on. While I have a decent idea of where the industry is heading and what data professionals are looking forward to learning and why, it is high time I walked in their shoes, for a couple of reasons. Basically, I want to understand the reason behind Data Science being the 'Sexiest job of the 21st century', and whether the role is really worth all the fame and fortune. In the process, I also want to explore the underlying difficulties, challenges, and obstacles that every data scientist has had to endure at some point in his/her journey, or maybe still does. The cherry on top is that I get to use the skills I develop to supercharge my success in my current role, which is primarily insight-driven. This is the first of a series of posts on how I got started with Data Science. Today, I'm sharing my experience with devising a learning path and then gathering appropriate learning resources.

Devising a learning path

To understand the concepts of data science, I had to research a lot. There are tons and tons of resources out there, many of which are very good. Once you separate the good from the rest, it can be quite intimidating to pick the options that suit you best. Some of the primary questions that clouded my mind were: What should be my programming language of choice? R or Python? Or something else? What tools and frameworks do I need to learn? What about the statistics and mathematical aspects of machine learning? How essential are they? Two videos really helped me find the answers to the questions above: If you don't want to spend a lot of your time mastering the art of data science, there's a beautiful video on how to become a data scientist in six months. What are the questions asked in a data science interview? What are the in-demand skills that you need to master in order to get a data science job? This video on 5 Tips For Getting a Data Science Job is really helpful. After a lot of research that included reading countless articles and blogs, and discussions with experts, here is my learning plan:

Learn Python

Per the recently conducted Stack Overflow Developer Survey 2018, Python stood out as the most-wanted programming language, meaning the developers who do not use it yet want to learn it the most. As one of the most widely used general-purpose programming languages, Python finds large applications when it comes to data science. Naturally, you get attracted to the best option available, and Python was the one for me. The major reasons why I chose to learn Python over the other programming languages:

Very easy to learn: Python is one of the easiest programming languages to learn. Not only is the syntax clean and easy to understand, even the most complex of data science tasks can be done in a few lines of Python code.
Efficient libraries for Data Science: Python has a vast array of libraries suited for various data science tasks, from scraping data to visualizing and manipulating it. NumPy, SciPy, pandas, matplotlib, and Seaborn are some of the libraries worth mentioning here.
Python has terrific libraries for machine learning: Learning a framework or a library which makes machine learning easier to perform is very important. Python has libraries such as scikit-learn and TensorFlow that make machine learning easier and a fun-to-do activity.
To make the most of these libraries, it is important to understand the fundamentals of Python. My colleague and good friend Aaron has put out a list of the top 7 Python programming books, which helped as a brilliant starting point to understand the different resources out there to learn Python. The one book that stood out for me was Learn Python Programming - Second Edition: this is a very good book to start Python programming from scratch. There is also a neat skill-map present on Mapt, where you can progressively build up your knowledge of Python, right from the absolute basics to the most complex concepts. Another handy resource to learn the A-Z of Python is Complete Python Masterclass. This is a slightly long course, but it will take you from the absolute fundamentals to the most advanced aspects of Python programming. Task Status: Ongoing

Learn the fundamentals of data manipulation

After learning the fundamentals of Python programming, the plan is to head straight to the Python-based libraries for data manipulation, analysis, and visualization. Some of the major ones are what we already discussed above, and the plan is to learn them in the following order:

NumPy - used primarily for numerical computing
pandas - one of the most popular Python packages for data manipulation and analysis
matplotlib - the go-to Python library for data visualization, rivaling the likes of R's ggplot2
Seaborn - a data visualization library that runs on top of matplotlib, used for creating visually appealing charts, plots, and histograms

Some very good resources to learn about all these libraries: Python Data Analysis, and Python for Data Science and Machine Learning - a very good course with detailed coverage of the machine learning concepts; something to learn later. The aim is to learn these libraries up to a fairly intermediate level, and to be able to manipulate, analyze, and visualize any kind of data, including missing, unstructured, and time-series data (a small sketch of what this looks like in practice follows at the end of this plan).

Understand the fundamentals of statistics, linear algebra and probability

In order to take a step further and enter the foray of machine learning, the general consensus is to first understand the maths and statistics behind the concepts of machine learning. Implementing them in Python is relatively easy once you get the math right, and that is what I plan to do. I shortlisted some very good resources for this as well: Statistics for Machine Learning, and the Stanford University Machine Learning course at Coursera. Task Status: Ongoing

Learn Machine Learning (sounds odd, I know)

After understanding the math behind machine learning, the next step is to learn how to perform predictive modeling using popular machine learning algorithms such as linear regression, logistic regression, clustering, and more. Using real-world datasets, the plan is to learn the art of building state-of-the-art machine learning models using Python's very own scikit-learn library, as well as the popular TensorFlow package. To learn how to do this, the courses I mentioned above should come in handy: Stanford University - Machine Learning Course at Coursera, Python for Data Science and Machine Learning, and Python Machine Learning, Second Edition. Task Status: To be started

[box type="shadow" align="" class="" width=""]During the course of this journey, websites like Stack Overflow and Stack Exchange will be my best friends, along with popular resources such as YouTube.[/box] As I start this journey, I plan to share my experiences and knowledge with you all.
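As promised above, here is a minimal, illustrative pandas and matplotlib sketch of the load-clean-summarize-plot loop that the data manipulation step is aiming for. The tiny inline dataset and its column names are made up purely for illustration; a real workflow would start from pd.read_csv or a similar loader.

```python
import pandas as pd
import matplotlib.pyplot as plt

# A tiny, made-up dataset standing in for a real CSV file.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "visitors": [1200, 1350, None, 1610],   # one missing value on purpose
    "signups": [80, 95, 88, 120],
})

# Basic cleaning: fill the missing visitor count with the column mean.
df["visitors"] = df["visitors"].fillna(df["visitors"].mean())

# A simple derived metric plus summary statistics.
df["conversion_rate"] = df["signups"] / df["visitors"]
print(df.describe())

# A quick visualization of the derived metric.
df.plot(x="month", y="conversion_rate", kind="bar", legend=False)
plt.ylabel("Conversion rate")
plt.tight_layout()
plt.show()
```

Even this toy example touches missing-value handling, a derived metric, summary statistics, and a chart, which is roughly what the "fairly intermediate level" target above refers to.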
Do you think the learning path looks good? Is there anything else that I should include in my learning path? I would really love to hear your comments, suggestions, and experiences. Stay tuned for the next post, where I seek answers to questions such as 'How much of Python should I learn in order to be comfortable with Data Science?', 'How much time should I devote per day or week to learn the concepts in Data Science?', and much more. Read more: Why is data science important? 9 Data Science Myths Debunked 30 common data science terms explained

Here's how you can handle the bias variance trade-off in your ML models

Savia Lobo
22 Jan 2018
8 min read
Many organizations rely on machine learning techniques in their day-to-day workflow to cut down on the time required to do a job. The reason why these techniques are robust is that they undergo various tests in order to carry out correct predictions about any data fed into them. During this phase, certain errors are also generated, which can lead to an inconsistent ML model. Two common errors that we are going to look at in this article are bias and variance, and how a trade-off can be achieved between the two in order to generate a successful ML model. Let's first have a look at what creates these kinds of errors. Machine learning techniques, or more precisely supervised learning techniques, involve training, often the most important stage in the ML workflow. The machine learning model is trained using the training data. How is this training data prepared? This is done by using a dataset for which the output of the algorithm is known. During the training stage, the algorithm analyzes the training data that is fed in and produces patterns which are captured within an inferred function. This inferred function, which is derived after analysis of the training dataset, is the model that will be further used to map new examples. An ideal model generated from this training data should be able to generalize well. This means it should learn from the training data and correctly predict or classify data within any new problem instance. In general, the more complex the model is, the better it classifies the training data. However, if the model is too complex, it will pick up random features, i.e. noise, in the training data; this is the case of overfitting, and the model is said to overfit. On the other hand, if the model is not complex enough, or misses out on important dynamics present within the data, then it is a case of underfitting. Both overfitting and underfitting are basically errors in the ML models or algorithms. Also, it is generally impossible to minimize both these errors at the same time, and this leads to a condition called the bias-variance trade-off. Before getting into how to achieve the trade-off, let's first understand how bias and variance errors occur.

The Bias and Variance Error

Let's understand each error with the help of an example. Suppose you have 3 training datasets, say T1, T2, and T3, and you pass these datasets through a supervised learning algorithm. The algorithm generates three different models, say M1, M2, and M3, one from each of the training datasets. Now let's say you have a new input A. The whole idea is to apply each model to this new input A. Here, there are two types of errors that can occur. If the output generated by each model on the input A is different (B1, B2, B3), the algorithm is said to have a high variance error. On the other hand, if the output from all three models is the same (B) but incorrect, the algorithm is said to have a high bias error. High variance also means that the algorithm produces a model that is too specific to the training data, which is a typical case of overfitting. On the other hand, high bias means that the algorithm has not picked up the defining patterns from the dataset; this is a case of underfitting. Some examples of high-bias ML algorithms are Linear Regression, Linear Discriminant Analysis, and Logistic Regression. Examples of high-variance ML algorithms are Decision Trees, k-Nearest Neighbors, and Support Vector Machines.

How to achieve a Bias-Variance Trade-off?
For any supervised algorithm, having a high bias error usually means it has a low variance error, and vice versa. To be more specific, parametric or linear ML algorithms often have high bias but low variance; non-parametric or non-linear algorithms, on the other hand, often have low bias but high variance. The goal of any ML model is to obtain a low-variance and low-bias state, which is often a difficult task due to the parametrization of machine learning algorithms. So how can we achieve a trade-off between the two? Following are some ways to achieve the bias-variance trade-off:

By minimizing the total error: The optimum location for any model is the level of complexity at which the increase in bias is equivalent to the reduction in variance. Practically, there is no analytical method to find this optimal level. One should use an accurate measure of prediction error, explore different levels of model complexity, and then choose the complexity level that minimizes the overall error. Generally, resampling-based measures such as cross-validation should be preferred over theoretical measures such as Akaike's Information Criterion. Source: http://scott.fortmann-roe.com/docs/BiasVariance.html (The irreducible error is the noise that cannot be reduced by algorithms, but it can be reduced with better data cleaning.)

Using Bagging and Resampling techniques: These can be used to reduce the variance in model predictions. In bagging (Bootstrap Aggregating), several replicas of the original dataset are created using random selection with replacement. One modeling algorithm that makes use of bagging is Random Forests. In the Random Forest algorithm, the bias of the full model is equivalent to the bias of a single decision tree, which itself has high variance. By creating many of these trees, in effect a "forest", and then averaging them, the variance of the final model can be greatly reduced over that of a single tree.

Adjusting minor values in algorithms: Both the k-nearest neighbors and Support Vector Machine (SVM) algorithms have low bias and high variance. But the trade-offs in both these cases can be changed. In the k-nearest neighbors algorithm, the value of k can be increased, which would simultaneously increase the number of neighbors that contribute to the prediction. This in turn would increase the bias of the model. In the SVM algorithm, the trade-off can be changed by an increase in the C parameter, which would influence the violations of the margin allowed in the training data. This will increase the bias but decrease the variance.

Using a proper machine learning workflow: This means you have to ensure proper training by:

Maintaining separate training and test sets - splitting the dataset into training (50%), testing (25%), and validation sets (25%). The training set is used to build the model, the test set is used to check the accuracy of the model, and the validation set is used to evaluate the performance of your model's hyperparameters.
Optimizing your model by using systematic cross-validation - a cross-validation technique is a must to fine-tune the model parameters, especially for unknown instances. In supervised machine learning, validation or cross-validation is used to find out the predictive accuracy of various models of varying complexity, in order to find the best model. For instance, one can use the k-fold cross-validation method. Here, the dataset is divided into k folds. For each fold, train the algorithm on the remaining k-1 folds iteratively, using the held-out fold (also called the 'holdout fold') as the test set.
Repeat this process until each fold has acted as the test set. The average of the k recorded errors is called the cross-validation error and can serve as the performance metric for the model.
Trying out appropriate algorithms - before relying on any model, we need to first ensure that the model works best for our assumptions. One can keep in mind the No Free Lunch theorem, which states that no single model works best for every problem; for instance, averaged across all possible problems, a random search performs as well as any heuristic optimization algorithm.
Tuning the hyperparameters that can give an impactful performance - any machine learning model requires different hyperparameters such as constraints, weights, or learning rates for generalizing different data patterns. Tuning these hyperparameters is necessary so that the model can optimally solve machine learning problems. Grid search and randomized search are two such methods practiced for hyperparameter tuning.

So, we have listed some of the ways in which you can achieve a trade-off between the two. Bias and variance are related to each other: if you increase one, the other decreases, and vice versa. With a trade-off, there is an optimal balance between bias and variance which gives us a model that is neither underfit nor overfit. And finally, the ultimate goal of any supervised machine learning algorithm lies in isolating the signal from the dataset while making sure that it eliminates the noise.
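To see the trade-off in code, here is a minimal scikit-learn sketch (assuming scikit-learn is installed) that varies k in a k-nearest neighbors classifier and measures cross-validated accuracy. Small k gives a flexible, low-bias, high-variance model; large k increases bias but reduces variance, and the cross-validation score typically peaks somewhere in between. The dataset and the list of k values are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Evaluate increasing k: low k = flexible (high variance), high k = smooth (high bias).
for k in [1, 3, 5, 11, 25, 51, 101]:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
    print(f"k={k:>3}  mean CV accuracy={scores.mean():.3f}  (+/- {scores.std():.3f})")
```

Picking the k with the best mean cross-validation score is exactly the "minimizing the total error" recipe described above; scikit-learn's GridSearchCV automates the same loop.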

How do Data Structures and Data Models differ?

Amey Varangaonkar
21 Dec 2017
7 min read
[box type="note" align="" class="" width=""]The following article is an excerpt taken from the book Statistics for Data Science, authored by James D. Miller. The book presents interesting techniques through which you can leverage the power of statistics for data manipulation and analysis.[/box] In this article, we will be zooming the spotlight on data structures and data models, and also understanding the difference between both. Data structures Data developers will agree that whenever one is working with large amounts of data, the organization of that data is imperative. If that data is not organized effectively, it will be very difficult to perform any task on that data, or at least be able to perform the task in an efficient manner. If the data is organized effectively, then practically any operation can be performed easily on that data. A data or database developer will then organize the data into what is known as data structures. Following image is a simple binary tree, where the data is organized efficiently by structuring it: A data structure can be defined as a method of organizing large amounts of data more efficiently so that any operation on that data becomes easy. Data structures are created in such a way as to implement one or more particular abstract data type (ADT), which in turn will stipulate what operations can be performed on the data structure, as well as the computational complexity of those operations. [box type="info" align="" class="" width=""]In the field of statistics, an ADT is a model for data types where a data type is defined by its behavior from the point of view (POV) of users of that data, explicitly showing the possible values, the possible operations on data of this type, and the behavior of all of these operations.[/box] Database design is then the process of using the defined data structures to produce a detailed data model, which will become the database. This data model must contain all of the required logical and physical design selections, as well as the physical storage parameters needed to produce a design in a Data Definition Language (DDL), which can then be used to create an actual database. [box type="info" align="" class="" width=""]There are varying degrees of the data model, for example, a fully attributed data model would also contain detailed attributes for each entity in the model.[/box] So, is a data structure a data model? No, a data structure is used to create a data model. Is this data model the same as data models used in statistics? Let's see in the next section. Data models You will find that statistical data models are at the heart of statistical analytics. In the simplest terms, a statistical data model is defined as the following: A representation of a state, process, or system that we want to understand and reason about In the scope of the previous definition, the data or database developer might agree that in theory or in concept, one could use the same terms to define a financial reporting database, as it is designed to contain business transactions and is arranged in data structures that allow business analysts to efficiently review the data, so that they can understand or reason about particular interests they may have concerning the business. Data scientists develop statistical data models so that they can draw inferences from them and, more importantly, make predictions about a topic of concern. 
Data developers develop databases so that they can similarly draw inferences from them and, more importantly, make predictions about a topic of concern (although perhaps in some organizations, databases are more focused on past and current events (transactions) than on forward-thinking ones (predictions)). Statistical data models come in a multitude of different formats and flavours (as do databases). These models can be equations linking quantities that we can observe or measure; they can also be, simply, sets of rules. Databases can be designed or formatted to simplify the entering of online transactions (say, in an order entry system) or for financial reporting when the accounting department must generate a balance sheet, income statement, or profit and loss statement for shareholders. [box type="info" align="" class="" width=""]I found this example of a simple statistical data model: Newton's Second Law of Motion, which states that the net sum of force acting on an object causes the object to accelerate in the direction of the force applied, and at a rate proportional to the resulting magnitude of the force and inversely proportional to the object's mass.[/box]

What's the difference?

Where or how does the reader find the difference between a data structure or database and a statistical model? At a high level, as we speculated in previous sections, one can conclude that a data structure/database is practically the same thing as a statistical data model, as shown in the following image. When we take the time to drill deeper into the topic, however, you should consider the following key points:

Although both the data structure/database and the statistical model could be said to represent a set of assumptions, the statistical model typically will be found to be much more keenly focused on a particular set of assumptions concerning the generation of some sample data, and similar data from a larger population, while the data structure/database more often than not will be more broadly based.
A statistical model is often in a rather idealized form, while the data structure/database may be less perfect in the pursuit of a specific assumption.
Both a data structure/database and a statistical model are built around relationships between variables.
The data structure/database relationship may focus on answering certain questions, such as: What are the total orders for specific customers? What are the total orders for a specific customer who has purchased from a certain salesperson? Which customer has placed the most orders?
Statistical model relationships are usually very simple, and focused on proving certain questions: Females are shorter than males by a fixed amount. Body mass is proportional to height. The probability that any given person will partake in a certain sport is a function of age, sex, and socioeconomic status.
Data structures/databases are all about the act of summarizing data based on relationships between variables.

Relationships

The relationships between variables in a statistical model may be much more complicated, and not at all straightforward to recognize and understand. An illustration of this is awareness of effect statistics. An effect statistic is one that shows or displays a difference in a value that is associated with a difference in one or more other variables.
Can you imagine the SQL query statements you'd use to establish a relationship between two database variables based upon one or more effect statistics? On this point, you may find that a data structure/database usually aims to characterize relationships between variables, while with statistical models, the data scientist looks to fit the model to prove a point or make a statement about the population in the model. That is, a data scientist endeavors to make a statement about the accuracy of an estimate of the effect statistic(s) describing the model! One more note of interest is that both a data structure/database and a statistical model can be seen as tools or vehicles that aim to generalize a population; a database uses SQL to aggregate or summarize data, and a statistical model summarizes its data using effect statistics. The above argument presented the notion that data structures/databases and statistical data models are, in many ways, very similar. If you found this excerpt to be useful, check out the book Statistics for Data Science, which demonstrates different statistical techniques for implementing various data science tasks such as pre-processing, mining, and analysis.
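To make the contrast concrete, here is a small, illustrative Python sketch; the data and column names are invented. The first half summarizes data the way a database query would, while the second fits a trivial statistical model and reports an effect statistic, the estimated slope.

```python
import numpy as np
import pandas as pd

# Invented sample data: order amounts for a handful of customers of different ages.
orders = pd.DataFrame({
    "customer": ["A", "A", "B", "B", "C", "C", "D"],
    "age":      [25, 25, 34, 34, 41, 41, 52],
    "amount":   [20.0, 35.0, 40.0, 55.0, 60.0, 72.0, 90.0],
})

# Database-style summarization, equivalent to
# SELECT customer, SUM(amount) FROM orders GROUP BY customer;
totals = orders.groupby("customer")["amount"].sum()
print(totals)

# Statistical-model-style question: how does order amount change with age?
# The fitted slope is an effect statistic describing the relationship.
slope, intercept = np.polyfit(orders["age"], orders["amount"], deg=1)
print(f"Estimated effect: each extra year of age adds about {slope:.2f} to an order")
```

The aggregation answers "what happened", while the effect statistic makes a claim about a relationship in the wider population, which is the distinction the excerpt draws.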

Polyglot persistence: what is it and why does it matter?

Richard Gall
21 Jul 2018
3 min read
Polyglot persistence is a way of storing data. It's an approach that acknowledges that often there is no one-size-fits-all solution to data storage. From the types of data you're trying to store to your application architecture, polyglot persistence is a hybrid solution to data management. Think of polyglot programming: if polyglot programming is about using a variety of languages according to the context in which you're working, polyglot persistence is applying that principle to database architecture. For example, storing transactional data in Hadoop files is possible, but makes little sense. On the other hand, processing petabytes of Internet logs using a Relational Database Management System (RDBMS) would also be ill-advised. These tools were designed to tackle specific types of tasks; even though they can be co-opted to solve other problems, the cost of adapting the tools to do so would be enormous. It is the virtual equivalent of trying to fit a square peg in a round hole.

Polyglot persistence: an example

For example, consider a company that sells musical instruments and accessories online (and in a network of shops). At a high level, there are a number of problems that the company needs to solve to be successful:

Attract customers to its stores (both virtual and physical).
Present them with relevant products (you would not try to sell a drum kit to a pianist, would you?!).
Once they decide to buy, process the payment and organize shipping.

To solve these problems, the company might choose from a number of available technologies that were designed to solve them:

Store all the products in a document-based database such as MongoDB, Cassandra, DynamoDB, or DocumentDB. There are multiple advantages of document databases: flexible schema, sharding (breaking bigger databases into a set of smaller, more manageable ones), high availability, and replication, among others.
Model the recommendations using a graph-based database (such as Neo4j, Tinkerpop/Gremlin, or GraphFrames for Spark): such databases reflect the factual and abstract relationships between customers and their preferences. Mining such a graph is invaluable and can produce a more tailored offering for a customer.
For searching, a company might use a search-tailored solution such as Apache Solr or ElasticSearch. Such a solution provides fast, indexed text-searching capabilities.
Once a product is sold, the transaction normally has a well-structured schema (such as product name, price, and so on). To store such data (and later process and report on it), relational databases are best suited.

With polyglot persistence, a company always chooses the right tool for the right job instead of trying to coerce a single technology into solving all of its problems. Read next: How to optimize Hbase for the Cloud [Tutorial] The trouble with Smart Contracts Indexing, Replicating, and Sharding in MongoDB [Tutorial]
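Returning to the example above, here is a rough sketch of what the "right store for the right job" idea can look like in code: the flexible product document goes to MongoDB, while the well-structured sale goes to a relational table. It assumes a local MongoDB instance and uses Python's built-in sqlite3 as a stand-in for the relational database; the database names, fields, and connection string are illustrative only.

```python
import sqlite3
from pymongo import MongoClient

# Document store: flexible product catalogue (schema can vary per product).
mongo = MongoClient("mongodb://localhost:27017")        # assumed local instance
catalogue = mongo["shop"]["products"]
product_id = catalogue.insert_one({
    "name": "Stratocaster",
    "category": "electric guitar",
    "specs": {"strings": 6, "colour": "sunburst"},      # nested, schema-free details
}).inserted_id

# Relational store: well-structured, transactional sales records.
db = sqlite3.connect("sales.db")
db.execute("""CREATE TABLE IF NOT EXISTS sales (
                  id INTEGER PRIMARY KEY AUTOINCREMENT,
                  product_ref TEXT NOT NULL,
                  price REAL NOT NULL,
                  sold_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
db.execute("INSERT INTO sales (product_ref, price) VALUES (?, ?)",
           (str(product_id), 799.00))
db.commit()
db.close()
```

A graph database for recommendations and a search index for product search would be added in the same spirit: each query pattern gets the engine that was built for it.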

5 cool ways Transfer Learning is being used today

Savia Lobo
15 Nov 2017
7 min read
Machine learning has gained a lot of traction over the years because of the predictive solutions that it provides, including the development of intelligent and reliable models. However, training the models is a laborious task because it takes time to curate the labeled data for the model and then to get the model ready. The time involved in training and labeling can be reduced by using the novel approach of Transfer Learning - a smarter and more effective form of machine learning, where you can take the learnings from one scenario and apply them to a different but related problem.

How exactly does Transfer Learning work?

Transfer learning reduces the effort needed to build a model from scratch by taking the fundamental logic or base algorithms within one domain and applying them to another. For instance, in the real world, the balancing logic learned while riding a bicycle can be transferred to learning to ride other two-wheeled vehicles. Similarly, in the case of machine learning, transfer learning can be used to transfer the algorithmic logic from one ML model to another. Let's look into some of the possible use cases of transfer learning.

[dropcap]1[/dropcap] Real-world Simulations

Digital simulation is better than creating a physical prototype for real-world implementations. Training a robot in real-world surroundings is both time- and cost-consuming. To minimize this, robots can now be trained using simulation, and the knowledge acquired can then be transferred onto a real-world robot. This is done using progressive networks, which are ideal for simulation-to-real-world transfer of policies in robot control domains. These networks consist of essential features for learning numerous tasks in sequence while enabling transfer, and are resistant to catastrophic forgetting - a tendency of Artificial Neural Networks (ANNs) to completely forget previously learned information upon learning new information. Another application of simulation can be seen in training self-driving cars, which are trained using simulations through video games. Udacity has open sourced its self-driving car simulator, which allows training self-driving cars through GTA 5 and many other video games. However, not all features of a simulation are replicated successfully when they are brought into the real world, as the interactions in the real world are more complex.

[dropcap]2[/dropcap] Gaming

The adoption of Artificial Intelligence has taken gaming to an altogether new level. DeepMind's neural network program AlphaGo is a testament to this, as it successfully defeated a professional Go player. AlphaGo is a master at Go but fails when tasked to play other games, because its algorithm is tailored to play Go. So, the disadvantage of using ANNs in gaming is that they cannot master all games the way a human brain does. To do this, AlphaGo would have to totally forget Go and adapt itself to the new algorithms and techniques of the new game. With transfer learning, the tactics learned in one game can be reapplied to play another game. An example of how transfer learning is implemented in gaming can be seen in MadRTS, a commercial Real-Time Strategy game developed to carry out military simulations. MadRTS uses CARL (CAse-based Reinforcement Learner), a multi-tiered architecture which combines Case-Based Reasoning (CBR) and Reinforcement Learning (RL). CBR provides an approach to tackle unseen but related problems based on past experiences within each level of the game.
RL algorithms, on the other hand, allow the model to carry out good approximations to a situation based on the agent's experience in its environment, modeled as a Markov Decision Process. These CBR/RL transfer learning agents are evaluated in order to perform effective learning on the tasks given in MadRTS, and should be able to learn better across tasks by transferring experience.

[dropcap]3[/dropcap] Image Classification

Neural networks are experts at recognizing objects within an image, as they are trained on huge datasets of labeled images, which is time-consuming. Transfer learning helps here by reducing the time needed to train the model: the model is pre-trained using ImageNet, which contains millions of images from different categories. Let's assume that a convolutional neural network, for instance a VGG-16 ConvNet, has to be trained to recognize images within a dataset. Firstly, it is pre-trained using ImageNet. Then, it is trained layer-wise, starting by replacing the final layer with a softmax layer and training it until the training saturates. Further, the other dense layers are trained progressively. By the end of the training, the ConvNet model has learned to detect images from the dataset provided. In cases where the dataset is not similar to the pre-trained model's data, one can fine-tune the weights in the higher layers of the ConvNet through backpropagation. The dense layers contain the logic for detecting the image; thus, tuning the higher layers won't affect the base logic. The convolutional neural networks can be trained in Keras, with TensorFlow as a backend. An example of image classification can be seen in the field of medical imaging, where a convolutional model trained on ImageNet is used to solve a kidney detection problem in ultrasound images.

[dropcap]4[/dropcap] Zero Shot translation

Zero-shot translation is an extended part of supervised learning, where the goal of the model is to learn to predict novel values from values that are not present in the training dataset. The prominent working example of zero-shot translation can be seen in Google's Neural Machine Translation model (GNMT), which allows for effective cross-lingual translations. Prior to zero-shot implementations, two discrete languages had to be translated using a pivot language. For instance, to translate Korean to Japanese, Korean had to be first translated into English and then English into Japanese. Here, English is the pivot language that acts as a medium to translate Korean to Japanese. This resulted in a translated language that was full of distortions created by the first language pair. Zero-shot translation removes the need for a pivot language. It uses the available training data to learn the translational knowledge needed to translate a new language pair. Another instance of zero-shot translation can be seen in Image2Emoji, which combines visuals and texts to predict unseen emoji icons in a zero-shot approach.

[dropcap]5[/dropcap] Sentiment Classification

Businesses can know their customers better by implementing sentiment analysis, which helps them to understand the emotions and polarity (negative or positive) underlying feedback and product reviews. Analyzing sentiments for a new text corpus is difficult to build up, as training the models to detect different emotions is difficult. A solution to this is transfer learning.
This involves training the models on one domain, twitter feeds for instance, and fine-tuning them to another domain on which you wish to perform sentiment analysis, say movie reviews. Here, deep learning models are trained on the twitter feeds by carrying out sentiment analysis of the text corpus and detecting the polarity of each statement. Once the model is trained to understand emotions through the polarity of the twitter feeds, its underlying language model and learned representation are transferred onto the model assigned the task of analyzing sentiments within movie reviews. Here, an RNN model trained with logistic regression techniques carries out sentiment analysis on the twitter feeds. The word embeddings and the recurrent weights learned from the source domain (twitter feeds) are reused in the target domain (movie reviews) to classify sentiments within the latter domain.

Conclusion

Transfer learning has brought in a new wave of learning in machines by reusing algorithms and applied logic, thus speeding up their learning process. This directly results in a reduction in the capital investment and the time invested to train a model. This is why many organizations are looking forward to replicating such learning in their machine learning models. Transfer learning has also been carried out successfully in the fields of image processing, simulation, gaming, and so on. How transfer learning affects the learning curve of machines in other sectors in the future is worth watching out for.
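To make the fine-tuning workflow from the image-classification use case above concrete, here is a minimal Keras sketch. It is an illustrative outline rather than the exact setup referenced in the article: the number of classes, image size, dense layer width, and the commented-out training call are placeholders you would adapt to your own labeled dataset.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

NUM_CLASSES = 5  # placeholder: number of categories in your target dataset

# Load VGG-16 pre-trained on ImageNet, without its original classification head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the convolutional base so only the new head is trained at first.
for layer in base.layers:
    layer.trainable = False

# New dense layers and a softmax output replace the original final layer.
x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)
outputs = Dense(NUM_CLASSES, activation="softmax")(x)
model = Model(inputs=base.input, outputs=outputs)

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # supply your own labeled data
# Optionally unfreeze the top convolutional blocks afterwards and fine-tune them
# with a small learning rate, as described in the image-classification section.
```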

What is Automated Machine Learning (AutoML)?

Wilson D'souza
17 Oct 2017
6 min read
Are you a proud machine learning engineer who hates that the job tests your limits as a human being? Do you dread the long hours of data experimentation and data modeling that leave you high and dry? Automated Machine Learning, or AutoML, can put that smile back on your face. A self-replicating AI algorithm, AutoML is the latest tool being applied in the real world today, and AI market leaders such as Google have made significant investments in researching this field further. AutoML has seen a steep rise in research and new tools over the last couple of years, but its recent mention during Google I/O 2017 piqued the interest of the entire developer community. What is AutoML all about, and what makes it so interesting?

Evolution of automated machine learning

Before we try to understand AutoML, let's look at what triggered the need for automated machine learning. Until now, building machine learning models that work in the real world has been a domain ruled by researchers, scientists, and machine learning experts. The process of manually designing a machine learning model involves several complex and time-consuming steps such as:

Pre-processing data
Selecting an appropriate ML architecture
Optimizing hyperparameters
Constructing models
Evaluating the suitability of models

Add to this the several layers of neural networks required for an efficient ML architecture: an n-layer neural network could result in n^n potential networks. This level of complexity can be overwhelming for the millions of developers who are keen on embracing machine learning. AutoML tries to solve this problem of complexity and makes machine learning accessible to a large group of developers by automating routine but complex tasks such as the design of neural networks. Since this cuts down development time significantly and takes care of several complex tasks involved in building machine learning models, AutoML is expected to play a crucial role in bringing machine learning to the mainstream.

Approaches to automating model generation

With a growing body of research, AutoML aims to automate the following tasks in the field of machine learning: model selection, parameter tuning, meta-learning, and ensemble construction. It does this by using a wide range of algorithms and approaches, such as:

Bayesian Optimization: One of the fundamental approaches for automating model generation is to use Bayesian methods for hyperparameter tuning. By modeling the uncertainty of parameter performance, different variations of the model can be explored, which offers an optimal solution.
Meta-learning and Ensemble Construction: To further increase AutoML efficiency, meta-learning techniques are used to find and pick optimal hyperparameter settings. These techniques can be further coupled with auto-ensemble construction techniques to create an effective ensemble model from a collection of models that undergo optimization. Using these techniques, a high level of accuracy can be achieved throughout the process of automated model generation.
Genetic Programming: Certain tools like TPOT also make use of a variation of genetic programming (tree-based pipeline optimization) to automatically design and optimize ML models that offer highly accurate results for a given set of data. This approach makes use of operators at various stages of the data pipeline, which are assembled together in the form of a tree-based pipeline. These are then further optimized, and newer pipelines are auto-generated using genetic programming.
If these weren’t enough, Google in its recent posts disclosed that they are using reinforcement learning approach to give a further push to develop efficient AutoML techniques. What are some tools in this area? Although it’s still early days, we can already see some frameworks emerging to automate the generation of your machine learning models.   Auto-sklearn: Auto-sklearn, the tool which won the ChaLearn AutoML Challenge, provides a wrapper around the popular Python library scikit-learn to automate machine learning. This is a great addition to the ever-growing ecosystem of Python data science tools. Built on top of Bayesian optimization, it takes away the hassle of algorithm selection, parameter tuning, and ensemble construction while building machine learning pipelines. With auto-sklearn, developers can create rapid iterations and refinements to their machine learning models, thereby saving a significant amount of development time. The tool is still in its early stages of development, so expect a few hiccups while using it. DataRobot: DataRobot offers a machine learning automation platform to all levels of data scientists aimed at significantly reducing the time to build and deploy predictive models. Since it’s a cloud platform it offers great power and speed throughout the process of automating the model generation process. In addition to automating the development of predictive models, it offers other useful features such as a web-based interface, compatibility with several leading tools such as Hadoop and Spark, scalability, and rapid deployment. It’s one of those few machine learning automation platforms which are ready for industry use. TPOT: TPOT is yet another Python tool meant for automated machine learning. It uses a genetic programming approach to iterate and optimize machine learning models. As in the case of auto-sklearn, TPOT is also built on top of scikit-learn. It has a growing interest level on GitHub with 2400 stars and has observed a 100% rise in the past one year alone. Its goals, however, are quite similar to those of Auto-sklearn: feature construction, feature selection, model selection, and parameter optimization. With these goals in mind, TPOT aims at building efficient machine learning systems in lesser time and with better accuracy. Will automated machine learning replace developers? AutoML as a concept is still in its infancy. But as market leaders like Google, Facebook, and others research more in this field, AutoML will keep evolving at a brisk pace. Assuming that AutoML would replace humans in the field of data science, however, is a far-fetched thought and nowhere near reality. Here is why. AutoML as a technique is meant to make the neural network design process efficient rather than replace humans and researchers in the field of building neural networks. The primary goal of AutoML is to help experienced data scientists be more efficient at their work i.e., enhance productivity by a huge margin and to reduce the steep learning curve for the many developers who are keen on designing ML models - i.e., make ML more accessible. With the advancements in this field, it’s exciting times for developers to embrace machine learning and start building intelligent applications. We see automated machine learning as a game changer with the power to truly democratize the building of AI apps. With automated machine learning, you don’t have to be a data scientist to develop an elegant AI app!
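To give a feel for how little code these tools ask of you, here is a small sketch based on TPOT's typical scikit-learn-style interface. Treat the parameter values as illustrative defaults rather than recommendations, and check the TPOT documentation for the current API before relying on it.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Let genetic programming search over preprocessing, model, and hyperparameter pipelines.
automl = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
automl.fit(X_train, y_train)

print(automl.score(X_test, y_test))   # accuracy of the best pipeline found
automl.export("best_pipeline.py")     # emit the winning pipeline as plain scikit-learn code
```

The exported script is ordinary scikit-learn code, which keeps the data scientist in the loop: the tool proposes a pipeline, and a human still reviews, tunes, and deploys it.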