
How-To Tutorials - Data

1210 Articles

Extending OpenAI Gym environments with Wrappers and Monitors [Tutorial]

Packt Editorial Staff
17 Jul 2018
9 min read
In this article, we are going to discuss two OpenAI Gym functionalities: Wrappers and Monitors. These are provided by Gym to make your life easier and your code cleaner: convenient frameworks for extending the functionality of an existing environment in a modular way and for getting familiar with an agent's activity. So, let's take a quick overview of these classes. This article is an extract from the book Deep Reinforcement Learning Hands-On, Second Edition, written by Maxim Lapan.

What are Wrappers?

Very frequently, you will want to extend the environment's functionality in some generic way. For example, an environment gives you some observations, but you want to accumulate them in a buffer and provide the agent with the N last observations, which is a common scenario for dynamic computer games, when a single frame is just not enough to get full information about the game state. Another example is when you want to crop or preprocess an image's pixels to make them more convenient for the agent to digest, or when you want to normalize reward scores somehow. There are many such situations with the same structure: you'd like to "wrap" the existing environment and add some extra logic. Gym provides a convenient framework for these situations, called the Wrapper class.

How does a wrapper work?

The class structure is shown in the following diagram. The Wrapper class inherits from the Env class. Its constructor accepts a single argument: the instance of the Env class to be "wrapped". To add extra functionality, you redefine the methods you want to extend, such as step() or reset(). The only requirement is to call the original method of the superclass.

Figure 1: The hierarchy of Wrapper classes in Gym

To handle more specific requirements, such as a wrapper that wants to process only observations from the environment, or only actions, there are subclasses of Wrapper that allow filtering of only a specific portion of information:

- ObservationWrapper: You need to redefine its observation(obs) method. The argument obs is an observation from the wrapped environment, and this method should return the observation that will be given to the agent.
- RewardWrapper: Exposes the method reward(rew), which can modify the reward value given to the agent (a minimal sketch follows this list).
- ActionWrapper: You need to override the method action(act), which can tweak the action passed by the agent to the wrapped environment.
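As an illustration of the RewardWrapper subclass, here is a minimal sketch that is not part of the book's code: a hypothetical ClippedRewardWrapper that clips every reward into the [-1, 1] range, a common preprocessing step for environments with widely varying reward scales.

```python
import gym
import numpy as np


class ClippedRewardWrapper(gym.RewardWrapper):
    """Hypothetical example: clip every reward to the [-1, 1] range."""

    def reward(self, rew):
        # Only the reward stream is touched; observations and actions
        # pass through the wrapper unchanged.
        return float(np.clip(rew, -1.0, 1.0))


if __name__ == "__main__":
    env = ClippedRewardWrapper(gym.make("CartPole-v0"))
    obs = env.reset()
    obs, reward, done, _ = env.step(env.action_space.sample())
    print("Clipped reward: %.2f" % reward)
```

The same pattern applies to ObservationWrapper: override observation(obs) and return the transformed observation.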
Now let's implement some wrappers

To make it slightly more practical, let's imagine a situation where we want to intervene in the stream of actions sent by the agent and, with a probability of 10%, replace the current action with a random one. By issuing random actions, we make our agent explore the environment and, from time to time, drift away from the beaten track of its policy. This is an easy thing to do using the ActionWrapper class:

```python
import gym
from typing import TypeVar
import random

Action = TypeVar('Action')


class RandomActionWrapper(gym.ActionWrapper):
    def __init__(self, env, epsilon=0.1):
        super(RandomActionWrapper, self).__init__(env)
        self.epsilon = epsilon
```

Here we initialize our wrapper by calling the parent's __init__ method and saving epsilon (the probability of a random action).

```python
    def action(self, action):
        if random.random() < self.epsilon:
            print("Random!")
            return self.env.action_space.sample()
        return action
```

This is the method we need to override from the parent class to tweak the agent's actions. Every time we roll the die, with probability epsilon we sample a random action from the action space and return it instead of the action the agent has sent to us. Note that by using the action_space and wrapper abstractions, we were able to write abstract code that will work with any environment from Gym. Additionally, we print a message every time we replace the action, just to check that our wrapper is working. In production code, of course, this won't be necessary.

```python
if __name__ == "__main__":
    env = RandomActionWrapper(gym.make("CartPole-v0"))
```

Now it's time to apply our wrapper. We create a normal CartPole environment and pass it to our wrapper constructor. From here on, we use our wrapper as a normal Env instance instead of the original CartPole. As the Wrapper class inherits from the Env class and exposes the same interface, we can nest our wrappers in any combination we want. This is a powerful, elegant and generic solution:

```python
    obs = env.reset()
    total_reward = 0.0

    while True:
        obs, reward, done, _ = env.step(0)
        total_reward += reward
        if done:
            break

    print("Reward got: %.2f" % total_reward)
```

Here is almost the same code, except that every time we issue the same action: 0. Our agent is dull and always does the same thing. By running the code, you should see that the wrapper is indeed working:

```
rl_book_samples/ch02$ python 03_random_actionwrapper.py
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
Random!
Random!
Random!
Random!
Reward got: 12.00
```

If you want, you can play with the epsilon parameter on the wrapper's creation and check whether randomness improves the agent's score on average. We should move on and look at another interesting gem hidden inside Gym: Monitor.

What is a Monitor?

Another class you should be aware of is Monitor. It is implemented like Wrapper and can write information about your agent's performance to a file, with optional video recording of your agent in action. Some time ago, it was possible to upload the result of the Monitor class' recording to the https://gym.openai.com website and see your agent's position in comparison to other people's results (see the following screenshot), but, unfortunately, at the end of August 2017, OpenAI decided to shut down this upload functionality and froze all the results. There are several efforts to implement an alternative to the original website, but they are not ready yet. I hope this situation will be resolved soon, but at the time of writing it's not possible to check your result against those of others.

Just to give you an idea of how the Gym web interface looked, here is the CartPole environment leaderboard:

Figure 2: OpenAI Gym web interface with CartPole submissions

Every submission in the web interface had details about training dynamics. For example, below is the author's solution for one of Doom's mini-games:

Figure 3: Submission dynamics on the DoomDefendLine environment

Despite this, Monitor is still useful, as you can take a look at your agent's life inside the environment.

How to add Monitor to your agent

So, here is how we add Monitor to our random CartPole agent, which is the only difference (the whole code is in Chapter02/04_cartpole_random_monitor.py; a sketch of the complete script appears at the end of this section).
```python
if __name__ == "__main__":
    env = gym.make("CartPole-v0")
    env = gym.wrappers.Monitor(env, "recording")
```

The second argument we're passing to Monitor is the name of the directory it will write the results to. This directory shouldn't exist; otherwise your program will fail with an exception (to overcome this, you could either remove the existing directory or pass the force=True argument to the Monitor class' constructor).

The Monitor class requires the FFmpeg utility to be present on the system; it is used to convert captured observations into an output video file. This utility must be available, otherwise Monitor will raise an exception. The easiest way to install FFmpeg is by using your system's package manager, which is OS distribution-specific.

To start this example, one of three extra prerequisites should be met:

- The code should be run in an X11 session with the OpenGL extension (GLX)
- The code should be started in an Xvfb virtual display
- You can use X11 forwarding in an ssh connection

The cause of this is video recording, which is done by taking screenshots of the window drawn by the environment. Some of the environments use OpenGL to draw their picture, so a graphical mode with OpenGL needs to be present. This can be a problem for a virtual machine in the cloud, which physically doesn't have a monitor and a graphical interface running. To overcome this, there is a special "virtual" graphical display, called Xvfb (X11 virtual framebuffer), which basically starts a virtual graphical display on the server and forces the program to draw inside it. That is enough to make Monitor happily create the desired videos.

To start your program in the Xvfb environment, you need to have it installed on your machine (it usually requires installing the package xvfb) and run the special script xvfb-run:

```
$ xvfb-run -s "-screen 0 640x480x24" python 04_cartpole_random_monitor.py
[2017-09-22 12:22:23,446] Making new env: CartPole-v0
[2017-09-22 12:22:23,451] Creating monitor directory recording
[2017-09-22 12:22:23,570] Starting new video recorder writing to recording/openaigym.video.0.31179.video000000.mp4
Episode done in 14 steps, total reward 14.00
[2017-09-22 12:22:26,290] Finished writing results. You can upload them to the scoreboard via gym.upload('recording')
```

As you can see from the log above, the video has been written successfully, so you can peek inside one of your agent's sessions by playing it.

Another way to record your agent's actions is using ssh X11 forwarding, which uses the ability of ssh to tunnel X11 communications between the X11 client (the Python code which wants to display some graphical information) and the X11 server (the software which knows how to display this information and has access to your physical display). In the X11 architecture, the client and the server are separated and can work on different machines. To use this approach, you need the following:

- An X11 server running on your local machine. Linux comes with an X11 server as a standard component (all desktop environments use X11). On a Windows machine, you can set up a third-party X11 implementation such as the open source VcXsrv (available at https://sourceforge.net/projects/vcxsrv/).
- The ability to log into your remote machine via ssh, passing the -X command line option: ssh -X servername. This enables X11 tunneling and allows all processes started in this session to use your local display for graphics output.

Then you can start a program which uses the Monitor class and it will display the agent's actions, capturing the images into a video file.
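For reference, here is a minimal sketch of what the complete monitored random-agent script could look like. The actual file ships with the book's repository as Chapter02/04_cartpole_random_monitor.py; this reconstruction simply combines the snippets shown above and may differ in details from the original.

```python
import gym


if __name__ == "__main__":
    env = gym.make("CartPole-v0")
    # Wrap the environment with Monitor so the episode is recorded
    # into the "recording" directory (requires FFmpeg, see above).
    env = gym.wrappers.Monitor(env, "recording")

    total_reward = 0.0
    total_steps = 0
    obs = env.reset()

    while True:
        # A purely random agent: sample an action from the action space.
        action = env.action_space.sample()
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        total_steps += 1
        if done:
            break

    print("Episode done in %d steps, total reward %.2f" % (total_steps, total_reward))
    env.close()
```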
To summarize, we discussed two extra functionalities of OpenAI Gym: Wrappers and Monitors. To solve complex real-world problems with deep reinforcement learning, grab the practical guide Deep Reinforcement Learning Hands-On, Second Edition today.

- How Reinforcement Learning works
- How to implement Reinforcement Learning with TensorFlow
- Top 5 tools for reinforcement learning


Top 5 Deep Learning Architectures

Amey Varangaonkar
24 Jul 2018
9 min read
If you are a deep learning practitioner or someone who wants to get into the world of deep learning, you might already be well acquainted with neural networks. Neural networks, inspired by biological neural networks, are pretty useful when it comes to solving complex, multi-layered computational problems. Deep learning has stood out in several high-profile research fields, including facial and speech recognition, natural language processing, machine translation, and more. In this article, we look at the top 5 popular and widely used deep learning architectures you should know in order to advance your knowledge of deep learning research.

Convolutional Neural Networks

Convolutional Neural Networks, or CNNs for short, are the popular choice of neural networks for different computer vision tasks such as image recognition. The name 'convolution' is derived from a mathematical operation involving the convolution of different functions. There are 4 primary steps or stages in designing a CNN (a minimal code sketch of these stages appears at the end of this section):

- Convolution: The input signal is received at this stage
- Subsampling: Inputs received from the convolution layer are smoothened to reduce the sensitivity of the filters to noise or any other variation
- Activation: This layer controls how the signal flows from one layer to the other, similar to the neurons in our brain
- Fully connected: In this stage, all the layers of the network are connected, with every neuron from a preceding layer connected to the neurons of the subsequent layer

Here is an in-depth look at the CNN architecture and its working, as explained by the popular AI researcher Giancarlo Zaccone.

A sample CNN in action

Advantages of CNN

- Very good for visual recognition
- Once a segment within a particular sector of an image is learned, the CNN can recognize that segment anywhere else in the image

Disadvantages of CNN

- A CNN is highly dependent on the size and quality of the training data
- Highly susceptible to noise
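To make the four stages concrete, here is a minimal, illustrative sketch of a small image classifier using the Keras API. The layer sizes, input shape and 10-class output are assumptions chosen for the example, not taken from the article.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Each stage from the list above appears as a layer:
# convolution -> subsampling (pooling) -> activation -> fully connected.
model = tf.keras.Sequential([
    layers.Conv2D(32, kernel_size=3, activation="relu",
                  input_shape=(28, 28, 1)),   # convolution + activation
    layers.MaxPooling2D(pool_size=2),          # subsampling
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),    # fully connected output
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```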
Recurrent Neural Networks

Recurrent Neural Networks (RNNs) have been very popular in areas where the sequence in which information is presented is crucial. As a result, they find a lot of applications in real-world domains such as natural language processing, speech synthesis and machine translation. RNNs are called 'recurrent' mainly because a uniform task is performed for every single element of a sequence, with the output dependent on the previous computations as well. Think of these networks as having a memory, where every piece of computed information is captured, stored and utilized to calculate the final outcome.

Over the years, quite a few varieties of RNNs have been researched and developed:

- Bidirectional RNN: The output in this type of RNN depends not only on the past but also on future outcomes
- Deep RNN: In this type of RNN, there are multiple layers present per step, allowing for a greater rate of learning and more accuracy

RNNs can be used to build industry-standard chatbots that interact with customers on websites. Given a sequence of signals from an audio wave, RNNs can also be used to predict a correct sequence of phonetic segments with a given probability.

Advantages of RNN

- Unlike a traditional neural network, an RNN shares the same parameters across all steps. This greatly reduces the number of parameters that we need to learn
- RNNs can be used along with CNNs to generate accurate descriptions for unlabeled images

Disadvantages of RNN

- RNNs find it difficult to track long-term dependencies. This is especially true for long sentences and paragraphs that have too many words between the noun and the verb
- RNNs cannot be stacked into very deep models. This is due to the activation function used in RNN models, which makes the gradient decay over multiple layers

Autoencoders

Autoencoders apply the principle of backpropagation in an unsupervised setting. Interestingly, autoencoders have a close resemblance to PCA (Principal Component Analysis), except that they are more flexible. One of the popular applications of autoencoders is anomaly detection - for example, detecting fraud in financial transactions in banks. Basically, the core task of autoencoders is to learn what constitutes regular, normal data and then identify the outliers or anomalies. Autoencoders usually represent data through multiple hidden layers such that the output signal is as close as possible to the input signal (a minimal sketch appears at the end of this section).

There are 4 major types of autoencoders being used today:

- Vanilla autoencoder: the simplest form of autoencoder, i.e. a neural net with one hidden layer
- Multilayer autoencoder: when one hidden layer is not enough, an autoencoder can be extended to include more hidden layers
- Convolutional autoencoder: in this type, convolutions are used in the autoencoder instead of fully-connected layers
- Regularized autoencoder: this type of autoencoder uses a special loss function that enables the model to have properties beyond the basic ability to copy a given input to the output

This article demonstrates training an autoencoder using H2O, a popular machine learning and AI platform.

A basic representation of an Autoencoder

Advantages of Autoencoders

- Autoencoders give a resultant model which is primarily based on the data rather than predefined filters
- Their relatively low complexity makes them easier to train

Disadvantages of Autoencoders

- Training time can sometimes be very high
- If the training data is not representative of the testing data, the information that comes out of the model can be obscured and unclear
- Some autoencoders, especially of the variational type, introduce a deterministic bias into the model
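As an illustrative sketch only (the feature count, layer sizes and random data are assumptions made for the example), a vanilla autoencoder with a single hidden layer can be written in a few lines of Keras. It is trained to reproduce its own input, and a large reconstruction error on new records can then be used to flag potential anomalies.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

input_dim = 30      # assumed number of features per record
encoding_dim = 8    # size of the single hidden (bottleneck) layer

autoencoder = tf.keras.Sequential([
    layers.Dense(encoding_dim, activation="relu", input_shape=(input_dim,)),
    layers.Dense(input_dim, activation="linear"),  # reconstruct the input
])
autoencoder.compile(optimizer="adam", loss="mse")

# Train on "normal" data only: the target is the input itself.
x_normal = np.random.rand(1000, input_dim).astype("float32")
autoencoder.fit(x_normal, x_normal, epochs=5, batch_size=32, verbose=0)

# Records with a high reconstruction error are candidate anomalies.
reconstruction = autoencoder.predict(x_normal)
errors = np.mean((x_normal - reconstruction) ** 2, axis=1)
print("Mean reconstruction error:", errors.mean())
```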
Generative Adversarial Networks

The basic premise of Generative Adversarial Networks (GANs) is the simultaneous training of two deep learning models. These networks compete with each other: the model that tries to generate new instances or examples is called the generator, while the model that tries to classify whether a particular instance originates from the training data or from the generator is called the discriminator. GANs, a recent breakthrough in the field of deep learning, were put forth by the popular deep learning expert Ian Goodfellow in 2014. They find large and important applications in computer vision, especially image generation. Read more about the structure and the functionality of GANs in the official paper submitted by Ian Goodfellow.

General architecture of a GAN (Source: deeplearning4j)

Advantages of GAN

- Per Goodfellow, GANs allow for efficient training of classifiers in a semi-supervised manner
- Because of the improved accuracy of the model, the generated data is almost indistinguishable from the original data
- GANs do not introduce any deterministic bias, unlike variational autoencoders

Disadvantages of GAN

- The generator and discriminator working efficiently is crucial to the success of a GAN; the whole system fails if either of them fails
- Both the generator and discriminator are separate systems trained with different loss functions, so the time required to train the entire system can get quite high

Interested to know more about GANs? Here's what you need to know about them.

ResNets

Ever since they gained popularity in 2015, ResNets, or Deep Residual Networks, have been widely adopted by many data scientists and AI researchers. As you already know, CNNs are highly useful when it comes to solving image classification and visual recognition problems. As these tasks become more complex, training the neural network starts to get a lot more difficult, as additional deep layers are required to compute and enhance the accuracy of the model. Residual learning is a concept designed to tackle this very problem, and the resultant architecture is popularly known as a ResNet. A ResNet consists of a number of residual modules, where each module represents a layer. Each layer consists of a set of functions to be performed on the input (a minimal sketch of a single residual block appears at the end of this section). The depth of a ResNet can vary greatly - the one developed by Microsoft researchers for an image classification problem had 152 layers!

A basic building block of ResNet (Source: Quora)

Advantages of ResNets

- ResNets are more accurate and require fewer weights than LSTMs and RNNs in some cases
- They are highly modular. Hundreds or thousands of residual layers can be added to create a network and then trained
- ResNets can be designed to determine how deep a particular network needs to be

Disadvantages of ResNets

- If the layers in a ResNet are too deep, errors can be hard to detect and cannot be propagated back quickly and correctly. At the same time, if the layers are too narrow, the learning might not be very efficient
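Here is a minimal, hedged sketch of a single residual block in Keras (the filter counts and input shape are illustrative assumptions). The defining feature is the shortcut connection that adds the block's input back to its output.

```python
import tensorflow as tf
from tensorflow.keras import layers


def residual_block(x, filters=64):
    """One residual module: two conv layers plus an identity shortcut."""
    shortcut = x
    y = layers.Conv2D(filters, kernel_size=3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, kernel_size=3, padding="same")(y)
    y = layers.Add()([shortcut, y])     # the residual (skip) connection
    return layers.Activation("relu")(y)


inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()
```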
Apart from the ones above, a few more deep learning models are being increasingly adopted and preferred by data scientists. These definitely deserve an honorable mention:

- LSTM: LSTMs are a special kind of Recurrent Neural Network that include a special memory cell which can hold information for long periods of time. A set of gates is used to determine when particular information enters the memory and when it is forgotten.
- SqueezeNet: One of the newer but very powerful deep learning architectures, extremely efficient for low-bandwidth platforms such as mobile.
- CapsNet: CapsNet, or Capsule Networks, is a recent breakthrough in the field of deep learning and neural network modeling. Mainly used for accurate image recognition tasks, it is an advanced variation of the CNN.
- SegNet: A popular deep learning architecture used especially to solve the image segmentation problem.
- Seq2Seq: An upcoming deep learning architecture being increasingly used for machine translation and building efficient chatbots.

So there you have it! Thanks to the intense research efforts in deep learning and AI, we now have a variety of deep learning models at our disposal to solve a variety of problems - both functional and computational. What's even better is that we have the liberty to choose the most appropriate deep learning architecture based on the problem at hand.

[box type="shadow" align="" class="" width=""]Editor's Tip: It is very important to know the best deep learning frameworks you can use to train your models. Here are the top 10 deep learning frameworks for you.[/box]

In contrast to the traditional programming approach where we tell the computer what to do, deep learning models figure out the problem and devise the most appropriate solution on their own - however complex the problem may be. No wonder these deep learning architectures are being researched and deployed on a large scale by major market players such as Google, Facebook, Microsoft and many others.

- Packt Explains… Deep Learning in 90 seconds
- Behind the scenes: Deep learning evolution and core concepts
- Facelifting NLP with Deep Learning


How to ace a data science interview

Richard Gall
02 Sep 2019
12 min read
So, you want to be a data scientist. It's a smart move: it's a job that's in high demand, can command a healthy salary, and can also be richly rewarding and engaging. But to get the job, you're going to have to pass a data science interview - something that's notoriously tough. One of the reasons for this is that data science is a field that is incredibly diverse. I mean that in two different ways: on the one hand, it's a role that demands a variety of different skills (being a good data scientist is about much more than just being good at math). But it's also diverse in the sense that data science will be done differently at every company. That means that every data science interview is going to be different. If you specialize too much in one area, you might well be severely limiting your opportunities.

There are plenty of articles out there that pretend to have all the answers to your next data science interview. And while these can be useful, they also treat job interviews like they're just exams you need to pass. They're not - you need to have a wide range of knowledge, but you also need to present yourself as a curious and critical thinker, and someone who is very good at communicating. You won't get a data science job by knowing all the answers. But you might get it by asking the right questions and talking in the right way. So, with all that in mind, here's what you need to do to ace your data science interview.

Know the basics of data science

This is obvious but it's impossible to overstate. If you don't know the basics, there's no way you'll get the job - indeed, it's probably better for your sake that you don't get it! But what are these basics?

Basic data science interview questions

- "What is data science?" This seems straightforward, but proving you've done some thinking about what the role actually involves demonstrates that you're thoughtful and self-aware - a sign of any good employee.
- "What's the difference between supervised and unsupervised learning?" Again, this is straightforward, but it will give the interviewer confidence that you understand the basics of machine learning algorithms.
- "What is the bias and variance tradeoff? What is overfitting and underfitting?" Being able to explain these concepts in a clear and concise manner demonstrates your clarity of thought. It also shows that you have a strong awareness of the challenges of using machine learning and statistical systems.

If you're applying for a job as a data scientist you'll probably already know the answers to all of these. Just make sure you have a clear answer and that you can explain each in a concise manner.

Know your algorithms

Knowing your algorithms is a really important part of any data science interview. However, it's important not to get hung up on the details. Trying to learn everything there is to know about every algorithm isn't only impossible, it's also not going to get you the job. What's important instead is demonstrating that you understand the differences between algorithms, and when to use one over another.

Data science interview questions about algorithms you might be asked

- "When would you use a supervised machine learning algorithm?"
- "Can you name some supervised machine learning algorithms and the differences between them?" (Supervised machine learning algorithms include Support Vector Machines, Naive Bayes, the K-nearest Neighbor Algorithm, Regression, and Decision Trees.)
- "When would you use an unsupervised machine learning algorithm?" (Unsupervised machine learning algorithms include K-Means, autoencoders, Generative Adversarial Networks, and Deep Belief Nets.)
- "Name some unsupervised machine learning algorithms and how they're different from one another."
- "What are classification algorithms?"

There are others, but try to focus on these as core areas. Remember, it's also important to always talk about your experience - that's just as useful, if not more useful, than listing off the differences between different machine learning algorithms. Some of the questions you face in a data science interview might even be about how you use algorithms:

- "Tell me about a time you used an algorithm. Why did you decide to use it? Were there any other options?"
- "Tell me about a time you used an algorithm and it didn't work how you expected it to. What did you do?"

When talking about algorithms in a data science interview it's useful to present them as tools for solving business problems. It can be tempting to talk about them as mathematical concepts, and although it's good to show off your understanding, showing how algorithms help solve real-world business problems will be a big plus for your interviewer.

Be confident talking about data sources and infrastructure challenges

One of the biggest challenges for data scientists is dealing with incomplete or poor quality data. If that's something you've faced - or even if it's something you think you might face in the future - then make sure you talk about that. Data scientists aren't always responsible for managing a data infrastructure (that will vary from company to company), but even if that isn't in the job description, it's likely that you'll have to work with a data architect to make sure data is available and accurate enough to be able to carry out data science projects. This means that understanding topics like data streaming, data lakes and data warehouses is very important in a data science interview. Again, remember that it's important that you don't get stuck on the details. You don't need to recite everything you know; instead talk about your experience or how you might approach problems in different ways.

Data science interview questions you might get asked about using different data sources

- "How do you work with data from different sources?"
- "How have you tackled dirty or unreliable data in the past?"

Data science interview questions you might get asked about infrastructure

- "Talk me through a data infrastructure challenge you've faced in the past."
- "What's the difference between a data lake and a data warehouse? How would you approach each one differently?"

Show that you have a robust understanding of data science tools

You can't get through a data science interview without demonstrating that you have knowledge and experience of data science tools. It's likely that the job you're applying for will mention a number of different skill requirements in the job description, so make sure you have a good knowledge of them all. Obviously, the best case scenario is that you know all the tools mentioned in the job description inside out - but this is unlikely. If you don't know one - or more - make sure you understand what they're for and how they work. The hiring manager probably won't expect candidates to know everything, but they will expect them to be ready and willing to learn. If you can talk about a time you learned a new tool, that will give the interviewer a lot of confidence that you're someone who can pick up knowledge and skills quickly.
Show you can evaluate different tools and programming languages

Another element here is to be able to talk about the advantages and disadvantages of different tools. Why might you use R over Python? Which Python libraries should you use to solve a specific problem? And when should you just use Excel? Sometimes the interviewer might ask for your own personal preferences. Don't be scared about giving your opinion - as long as you've got a considered explanation for why you hold the opinion that you do, you're fine!

Read next: Why is Python so good for AI and Machine Learning? 5 Python Experts Explain

Data science interview questions about tools that you might be asked

- "What tools have you - or could you - use for data processing and cleaning? What are their benefits and disadvantages?" (These include tools such as Hadoop, Pentaho, Flink, Storm, and Kafka.)
- "What tools do you think are best for data visualization and why?" (This includes tools like Tableau, PowerBI, D3.js, Infogram, and Chartblocks - there are so many different products in this space that it's important that you are able to talk about what you value most about data visualization tools.)
- "Do you prefer using Python or R? Are there times when you'd use one over another?"
- "Talk me through machine learning libraries. How do they compare to one another?" (This includes tools like TensorFlow, Keras, and PyTorch. If you don't have any experience with them, make sure you're aware of the differences, and talk about which you are most curious about learning.)

Always focus on business goals and results

This sounds obvious, but it's so easy to forget. This is especially true if you're a data geek that loves to talk about statistical models and machine learning. To combat this, make sure you're very clear on how your experience was tied to business goals. Take some time to think about why you were doing what you were doing. What were you trying to find out? What metrics were you trying to drive?

Interpersonal and communication skills

Another element to this is talking about your interpersonal skills and your ability to work with a range of different stakeholders. Think carefully about how you worked alongside other teams, how you went about capturing requirements and building solutions for them. Think also about how you managed - or would manage - expectations. It's well known that business leaders can expect data to be a silver bullet when it comes to results, so how do you make sure that people are realistic?

Show off your data science portfolio

A good way of showing your business acumen as a data scientist is to build a portfolio of work. Portfolios are typically viewed as something for creative professionals, but they're becoming increasingly popular in the tech industry as competition for roles gets tougher. This post explains everything you need to build a great data science portfolio. Broadly, the most important thing is that it demonstrates how you have added value to an organization. This could be:

- Insights you've shared in reports with management
- Building customer-facing applications that rely on data
- Building internal dashboards and applications

Bringing a portfolio to an interview can give you a solid foundation on which you can answer questions. But remember - you might be asked questions about your work, so make sure you have an answer prepared!

Data science interview questions about business performance

- "Talk about a time you have worked across different teams."
- "How do you manage stakeholder expectations?"
- "What do you think are the most important elements in communicating data insights to management?"

If you can talk fluently about how your work impacts business performance and how you worked alongside others in non-technical positions, you will give yourself a good chance of landing the job!

Show that you understand ethical and privacy issues in data science

This might seem like a superfluous point, but given the events of recent years - like the Cambridge Analytica scandal - ethics has become a big topic of conversation. Employers will expect prospective data scientists to have an awareness of some of these problems and how you can go about mitigating them. To some extent, this is an extension of the previous point. Showing you are aware of ethical issues, such as privacy and discrimination, proves that you are fully engaged with the needs and risks a business might face. It also underlines that you are aware of the consequences and potential impact of data science activities on customers - what your work does in the real world.

Read next: Introducing Deon, a tool for data scientists to add an ethics checklist

Data science interview questions about ethics and privacy

- "What are some of the ethical issues around machine learning and artificial intelligence?"
- "How can you mitigate any of these issues? What steps would you take?"
- "Has GDPR impacted the way you do data science?"
- "What are some other privacy implications for data scientists?"
- "How do you understand explainability and interpretability in machine learning?"

Ethics is a topic that's easy to overlook but it's essential for every data scientist. To get a good grasp of the issues it's worth investigating more technical content on things like machine learning interpretability, as well as following news and commentary around emergent issues in artificial intelligence.

Conclusion: Don't treat a data science interview like an exam

Data science is a complex and multi-faceted field. That can make data science interviews feel like a serious test of your knowledge - and it can be tempting to revise like you would for an exam. But, as we've seen, that's foolish. To ace a data science interview you can't just recite information and facts. You need to talk clearly and confidently about your experience and demonstrate your drive and curiosity. That doesn't mean you shouldn't make sure you know the basics. But rather than getting too hung up on definitions and statistical details, it's a better use of your time to consider how you have performed your roles in the past, and what you might do in the future. A thoughtful, curious data scientist is immensely valuable. Show your interviewer that you are one.


10 reasons why data scientists love Jupyter notebooks

Aarthi Kumaraswamy
04 Apr 2018
5 min read
In the last twenty years, Python has been increasingly used for scientific computing and data analysis as well. Today, the main advantage of Python, and one of the main reasons why it is so popular, is that it brings scientific computing features to a general-purpose language that is used in many research areas and industries. This makes the transition from research to production much easier.

IPython is a Python library that was originally meant to improve the default interactive console provided by Python and to make it scientist-friendly. In 2011, ten years after the first release of IPython, the IPython Notebook was introduced. This web-based interface to IPython combines code, text, mathematical expressions, inline plots, interactive figures, widgets, graphical interfaces, and other rich media within a standalone sharable web document. This platform provides an ideal gateway to interactive scientific computing and data analysis. IPython has become essential to researchers, engineers, data scientists, teachers and their students.

Within a few years, IPython gained incredible popularity among the scientific and engineering communities. The Notebook started to support more and more programming languages beyond Python. In 2014, the IPython developers announced the Jupyter project, an initiative created to improve the implementation of the Notebook and make it language-agnostic by design. The name of the project reflects the importance of three of the main scientific computing languages supported by the Notebook: Julia, Python, and R. Today, Jupyter is an ecosystem in itself that comprises several alternative Notebook interfaces (JupyterLab, nteract, Hydrogen, and others), interactive visualization libraries, and authoring tools compatible with notebooks. Jupyter has its own conference, JupyterCon. The project received funding from several companies as well as the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation.

Apart from the rich legacy that Jupyter notebooks come from and the rich ecosystem they provide developers, here are ten more reasons for you to start using them for your next data science project if you aren't already:

1. All in one place: The Jupyter Notebook is a web-based interactive environment that combines code, rich text, images, videos, animations, mathematical equations, plots, maps, interactive figures and widgets, and graphical user interfaces into a single document.
2. Easy to share: Notebooks are saved as structured text files (JSON format), which makes them easily shareable.
3. Easy to convert: Jupyter comes with a special tool, nbconvert, which converts notebooks to other formats such as HTML and PDF. Another online tool, nbviewer, allows us to render a publicly available notebook directly in the browser.
4. Language independent: The architecture of Jupyter is language independent. The decoupling between the client and kernel makes it possible to write kernels in any language.
5. Easy to create kernel wrappers: Jupyter brings a lightweight interface for kernel languages that can be wrapped in Python. Wrapper kernels can implement optional methods, notably for code completion and code inspection.
6. Easy to customize: The Jupyter interface can be used to create an entirely customized experience in the Jupyter Notebook (or another client application such as the console).
7. Extensions with custom magic commands: Create IPython extensions with custom magic commands to make interactive computing even easier. Many third-party extensions and magic commands exist, for example, the %%cython magic that allows one to write Cython code directly in a notebook.
8. Stress-free reproducible experiments: Jupyter notebooks can help you conduct efficient and reproducible interactive computing experiments with ease. They let you keep a detailed record of your work. Also, the ease of use of the Jupyter Notebook means that you don't have to worry about reproducibility; just do all of your interactive work in notebooks, put them under version control, and commit regularly. Don't forget to refactor your code into independent reusable components.
9. Effective teaching-cum-learning tool: The Jupyter Notebook is not only a tool for scientific research and data analysis but also a great tool for teaching. An example is IPython Blocks - a library that allows you or your students to create grids of colorful blocks.
10. Interactive code and data exploration: The ipywidgets package provides many common user interface controls for exploring code and data interactively (a minimal sketch follows this list).
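As a quick illustration of the last reason, here is a minimal sketch using ipywidgets. The function and slider range are arbitrary assumptions chosen for the example; run it inside a notebook cell.

```python
# Run inside a Jupyter notebook cell.
from ipywidgets import interact


def show_powers(exponent=2):
    """Toy function: print a short table of x ** exponent."""
    for x in range(1, 6):
        print(x, x ** exponent)


# interact() builds a slider for the integer argument and re-runs
# the function whenever the slider value changes.
interact(show_powers, exponent=(1, 5))
```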
You enjoyed excerpts from Cyrille Rossant's latest book, IPython Cookbook, Second Edition. This book contains 100+ recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code. For free recipes from the book, head over to the IPython Cookbook GitHub page. If you loved what you saw, support Cyrille's work by buying a copy of the book today!

Related Jupyter articles:

Latest Jupyter news updates:
- Is JupyterLab all set to phase out Jupyter Notebooks?
- What's new in Jupyter Notebook 5.3.0
- 3 ways JupyterLab will revolutionize Interactive Computing

Jupyter notebooks tutorials:
- Getting started with the Jupyter notebook (part 1)
- Jupyter and Python Scripting
- Jupyter as a Data Laboratory: Part 1


How to secure data in Salesforce Einstein Analytics

Amey Varangaonkar
22 Mar 2018
5 min read
[box type="note" align="" class="" width=""]The following excerpt is taken from the book Learning Einstein Analytics written by Santosh Chitalkar. This book includes techniques to build effective dashboards and Business Intelligence metrics to gain useful insights from data.[/box] Before getting into security in Einstein Analytics, it is important to set up your organization, define user types so that it is available to use. In this article we explore key aspects of security in Einstein Analytics. The following are key points to consider for data security in Salesforce: Salesforce admins can restrict access to data by setting up field-level security and object-level security in Salesforce. These settings prevent data flow from loading sensitive Salesforce data into a dataset. Dataset owners can restrict data access by using row-level security. Analytics supports security predicates, a robust row-level security feature that enables you to model many different types of access control on datasets. Analytics also supports sharing inheritance. Take a look at the following diagram: Salesforce data security In Einstein Analytics, dataflows bring the data to the Analytics Cloud from Salesforce. It is important that Einstein Analytics has all the necessary permissions and access to objects as well as fields. If an object or a field is not accessible to Einstein then the data flow fails and it cannot extract data from Salesforce. So we need to make sure that the required access is given to the integration user and security user. We can configure the permission set for these users. Let’s configure permissions for an integration user by performing the following steps: Switch to classic mode and enter Profiles in the Quick Find / Search… box Select and clone the Analytics Cloud Integration User profile and Analytics Cloud Security User profile for the integration user and security user respectively: Save the cloned profiles and then edit them Set the permission to Read for all objects and fields Save the profile and assign it to users Take a look at the following diagram: Data pulled from Salesforce can be made secure from both sides: Salesforce as well as Einstein Analytics. It is important to understand that Salesforce and Einstein Analytics are two independent databases. So, a user security setting given to Einstein will not affect the data in Salesforce. There are the following ways to secure data pulled from Salesforce: Salesforce Security Einstein Analytics Security Roles and profiles Inheritance security Organization-Wide Defaults (OWD) and record ownership Security predicates Sharing rules Application-level security Sharing mechanism in Einstein All Analytics users start off with Viewer access to the default Shared App that’s available out-of-the-box; administrators can change this default setting to restrict or extend access. All other applications created by individual users are private, by default; the application owner and administrators have Manager access and can extend access to other Users, groups, or roles. The following diagram shows how the sharing mechanism works in Einstein Analytics: Here’s a summary of what users can do with Viewer, Editor, and Manager access: Action / Access level Viewer Editor Manager View dashboards, lenses, and datasets in the application. If the underlying dataset is in a different application than a lens or dashboard, the user must have access to both applications to view the lens or dashboard. Yes Yes Yes See who has access to the application. 
Yes Yes Yes Save contents of the application to another application that the user has Editor or Manager access to. Yes Yes Yes Save changes to existing dashboards, lenses, and datasets in the application (saving dashboards requires the appropriate permission set license and permission). Yes Yes Change the application’s sharing settings. Yes Rename the application. Yes Delete the application. Yes Confidentiality, integrity, and availability together are referred to as the CIA Triad and it is designed to help organizations decide what security policies to implement within the organization. Salesforce knows that keeping information private and restricting access by unauthorized users is essential for business. By sharing the application, we can share a lens, dashboard, and dataset all together with one click. To share the entire application, do the following: Go to your Einstein Analytics and then to Analytics Studio Click on the APPS tab and then the icon for your application that you want to share, as shown in the following screenshot: 3. Click on Share and it will open a new popup window, as shown in the following screenshot: Using this window, you can share the application with an individual user, a group of users, or a particular role. You can define the access level as Viewer, Editor, or Manager After selecting User, click on the user you wish to add and click on Add Save and then close the popup And that’s it. It’s done. Mass-sharing the application Sometimes, we are required to share the application with a wide audience: There are multiple approaches to mass-sharing the Wave application such as by role or by username In Salesforce classic UI, navigate to Setup|Public Groups | New For example, to share a sales application, label a public group as Analytics_Sales_Group Search and add users to a group by Role, Roles and Subordinates, or by Users (username): 5. Search for the Analytics_Sales public group 6. Add the Viewer option as shown in the following screenshot: 7. Click on Save Protecting data from breaches, theft, or from any unauthorized user is very important. And we saw that Einstein Analytics provides the necessary tools to ensure the data is secure. If you found this excerpt useful and want to know more about securing your analytics in Einstein, make sure to check out this book Learning Einstein Analytics.  


How to create a conversational assistant or chatbot using Python

Savia Lobo
21 Feb 2018
5 min read
[box type="note" align="" class="" width=""]This article is an excerpt taken from a book Natural Language Processing with Python Cookbook written by Krishna Bhavsar, Naresh Kumar, and Pratap Dangeti. This book includes unique recipes to teach various aspects of performing Natural Language Processing with NLTK—the leading Python platform for the task.[/box] Today we will learn to create a conversational assistant or chatbot using Python programming language. Conversational assistants or chatbots are not very new. One of the foremost of this kind is ELIZA, which was created in the early 1960s and is worth exploring. In order to successfully build a conversational engine, it should take care of the following things: 1. Understand the target audience 2. Understand the natural language in which communication happens.  3. Understand the intent of the user 4. Come up with responses that can answer the user and give further clues NLTK has a module, nltk.chat, which simplifies building these engines by providing a generic framework. Let's see the available engines in NLTK: Engines Modules Eliza nltk.chat.eliza Python module Iesha nltk.chat.iesha Python module Rude nltk.chat.rudep ython module Suntsu Suntsu nltk.chat.suntsu module Zen nltk.chat.zen module In order to interact with these engines we can just load these modules in our Python program and invoke the demo() function. This recipe will show us how to use built-in engines and also write our own simple conversational engine using the framework provided by the nltk.chat module. Getting ready You should have Python installed, along with the nltk library. Having an understanding of regular expressions also helps. How to do it...    Open atom editor (or your favorite programming editor).    Create a new file called Conversational.py.    Type the following source code:    Save the file.    Run the program using the Python interpreter.    You will see the following output: How it works... Let's try to understand what we are trying to achieve here. import nltk This instruction imports the nltk library into the current program. def builtinEngines(whichOne): This instruction defines a new function called builtinEngines that takes a string parameter, whichOne: if whichOne == 'eliza': nltk.chat.eliza.demo() elif whichOne == 'iesha': nltk.chat.iesha.demo() elif whichOne == 'rude': nltk.chat.rude.demo() elif whichOne == 'suntsu': nltk.chat.suntsu.demo() elif whichOne == 'zen': nltk.chat.zen.demo() else: print("unknown built-in chat engine {}".format(whichOne)) These if, elif, else instructions are typical branching instructions that decide which chat engine's demo() function is to be invoked depending on the argument that is present in the whichOne variable. When the user passes an unknown engine name, it displays a message to the user that it's not aware of this engine. It's a good practice to handle all known and unknown cases also; it makes our programs more robust in handling unknown situations def myEngine():. This instruction defines a new function called myEngine(); this function does not take any parameters. chatpairs = ( (r"(.*?)Stock price(.*)", ("Today stock price is 100", "I am unable to find out the stock price.")), (r"(.*?)not well(.*)", ("Oh, take care. May be you should visit a doctor", "Did you take some medicine ?")), (r"(.*?)raining(.*)", ("Its monsoon season, what more do you expect ?", "Yes, its good for farmers")), (r"How(.*?)health(.*)", ("I am always healthy.", "I am a program, super healthy!")), (r".*", ("I am good. 
How are you today ?", "What brings you here ?")) ) This is a single instruction where we are defining a nested tuple data structure and assigning it to chat pairs. Let's pay close attention to the data structure: We are defining a tuple of tuples Each subtuple consists of two elements: The first member is a regular expression (this is the user's question in regex format) The second member of the tuple is another set of tuples (these are the answers) def chat(): print("!"*80) print(" >> my Engine << ") print("Talk to the program using normal english") print("="*80) print("Enter 'quit' when done") chatbot = nltk.chat.util.Chat(chatpairs, nltk.chat.util.reflections) chatbot.converse() We are defining a subfunction called chat()inside the myEngine() function. This is permitted in Python. This chat() function displays some information to the user on the screen and calls the nltk built-in nltk.chat.util.Chat() class with the chatpairs variable. It passes nltk.chat.util.reflections as the second argument. Finally we call the chatbot.converse() function on the object that's created using the chat() class. chat() This instruction calls the chat() function, which shows a prompt on the screen and accepts the user's requests. It shows responses according to the regular expressions that we have built before: if   name    == '  main  ': for engine in ['eliza', 'iesha', 'rude', 'suntsu', 'zen']: print("=== demo of {} ===".format(engine)) builtinEngines(engine) print() myEngine() These instructions will be called when the program is invoked as a standalone program (not using import). They do these two things: Invoke the built-in engines one after another (so that we can experience them) Once all the five built-in engines are excited, they call our myEngine(), where our customer engine comes into play We have learned to create a chatbot of our own using the easiest programming language ‘Python’. To know more about how to efficiently use NLTK and implement text classification, identify parts of speech, tag words, etc check out Natural Language Processing with Python Cookbook.

The Cambridge Analytica scandal and ethics in data science

Richard Gall
20 Mar 2018
5 min read
Earlier this month, Stack Overflow published the results of its 2018 developer survey. In it, there was an interesting set of questions around the concept of 'ethical code'. The main takeaway was that it remains a gray area. The Cambridge Analytica scandal, however, has given the issue of 'ethical code' a renewed urgency in the last couple of days. The data analytics company is alleged not only to have been involved in votes in the UK and US, but also to have harvested copious amounts of data from Facebook (illegally). For whistleblower Christopher Wylie, the issue of ethical code is particularly pronounced. "I created Steve Bannon's psychological mindfuck tool," he told Carole Cadwalladr in an interview in the Guardian.

Cambridge Analytica: psyops or just market research?

Wylie is a data scientist whose experience over the last half a decade or so has been impressive. It's worth noting, however, that Wylie's career didn't begin in politics. His academic career was focused primarily on fashion forecasting. That might all seem a little prosaic, but it underlines the fact that data science never happens in a vacuum. Data scientists always operate within a given field. It might be tempting to view the world purely through the prism of impersonal data and cold statistics. To a certain extent you have to if you're a data scientist or a statistician. But at the very least this can be unhelpful; at worst, it's a potential threat to global democracy.

At one point in the interview Wylie remarks that:

...it's normal for a market research company to amass data on domestic populations. And if you're working in some country and there's an auxiliary benefit to a current client with aligned interests, well that's just a bonus.

This is potentially the most frightening thing. Cambridge Analytica's ostensible role in elections and referenda isn't actually that remarkable. For all the vested interests and meetings between investors, researchers and entrepreneurs, the scandal is really just the extension of data mining and marketing tactics employed by just about every organization with a digital presence on the planet.

Data scientists are always going to be in a difficult position. True, we're not all going to end up working alongside Steve Bannon. But your skills are always being deployed with a very specific end in mind. It's not always easy to see the effects and impact of your work until later, but it's still essential for data scientists and analysts to be aware of whose data is being collected and used, how it's being used, and why.

Who is responsible for the ethics around data and code?

There was another interesting question in the Stack Overflow survey that's relevant to all of this. The survey asked respondents who was ultimately most responsible for code that accomplishes something unethical: 57.5% claimed upper management were responsible, 22.8% said the person who came up with the idea, and 19.7% said it was the responsibility of the developer themselves.

Clearly the question is complex. The truth lies somewhere between all three. Management makes decisions about what's required from an organizational perspective, but the engineers themselves are, of course, a part of the wider organizational dynamic. They should be in a position where they are able to communicate any personal misgivings or broader legal issues with the work they are being asked to do.

The case of Wylie and Cambridge Analytica is unique. But it does highlight that data science can be deployed in ways that are difficult to predict. And without proper channels of escalation and the right degree of transparency, it's easy for things to remain secretive, hidden in small meetings, email threads and paper trails. That's another thing that data scientists need to remember. Office politics might be a fact of life, but when you're a data scientist you're sitting at the apex of legal, strategic and political issues. To refuse to be aware of this would be naive.

What the Cambridge Analytica story can teach data scientists

But there's something else worth noting. This story also illustrates something more about the world in which data scientists are operating. This is a world where traditional infrastructure is being dismantled. This is a world where privatization and outsourcing are viewed as the route towards efficiency and 'value for money'. Whether you think that's a good or bad thing isn't really the point here. What's important is that it makes the way we use data, and even the code we write, more problematic than ever, because it's not always easy to see how it's being used.

Arguably Wylie was naive. His curiosity and desire to apply his data science skills to intriguing and complex problems led him towards people who knew just how valuable he could be. Wylie has evidently developed greater self-awareness. This is perhaps the main reason why he has come forward with his version of events. But as this saga unfolds, it's worth remembering the value of data scientists in the modern world - for a range of organizations. It has made the concept of the 'citizen data scientist' take on an even more urgent and literal meaning. Yes, data science can help to empower the economy and possibly even toy with democracy. But it can also be used to empower people and improve transparency in politics and business.

If anything, the Cambridge Analytica saga proves that data science is a dangerous field - not only the sexiest job of the twenty-first century, but one of the most influential in shaping the kind of world we're going to live in. That's frightening, but it's also pretty exciting.

How to build a live interactive visual dashboard in Power BI with Azure Stream

Sugandha Lahoti
17 Apr 2018
4 min read
Azure Stream Analytics is a managed complex event processing interactive data engine. As a built-in output connector, it offers the facility of building live interactive intelligent BI charts and graphics using Microsoft's cloud-based Business Intelligent tool called Power BI. In this tutorial we implement a data architecture pipeline by designing a visual dashboard using Microsoft Power BI and Stream Analytics. Prerequisites of building an interactive visual live dashboard in Power BI with Stream Analytics: Azure subscription Power BI Office365 account (the account email ID should be the same for both Azure and Power BI). It can be a work or school account Integrating Power BI as an output job connector for Stream Analytics To start with connecting the Power BI portal as an output of an existing Stream Analytics job, follow the given steps:  First, select Outputs in the Azure portal under JOB TOPOLOGY:  After clicking on Outputs, click on +Add in the top left corner of the job window, as shown in the following screenshot:  After selecting +Add, you will be prompted to enter the New output connectors of the job. Provide details such as job Output name/alias; under Sink, choose Power BI from the drop-down menu.  On choosing Power BI as the streaming job output Sink, it will automatically prompt you to authorize the Power BI work/personal account with Azure. Additionally, you may create a new Power BI account by clicking on Signup. By authorizing, you are granting access to the Stream Analytics output permanently in the Power BI dashboard. You can also revoke the access by changing the password of the Power BI account or deleting the output/job.  Post the successful authorization of the Power BI account with Azure, there will be options to select Group Workspace, which is the Power BI tenant workspace where you may create the particular dataset to configure processed Stream Analytics events. Furthermore, you also need to define the Table Name as data output. Lastly, click on the Create button to integrate the Power BI data connector for real-time data visuals:   Note: If you don't have any custom workspace defined in the Power BI tenant, the default workspace is My Workspace. If you define a dataset and table name that already exists in another Stream Analytics job/output, it will be overwritten. It is also recommended that you just define the dataset and table name under the specific tenant workspace in the job portal and not explicitly create them in Power BI tenants as Stream Analytics automatically creates them once the job starts and output events start to push into the Power BI dashboard.   On starting the Streaming job with output events, the Power BI dataset would appear under the dataset tab following workspace. The  dataset can contain maximum 200,000 rows and supports real-time streaming events and historical BI report visuals as well: Further Power BI dashboard and reports can be implemented using the streaming dataset. 
Alternatively, you may also create tiles in custom dashboards by selecting CUSTOM STREAMING DATA under REAL-TIME DATA, as shown in the following screenshot:  By selecting Next, the streaming dataset should be selected and then the visual type, respective fields, Axis, or legends, can be defined: Thus, a complete interactive near real-time Power BI visual dashboard can be implemented with analyzed streamed data from Stream Analytics, as shown in the following screenshot, from the real-world Connected Car-Vehicle Telemetry analytics dashboard: In this article we saw a step-by-step implementation of a real-time visual dashboard using Microsoft Power BI with processed data from Azure Stream Analytics as the output data connector. This article is an excerpt from the book, Stream Analytics with Microsoft Azure, written by Anindita Basak, Krishna Venkataraman, Ryan Murphy, and Manpreet Singh. To learn more on designing and managing Stream Analytics jobs using reference data and utilizing petabyte-scale enterprise data store with Azure Data Lake Store, you may refer to this book. Unlocking the secrets of Microsoft Power BI Ride the third wave of BI with Microsoft Power BI  

What is a support vector machine?

Packt Editorial Staff
16 Apr 2018
7 min read
Support vector machines are machine learning algorithms whereby a model 'learns' to categorize data around a linear classifier. The linear classifier is, quite simply, a line that classifies. It's a line that distinguishes between two 'types' of data, like positive and negative sentiment in language. This gives you control over data, allowing you to easily categorize and manage different data points in a way that's useful too. This tutorial is an extract from Statistics for Machine Learning. But support vector machines do more than linear classification - they are multidimensional algorithms, which is why they're so powerful. Using something called a kernel trick, which we'll look at in more detail later, support vector machines are able to create non-linear boundaries. Essentially they work at constructing a more complex linear classifier, called a hyperplane. Support vector machines work on a range of different types of data, but they are most effective on data sets with very high dimensions relative to the observations, for example: Text classification, in which the word vectors that represent language have very high dimensions The quality control of DNA sequencing, by labeling chromatograms correctly Different types of support vector machines Support vector machines are generally classified into three different groups: Maximum margin classifiers Support vector classifiers Support vector machines Let's take a look at them now. Maximum margin classifiers People often use the term maximum margin classifier interchangeably with support vector machines. They're the most common type of support vector machine, but as you'll see, there are some important differences. The maximum margin classifier tackles the problem of what happens when your data isn't quite clear or clean enough to draw a simple line between two sets - it helps you find the best line, or hyperplane, out of a range of options. The objective of the algorithm is to find the furthest distance between the two nearest points in two different categories of data - this is the 'maximum margin', and the hyperplane sits comfortably within it. The hyperplane is defined by this equation: β_0 + β_1 X_1 + β_2 X_2 + ... + β_p X_p = 0. So, this means that any data points that sit directly on the hyperplane have to follow this equation. There are also data points that will, of course, fall either side of this hyperplane. These should follow these equations: β_0 + β_1 x_i1 + ... + β_p x_ip > 0 for observations in one class (y_i = +1), and β_0 + β_1 x_i1 + ... + β_p x_ip < 0 for observations in the other class (y_i = -1). You can represent the maximum margin classifier like this: maximize the margin M over β_0, β_1, ..., β_p, subject to (1) Σ_j β_j^2 = 1 and (2) y_i (β_0 + β_1 x_i1 + ... + β_p x_ip) >= M for every observation i. Constraint 2 ensures that observations will be on the correct side of the hyperplane by taking the product of coefficients with x variables and finally, with a class variable indicator. In the diagram below, you can see that we could draw a number of separate hyperplanes to separate the two classes (blue and red). However, the maximum margin classifier attempts to fit the widest slab (maximize the margin between positive and negative hyperplanes) between two classes and the observations touching both the positive and negative hyperplanes. These are the support vectors. It's important to note that in non-separable cases, the maximum margin classifier will not have a separating hyperplane - there's no feasible solution. This issue will be solved with support vector classifiers. Support vector classifiers Support vector classifiers are an extended version of maximum margin classifiers. Here, some violations are 'tolerated' for non-separable cases. This means a best fit can be created. 
In fact, in real-life scenarios, we hardly find any data with purely separable classes; most classes have a few or more observations in overlapping classes. The mathematical representation of the support vector classifier is as follows, with a slight correction to the constraints to accommodate error terms: maximize the margin M over β_0, β_1, ..., β_p and the error terms ε_1, ..., ε_n, subject to (1) Σ_j β_j^2 = 1, (2) y_i (β_0 + β_1 x_i1 + ... + β_p x_ip) >= M (1 - ε_i), (3) ε_i >= 0, and (4) Σ_i ε_i <= C. In constraint 4, the C value is a non-negative tuning parameter to either accommodate more or fewer overall errors in the model. Having a high value of C will lead to a more robust model, whereas a lower value creates a more flexible model due to less tolerance of violations of the margin. In practice, the C value would be a tuning parameter, as is usual with all machine learning models. The impact of changing the C value on margins is shown in the two diagrams below. With the high value of C, the model would be more tolerant and also have space for violations (errors) in the left diagram, whereas with the lower value of C, no scope for accepting violations leads to a reduction in margin width. C is a tuning parameter in Support Vector Classifiers: Support vector machines Support vector machines are used when the decision boundary is non-linear. They're useful when it becomes impossible to separate the classes with support vector classifiers. The diagram below explains the non-linearly separable cases for both 1-dimension and 2-dimensions: Clearly, you can't classify using support vector classifiers whatever the cost value is. This is why you would want to then introduce something called the kernel trick. In the diagram below, a polynomial kernel with degree 2 has been applied in transforming the data from 1-dimensional to 2-dimensional data. By doing so, the data becomes linearly separable in higher dimensions. In the left diagram, different classes (red and blue) are plotted on X_1 only, whereas after applying degree 2, we now have 2 dimensions, X_1 and X_1^2 (the original and a new, squared dimension). The degree of the polynomial kernel is a tuning parameter. You need to tune it with various values to check where higher accuracy might be possible with the model: However, in the 2-dimensional case, the kernel trick is applied as below with the polynomial kernel with degree 2. Observations have been classified successfully using a linear plane after projecting the data into higher dimensions: Different types of kernel functions Kernel functions are the functions that, given the original feature vectors, return the same value as the dot product of their corresponding mapped feature vectors. Kernel functions do not explicitly map the feature vectors to a higher-dimensional space, or calculate the dot product of the mapped vectors. Kernels produce the same value through a different series of operations that can often be computed more efficiently. The main reason for using kernel functions is to eliminate the computational requirement to derive the higher-dimensional vector space from the given basic vector space, so that observations can be separated linearly in higher dimensions. The reason this matters is that the derived vector space grows exponentially with the increase in dimensions, and it becomes almost too difficult to continue computation, even when you have a variable size of 30 or so. The following example shows how the size of the variables grows. Here's an example: When we have two variables such as x and y, with a polynomial kernel of degree 2, it needs to compute x^2, y^2, and xy dimensions in addition. 
Whereas, if we have three variables x, y, and z, then we need to calculate the x^2, y^2, z^2, xy, yz, xz, and xyz vector spaces. You will have realized by this time that adding one more dimension creates many more combinations. Hence, care needs to be taken to reduce the computational complexity; this is where kernels do wonders. Kernels are defined more formally in the following equation: K(x, x') = <φ(x), φ(x')>, where φ is the mapping from the original space to the higher-dimensional feature space. Polynomial kernels are often used, especially with degree 2. In fact, the inventor of support vector machines, Vladimir N Vapnik, used a degree 2 kernel for classifying handwritten digits. Polynomial kernels are given by the following equation: K(x, x') = (1 + <x, x'>)^d, where d is the degree of the polynomial. Radial Basis Function kernels (sometimes called Gaussian kernels) are a good first choice for problems requiring nonlinear models. A decision boundary that is a hyperplane in the mapped feature space is similar to a decision boundary that is a hypersphere in the original space. The feature space produced by the Gaussian kernel can have an infinite number of dimensions, a feat that would be impossible otherwise. RBF kernels are represented by the following equation: K(x, x') = exp(-||x - x'||^2 / (2σ^2)). This is sometimes written more compactly in terms of a single width parameter, gamma, in place of 2σ^2. It is advisable to scale the features when using support vector machines, but it is very important when using the RBF kernel. When the gamma value is small, it gives you a pointed bump in the higher dimensions. A larger value gives you a softer, broader bump. A small gamma will give you low bias and high variance solutions; on the other hand, a high gamma will give you high bias and low variance solutions, and that is how you control the fit of the model using RBF kernels: Learn more about support vector machines Support vector machines as a classification engine [read now] 10 machine learning algorithms every engineer needs to know [read now]
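To make the kernel trick and the C and gamma parameters above concrete, here is a minimal scikit-learn sketch (not taken from the excerpted book): it first checks numerically that the degree-2 polynomial kernel matches the dot product of an explicit feature map, then fits SVC models with different kernels on a toy dataset. The dataset, parameter values, and variable names are illustrative only.

# A minimal sketch (not from the excerpted book): verifying the polynomial
# kernel identity and comparing SVC kernels on a toy dataset.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 1) Kernel trick check: (1 + <x, z>)^2 equals the dot product of the
#    explicit degree-2 feature maps phi(x) and phi(z).
def phi(v):
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
kernel_value = (1 + x @ z) ** 2     # computed without leaving 2 dimensions
mapped_value = phi(x) @ phi(z)      # computed in the 6-dimensional space
print(kernel_value, mapped_value)   # both evaluate to 4.0 (up to rounding)

# 2) Fitting SVCs with different kernels; C and gamma are tuning parameters.
#    Note: scikit-learn's rbf kernel is exp(-gamma * ||x - x'||^2).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X = StandardScaler().fit_transform(X)   # scaling matters, especially for RBF

for clf in (SVC(kernel='linear', C=1.0),
            SVC(kernel='poly', degree=2, C=1.0),
            SVC(kernel='rbf', gamma=0.5, C=1.0)):
    clf.fit(X, y)
    print(clf.kernel, round(clf.score(X, y), 3))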

Facebook releases PyTorch 1.3 with named tensors, PyTorch Mobile, 8-bit model quantization, and more

Bhagyashree R
11 Oct 2019
5 min read
Yesterday, at the PyTorch Developer Conference, Facebook announced the release of PyTorch 1.3. This release comes with three experimental features: named tensors, 8-bit model quantization, and PyTorch Mobile. Along with these exciting features, Facebook also announced the general availability of Google Cloud TPU support and a newly launched integration with Alibaba Cloud. Key updates in PyTorch 1.3 Named Tensors for more readable and maintainable code Though tensors are the building blocks of modern machine learning, researchers have argued that they are “broken.” Tensors have their own share of shortcomings: they expose private dimensions, broadcast based on absolute position, and keep the type information in the documentation. PyTorch 1.3 tries to solve this problem by introducing experimental support for named tensors, which was proposed by Sasha Rush, an Associate Professor at Cornell Tech. He has built a library called NamedTensor, which serves as a “thin-wrapper” on Torch tensor. This update introduces a few changes to the API. Dimension access and reduction now use a ‘dim’ argument instead of an index. Constructing and adding dimensions requires a “name” argument. Functions now broadcast based on set operations, not through heuristic ordering rules. 8-bit model quantization for mobile-optimized AI Quantization in deep learning is the method of approximating a neural network that uses 32-bit floating-point numbers by a neural network that uses a lower-precision numerical format. It is used to reduce the bandwidth and compute requirements of deep learning models. This is extremely essential for on-device applications that have limited memory size and number of computations. PyTorch 1.3 brings experimental support for 8-bit model quantization with the eager mode Python API for efficient deployment on servers and edge devices. This feature includes techniques like post-training quantization, dynamic quantization, and quantization-aware training. Moving from 32-bits to 8-bits can result in two to four times faster computations with one-quarter the memory usage. PyTorch Mobile for more efficient on-device machine learning Running machine learning models directly on edge devices is of great importance as it reduces latency. This is why PyTorch 1.3 introduces PyTorch Mobile that enables “an end-to-end workflow from Python to deployment on iOS and Android.” The current release is experimental. In the future releases, we can expect PyTorch Mobile to come with build-level optimization, selective compilation, support for QNNPACK quantized kernel libraries and ARM CPUs, further performance improvements, and more. Model interpretability and privacy tools in PyTorch 1.3 Captum and Captum Insights Captum is an easy-to-use model interpretability library for PyTorch. It is backed by state-of-the-art interpretability algorithms such as Integrated Gradients, DeepLIFT, and Conductance to help developers improve and troubleshoot their models. Developers can identify different features that contribute to a model’s output and improve its design. Facebook has also released an early release of Captum Insights. It is an interpretability visualization widget built on top of Captum. It works across images, text, and other features to help users understand feature attribution. Check out Facebook’s announcement to know more about Captum. CrypTen Machine learning via cloud-based platforms poses various security and privacy challenges. 
Facebook writes, “In particular, users of these platforms may not want or be able to share unencrypted data, which prevents them from taking full advantage of ML tools.” PyTorch 1.3 comes with CrypTen, a framework for privacy-preserving machine learning. It aims to make secure computing techniques accessible to machine learning practitioners. You can find more about CrypTen on GitHub. Libraries for multimodal AI systems Detectron2: It is an object detection library implemented in PyTorch. It features support for the latest models and tasks and increased flexibility to aid computer vision research. There are also improvements in maintainability and scalability to support production use cases. Fairseq gets speech extensions: With this release, Fairseq, a framework for sequence-to-sequence applications such as language translation includes support for end-to-end learning for speech and audio recognition tasks. The release of PyTorch 1.3 started a discussion on Hacker News and naturally many developers compared it with TensorFlow 2.0. Here’s what a user commented, “This is a common trend for being second in the market when we see Pytorch and TensorFlow 2.0, TF 2.0 was created to compete directly with Pytorch pythonic implementation (Keras based, Eager execution).” They further added, “Facebook at least on PyTorch has been delivering a quality product. Although for us running production pipelines TF is still ahead in many areas (GPU, TPU implementation, TensorRT, TFX and other pipeline tools) I can see Pytorch catching up on the next couple of years which by my prediction many companies will be running serious and advanced workflows and we may be able to see a winner there.” The named tensors implementation is being well-received by the PyTorch community: https://twitter.com/leopd/status/1182342855886376965 https://twitter.com/rasbt/status/1182647527906140161 These were some of the updates in PyTorch 1.3. Check out the official announcement by Facebook to know more. PyTorch 1.2 is here with a new TorchScript API, expanded ONNX export, and more PyTorch announces the availability of PyTorch Hub for improving machine learning research reproducibility Sherin Thomas explains how to build a pipeline in PyTorch for deep learning workflows Facebook AI open-sources PyTorch-BigGraph for faster embeddings in large graphs Facebook open-sources PyText, a PyTorch based NLP modeling framework
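For a quick feel of the features described above, here is a hedged sketch of the named tensor and dynamic quantization APIs as exposed in the 1.3-era torch package; both were experimental at release, so the exact calls may differ in later versions, and the model and dimension names here are made up for illustration.

# A quick taste of two PyTorch 1.3 features (both experimental at release):
# named tensors and post-training dynamic quantization.
import torch
import torch.nn as nn

# Named tensors: factory functions accept a `names` argument, and reductions
# can refer to dimensions by name instead of by positional index.
imgs = torch.randn(4, 3, 32, 32, names=('N', 'C', 'H', 'W'))
print(imgs.names)                  # ('N', 'C', 'H', 'W')
per_channel_mean = imgs.mean('C')  # reduce over the channel dimension by name
print(per_channel_mean.names)      # ('N', 'H', 'W')

# Dynamic quantization: convert the Linear layers of a trained float model
# to 8-bit integer weights for cheaper inference.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
print(quantized)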

Understanding SQL Server recovery models to effectively backup and restore your database

Vijin Boricha
11 Apr 2018
9 min read
Before you even think about your backups, you need to understand the SQL Server recovery model that is internally used while the database is in operational mode. In this regard, today we will learn about SQL Server recovery models, backup and restore. SQL Server recovery models A recovery model is about maintaining data in the event of a server failure. Also, it defines the amount of information that SQL Server writes to the log file for the purpose of recovery. SQL Server has three database recovery models: Simple recovery model Full recovery model Bulk-logged recovery model Simple recovery model This model is typically used for small databases and scenarios where data changes are infrequent. It is limited to restoring the database to the point when the last backup was created. It means that all changes made after the backup are lost. You will need to recreate all changes manually. The major benefit of this model is that the log file takes only a small amount of storage space. How and when to use it depends on the business scenario. Full recovery model This model is recommended when recovery from damaged storage is the highest priority and data loss should be minimal. SQL Server uses copies of database and log files to restore the database. The database engine logs all changes to the database, including bulk operation and most DDL statements. If the transaction log file is not damaged, SQL Server can recover all data except any transaction which are in process at the time of failure (that is, not committed in to the database file). All logged transactions give you the opportunity of point-in-time recovery, which is a really cool feature. A major limitation of this model is the large size of the log files which leads you to performance and storage issues. Use it only in scenarios where every insert is important and loss of data is not an option. Bulk-logged recovery model This model is somewhere between simple and full. It uses database and log backups to recreate the database. Compared to the full recovery model, it uses less log space for CREATE INDEX and bulk load operations, such as SELECT INTO. Let's look at this example. SELECT INTO can load a table with 1,000,000 records with a single statement. The log will only record the occurrence of these operations but not the details. This approach uses less storage space compared to the full recovery model. The bulk-logged recovery model is good for databases which are used for ETL process and data migrations. SQL Server has a system database model. This database is the template for each new one you create. If you use only the CREATE DATABASE statement without any additional parameters, it simply copies the model database with all the properties and metadata. It also inherits the default recovery model, which is full. So, the conclusion is that each new database will be in full recovery mode. This can be changed during and after the creation process. 
Here is a SQL statement to check recovery models of all your databases on SQL Server on Linux instance: 1> SELECT name, recovery_model, recovery_model_desc 2> FROM sys.databases 3> GO name recovery_model recovery_model_desc ------------------------ -------------- ------------------- master 3 SIMPLE tempdb 3 SIMPLE model 1 FULL msdb 3 SIMPLE AdventureWorks 3 SIMPLE WideWorldImporters 3 SIMPLE (6 rows affected) The following DDL statement will change the recovery model for the model database from full to simple: 1> USE master 2> ALTER DATABASE model 3> SET RECOVERY SIMPLE 4> GO If you now execute the SELECT statement again to check recovery models, you will notice that model now has different properties. Backup and restore Now it's time for SQL coding and implementing backup/restore operations in our own Environments. First let's create a full database backup of our University database: 1> BACKUP DATABASE University 2> TO DISK = '/var/opt/mssql/data/University.bak' 3> GO Processed 376 pages for database'University', file 'University' on file 1. Processed 7 pages for database 'University', file 'University_log' on file 1. BACKUP DATABASE successfully processed 383 pages in 0.562 seconds (5.324 MB/sec) 2. Now let's check the content of the table Students: 1> USE University 2> GO Changed database context to 'University' 1> SELECT LastName, FirstName 2> FROM Students 3> GO LastName FirstName --------------- ---------- Azemovic Imran Avdic Selver Azemovic Sara Doe John (4 rows affected) 3. As you can see there are four records. Let's now simulate a large import from the AdventureWorks database, Person.Person table. We will adjust the PhoneNumber data to fit our 13 nvarchar characters. But first we will drop unique index UQ_user_name so that we can quickly import a large amount of data. 1> DROP INDEX UQ_user_name 2> ON dbo.Students 3> GO 1> INSERT INTO Students (LastName, FirstName, Email, Phone, UserName) 2> SELECT T1.LastName, T1.FirstName, T2.PhoneNumber, NULL, 'user.name' 3> FROM AdventureWorks.Person.Person AS T1 4> INNER JOIN AdventureWorks.Person.PersonPhone AS T2 5> ON T1.BusinessEntityID = T2.BusinessEntityID 6> WHERE LEN (T2.PhoneNumber) < 13 7> AND LEN (T1.LastName) < 15 AND LEN (T1.FirstName)< 10 8> GO (10661 rows affected) 4. Let's check the new row numbers: 1> SELECT COUNT (*) FROM Students 2> GO ----------- 10665 (1 rows affected) Note: As you see the table now has 10,665 rows (10,661+4). But don't forget that we had created a full database backup before the import procedure. 5. Now, we will create a differential backup of the University database: 1> BACKUP DATABASE University 2> TO DISK = '/var/opt/mssql/data/University-diff.bak' 3> WITH DIFFERENTIAL 4> GO Processed 216 pages for database 'University', file 'University' on file 1. Processed 3 pages for database 'University', file 'University_log' on file 1. BACKUP DATABASE WITH DIFFERENTIAL successfully processed 219 pages in 0.365 seconds (4.676 MB/sec). 6. If you want to see the state of .bak files on the disk, follow this procedure. However, first enter superuser mode with sudo su. This is necessary because a regular user does not have access to the data folder: 7. Now let's test the transaction log backup of University database log file. 
However, first you will need to make some changes inside the Students table: 1> UPDATE Students 2> SET Phone = 'N/A' 3> WHERE Phone IS NULL 4> GO 1> BACKUP LOG University 2> TO DISK = '/var/opt/mssql/data/University-log.bak' 3> GO Processed 501 pages for database 'University', file 'University_log' on file 1. BACKUP LOG successfully processed 501 pages in 0.620 seconds (6.313 MB/sec) Note: Next steps are to test restore database options of full and differential backup procedures. 8. First, restore the full database backup of University database. Remember that the Students table had four records before the first backup, and it currently has 10,665 (as we checked in step 4): 1> ALTER DATABASE University 2> SET SINGLE_USER WITH ROLLBACK IMMEDIATE 3> RESTORE DATABASE University 4> FROM DISK = '/var/opt/mssql/data/University.bak' 5> WITH REPLACE 6> ALTER DATABASE University SET MULTI_USER 7> GO Nonqualified transactions are being rolled back. Estimated rollback completion: 0%. Nonqualified transactions are being rolled back. Estimated rollback completion: 100%. Processed 376 pages for database 'University', file 'University' on file 1. Processed 7 pages for database 'University', file 'University_log' on file 1. RESTORE DATABASE successfully processed 383 pages in 0.520 seconds (5.754 MB/sec). Note: Before the restore procedure, the database is switched to single user mode.This way we are closing all connections that could abort the restore procedure. In the last step, we are switching the database to multi-user mode again. 9. Let's check the number of rows again. You will see the database is restored to its initial state, before the import of more than 10,000 records from the AdventureWorks database: 1> SELECT COUNT (*) FROM Students 2> GO ------- 4 (1 rows affected) 10. Now it's time to restore the content of the differential backup and return the University database to its state after the import procedure: 1> USE master 2> ALTER DATABASE University 3> SET SINGLE_USER WITH ROLLBACK IMMEDIATE 4> RESTORE DATABASE University 5> FROM DISK = N'/var/opt/mssql/data/University.bak' 6> WITH FILE = 1, NORECOVERY, NOUNLOAD, REPLACE, STATS = 5 7> RESTORE DATABASE University 8> FROM DISK = N'/var/opt/mssql/data/University-diff.bak' 9> WITH FILE = 1, NOUNLOAD, STATS = 5 10> ALTER DATABASE University SET MULTI_USER 11> GO Processed 376 pages for database 'University', file 'University' on file 1. Processed 7 pages for database 'University', file 'University_log' on file 1. RESTORE DATABASE successfully processed 383 pages in 0.529 seconds (5.656 MB/sec). Processed 216 pages for database 'University', file 'University' on file 1. Processed 3 pages for database 'University', file 'University_log' on file 1. RESTORE DATABASE successfully processed 219 pages in 0.309 seconds (5.524 MB/sec). We'll look at a really cool feature of SQL Server: backup compression. A backup can be a very large file, and if companies create backups on daily basis, then you can do the math on the amount of storage required. Disk space is cheap today, but it is not free. As a database administrator on SQL Server on Linux, you should consider any possible option to optimize and save money. Backup compression is just that kind of feature. It provides you with a compression procedure (ZIP, RAR) after creating regular backups. So, you save time, space, and money. Let's consider a full database backup of the University database. The uncompressed file is about 3 MB. After we create a new one with compression, the size should be reduced. 
The compression ratio mostly depends on data types inside the database. It is not a magic stick but it can save space. The following SQL command will create a full database backup of the University database and compress it: 1> BACKUP DATABASE University 2> TO DISK = '/var/opt/mssql/data/University-compress.bak' 3> WITH NOFORMAT, INIT, SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10 4> GO Now exit to bash, enter superuser mode, and type the following ls command to compare the size of the backup files: tumbleweed:/home/dba # ls -lh /var/opt/mssql/data/U*.bak As you can see, the compression size is 676 KB and it is around five times smaller. That is a huge space saving without any additional tools. SQL Server on Linux has one security feature with backup. We learned about SQL Server recovery model, and how to efficiently backup and restore our database. You can know more about transaction logs and elements of backup strategy from this book SQL Server on Linux. Also, check out: Getting to know SQL Server options for disaster recovery Get SQL Server user management right How to integrate SharePoint with SQL Server Reporting Services

Backpropagation Algorithm

Packt
08 Jun 2017
11 min read
In this article by Gianmario Spacagna, Daniel Slater, Phuong Vo.T.H, and Valentino Zocca, the authors of the book Python Deep Learning, we will learn the Backpropagation algorithm, as it is one of the most important topics for multi-layer feed-forward neural networks. (For more resources related to this topic, see here.) The Backpropagation algorithm We have seen how neural networks can map inputs onto determined outputs, depending on fixed weights. Once the architecture of the neural network has been defined (feed-forward, number of hidden layers, number of neurons per layer), and once the activity function for each neuron has been chosen, we will need to set the weights that in turn will define the internal states for each neuron in the network. We will see how to do that for a 1-layer network and then how to extend it to a deep feed-forward network. For a deep neural network the algorithm to set the weights is called the Backpropagation algorithm, and we will discuss and explain this algorithm for most of this section as it is one of the most important topics for multilayer feed-forward neural networks. First, however, we will quickly discuss this for 1-layer neural networks. The general concept we need to understand is the following: every neural network is an approximation of a function, therefore each neural network will not be equal to the desired function, instead it will differ by some value. This value is called the error and the aim is to minimize this error. Since the error is a function of the weights in the neural network, we want to minimize the error with respect to the weights. The error function is a function of many weights; it is therefore a function of many variables. Mathematically, the set of points where this function is zero represents therefore a hypersurface, and to find a minimum on this surface we want to pick a point and then follow a curve in the direction of the minimum. Linear regression To simplify things we are going to introduce matrix notation. Let x be the input; we can think of x as a vector. In the case of linear regression we are going to consider a single output neuron y; the set of weights w is therefore a vector of the same dimension as the dimension of x. The activation value is then defined as the inner product <x, w>. Let's say that for each input value x we want to output a target value t, while for each x the neural network will output a value y defined by the activity function chosen; in this case the absolute value of the difference (y-t) represents the difference between the predicted value and the actual value for the specific input example x. If we have m input values x^i, each of them will have a target value t^i. In this case we calculate the error using the mean squared error J(w) = (1/m) * Σ_i (y^i - t^i)^2, where each y^i is a function of w. The error is therefore a function of w and it is usually denoted with J(w). As mentioned above, this represents a hypersurface of dimension equal to the dimension of w (we are implicitly also considering the bias), and for each w_j we need to find a curve that will lead towards the minimum of the surface. 
The direction in which a curve increases in a certain direction is given by its derivative with respect to that direction, in this case by the partial derivative ∂J/∂w_j, and in order to move towards the minimum we need to move in the opposite direction set by ∂J/∂w_j for each w_j. Let's calculate: ∂J/∂w_j = (2/m) * Σ_i (y^i - t^i) * ∂y^i/∂w_j. If y^i = <x^i, w>, then ∂y^i/∂w_j = x^i_j and therefore: ∂J/∂w_j = (2/m) * Σ_i (y^i - t^i) * x^i_j. The notation can sometimes be confusing, especially the first time one sees it. The input is given by vectors x^i, where the superscript indicates the ith example. Since x and w are vectors, the subscript indicates the jth coordinate of the vector. y^i then represents the output of the neural network given the input x^i, while t^i represents the target, that is, the desired value corresponding to the input x^i. In order to move towards the minimum, we need to move each weight in the direction of its derivative by a small amount λ, called the learning rate, typically much smaller than 1 (say 0.1 or smaller). We can therefore drop the 2 in the derivative and incorporate it in the learning rate, to get the update rule therefore given by: w_j → w_j - λ * Σ_i (y^i - t^i) * x^i_j or, more in general, we can write the update rule in matrix form as: w → w - λ * ∇J(w), where ∇ represents the vector of partial derivatives. This process is what is often called gradient descent. One last note: the update can be done after having calculated all the input vectors; however, in some cases, the weights could be updated after each example or after a defined preset number of examples. Logistic regression In logistic regression, the output is not continuous; rather it is defined as a set of classes. In this case, the activation function is not going to be the identity function like before; rather we are going to use the logistic sigmoid function. The logistic sigmoid function, as we have seen before, outputs a real value in (0,1) and therefore it can be interpreted as a probability function, and that is why it can work so well in a 2-class classification problem. In this case, the target can be one of two classes, and the output represents the probability that it be one of those two classes (say t=1). Let's denote with σ(a), with a the activation value, the logistic sigmoid function; therefore, for each example x, the probability that the output be the class t, given the weights w, is: P(t=1 | x, w) = σ(<x, w>). We can write that equation more succinctly as: P(t | x, w) = σ(a)^t * (1 - σ(a))^(1-t), and, since for each sample x^i the probabilities are independent, we have that the global probability is: P(t | x, w) = Π_i σ(a^i)^(t^i) * (1 - σ(a^i))^(1-t^i). If we take the natural log of the above equation (to turn products into sums), we get: log(P(t | x, w)) = Σ_i [t^i * log(σ(a^i)) + (1 - t^i) * log(1 - σ(a^i))]. The objective is now to maximize this log to obtain the highest probability of predicting the correct results. Usually, this is obtained, as in the previous case, by using gradient descent to minimize the cost function defined by J(w) = -log(P(t | x, w)). As before, we calculate the derivative of the cost function with respect to the weights w_j to obtain: ∂J/∂w_j = Σ_i (σ(a^i) - t^i) * x^i_j. In general, in the case of a multi-class output t, with t a vector (t_1, ..., t_n), we can generalize this equation using J(w) = -log(P(t | x, w)) = -Σ_i Σ_j t^i_j * log(σ(a^i_j)), which brings us to the update equation for the weights: w_j → w_j - λ * Σ_i (y^i - t^i) * x^i_j, with y^i = σ(a^i). This is similar to the update rule we have seen for linear regression. 
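To make the update rules above concrete, here is a minimal NumPy sketch (not from the book) of batch gradient descent for the linear regression case; the data, learning rate, and number of iterations are made up for illustration.

# A minimal sketch of batch gradient descent for linear regression,
# following the update rule w_j <- w_j - lambda * sum_i (y^i - t^i) * x^i_j.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 examples, 3 input features
true_w = np.array([2.0, -1.0, 0.5])
t = X @ true_w + 0.1 * rng.normal(size=100)    # targets with a little noise

w = np.zeros(3)     # start from zero weights
lr = 0.1            # the learning rate (lambda above)

for epoch in range(500):
    y = X @ w                          # network output: the inner product <x, w>
    grad = X.T @ (y - t) / len(t)      # sum_i (y^i - t^i) * x^i_j, averaged over
                                       # the batch (the 1/m factor can equally be
                                       # folded into the learning rate)
    w -= lr * grad                     # move against the gradient

print(w)   # close to [2.0, -1.0, 0.5]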
Backpropagation In the case of 1-layer networks, weight adjustment was easy, as we could use linear or logistic regression and adjust the weights simultaneously to get a smaller error (minimizing the cost function). For multi-layer neural networks we can use a similar argument for the weights used to connect the last hidden layer to the output layer, as we know what we would like the output layer to be, but we cannot do the same for the hidden layers, as, a priori, we do not know what the values for the neurons in the hidden layers ought to be. What we do, instead, is calculate the error in the last hidden layer and estimate what it would be in the previous layer, propagating the error back from last to first layer, hence the name Backpropagation. Backpropagation is one of the most difficult algorithms to understand at first, but all that is needed is some knowledge of basic differential calculus and the chain rule. Let's introduce some notation first. We denote with J the cost (error), and with y the activity function that is defined on the activation value a (for example, y could be the logistic sigmoid), which is a function of the weights w and the input x. Let's also define w_i,j as the weight between the ith input value and the jth output. Here we define input and output more generically than for a 1-layer network: if w_i,j connects a pair of successive layers in a feed-forward network, we denote as input the neurons on the first of the two successive layers, and output the neurons on the second of the two successive layers. In order not to make the notation too heavy, and have to denote on which layer each neuron is, we assume that the ith input y_i is always in the layer preceding the layer containing the jth output y_j. The letter y is used to denote both an input and the activity function, and we can easily infer which one we mean from the context. We also use subscripts i and j, where we always have the element with subscript i belonging to the layer preceding the layer containing the element with subscript j. In this example, layer 1 represents the input, and layer 2 the output. Using this notation, and the chain rule for derivatives, for the last layer of our neural network we can write: ∂J/∂w_i,j = (∂J/∂y_j) * (∂y_j/∂a_j) * (∂a_j/∂w_i,j). Since we know that a_j = Σ_i w_i,j * y_i, and therefore ∂a_j/∂w_i,j = y_i, we have: ∂J/∂w_i,j = (∂J/∂y_j) * (∂y_j/∂a_j) * y_i. If y is the logistic sigmoid defined above, we get the same result we have already calculated at the end of the previous section, since we know the cost function and we can calculate all derivatives. For the previous layers the same formula holds: ∂J/∂w_i,j = (∂J/∂y_j) * (∂y_j/∂a_j) * y_i. Since we know that ∂a_j/∂w_i,j = y_i, and we know that ∂y_j/∂a_j is the derivative of the activity function that we can calculate, all we need to calculate is the derivative ∂J/∂y_j. Let's notice that this is the derivative of the error with respect to the activity function in the second layer, and, if we can calculate this derivative for the last layer, and have a formula that allows us to calculate the derivative for one layer assuming we can calculate the derivative for the next, we can calculate all the derivatives starting from the last layer and move backwards. Let us notice that the y_j, as we defined them, are the activities of the neurons in the second layer, and these are functions (through the activity function) of the activation values a_j, which are in turn functions of the activities y_i in the first layer. Therefore, applying the chain rule, we have: ∂J/∂y_i = Σ_j (∂J/∂y_j) * (∂y_j/∂a_j) * (∂a_j/∂y_i), and once again we can calculate both ∂y_j/∂a_j and ∂a_j/∂y_i = w_i,j, so once we know ∂J/∂y_j we can calculate ∂J/∂y_i, and since we can calculate ∂J/∂y_j for the last layer, we can move backward and calculate ∂J/∂y_i for any layer and therefore ∂J/∂w_i,j for any layer. 
Summarizing, if we have a sequence of layers where y_i is a neuron in the layer preceding the layer containing y_j, which in turn precedes the layer containing y_k, we then have these two fundamental equations, where the summation in the second equation should read as the sum over all the outgoing connections from y_j to any neuron y_k in the successive layer: ∂J/∂w_i,j = (∂J/∂y_j) * (∂y_j/∂a_j) * y_i and ∂J/∂y_j = Σ_k (∂J/∂y_k) * (∂y_k/∂a_k) * w_j,k. By using these two equations we can calculate the derivatives for the cost with respect to each layer. If we set δ_j = (∂J/∂y_j) * (∂y_j/∂a_j), then δ_j represents the variation of the cost with respect to the activation value, and we can think of δ_j as the error at the neuron y_j. We can then rewrite the second equation as: ∂J/∂y_j = Σ_k δ_k * w_j,k, which implies that δ_j = (∂y_j/∂a_j) * Σ_k δ_k * w_j,k. These two equations give an alternate way of seeing Backpropagation, as the variation of the cost with respect to the activation value, and provide a formula to calculate this variation for any layer once we know the variation for the following layer. We can also combine these equations and show that: ∂J/∂w_i,j = δ_j * y_i. The Backpropagation algorithm for updating the weights is then given on each layer by: w_i,j → w_i,j - λ * δ_j * y_i. In the last section we will provide a code example that will help understand and apply these concepts and formulas. Summary At the end of this article we learnt what follows the definition of a neural network's architecture: the use of the Backpropagation algorithm to set the weights. We also saw how we can stack many layers to create and use deep feed-forward neural networks, how a neural network can have many layers, and why inner (hidden) layers are important. Resources for Article: Further resources on this subject: Basics of Jupyter Notebook and Python [article] Jupyter and Python Scripting [article] Getting Started with Python Packages [article]
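The code example promised above is not part of this excerpt, so here is a minimal stand-in: a NumPy sketch of the two fundamental equations for a network with one hidden layer and sigmoid activity functions. It is an illustrative reconstruction, not the authors' code, and the network sizes, data, and learning rate are arbitrary.

# A minimal sketch of Backpropagation for a network with one hidden layer,
# using the two fundamental equations: dJ/dw_i,j = delta_j * y_i and
# delta_j = y'(a_j) * sum_k delta_k * w_j,k.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                         # inputs
t = (X[:, 0] * X[:, 1] > 0).astype(float)[:, None]    # a non-linear target

W1 = rng.normal(scale=0.5, size=(2, 8))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output weights
lr = 0.5

for epoch in range(2000):
    # forward pass: activation values a, activity functions y = sigmoid(a)
    a1 = X @ W1
    y1 = sigmoid(a1)
    a2 = y1 @ W2
    y2 = sigmoid(a2)

    # backward pass
    # output layer: delta_j = dJ/dy_j * y'(a_j), with J the squared error
    delta2 = (y2 - t) * y2 * (1 - y2)
    # hidden layer: delta_j = y'(a_j) * sum_k delta_k * w_j,k
    delta1 = (delta2 @ W2.T) * y1 * (1 - y1)

    # weight updates: w_i,j <- w_i,j - lr * delta_j * y_i (averaged over the batch)
    W2 -= lr * y1.T @ delta2 / len(X)
    W1 -= lr * X.T @ delta1 / len(X)

accuracy = ((y2 > 0.5) == (t > 0.5)).mean()
print(round(float(accuracy), 3))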

How to handle categorical data for machine learning algorithms

Packt Editorial Staff
20 Sep 2019
9 min read
The quality of data and the amount of useful information are key factors that determine how well a machine learning algorithm can learn. Therefore, it is absolutely critical that we make sure to encode categorical variables correctly, before we feed data into a machine learning algorithm. In this article, with simple yet effective examples we will explain how to deal with categorical data in computing machine learning algorithms and how we to map ordinal and nominal feature values to integer representations. The article is an excerpt from the book Python Machine Learning - Third Edition by Sebastian Raschka and Vahid Mirjalili. This book is a comprehensive guide to machine learning and deep learning with Python. It acts as both a clear step-by-step tutorial, and a reference you’ll keep coming back to as you build your machine learning systems.  It is not uncommon that real-world datasets contain one or more categorical feature columns. When we are talking about categorical data, we have to further distinguish between nominal and ordinal features. Ordinal features can be understood as categorical values that can be sorted or ordered. For example, t-shirt size would be an ordinal feature, because we can define an order XL > L > M. In contrast, nominal features don't imply any order and, to continue with the previous example, we could think of t-shirt color as a nominal feature since it typically doesn't make sense to say that, for example, red is larger than blue. Categorical data encoding with pandas Before we explore different techniques to handle such categorical data, let's create a new DataFrame to illustrate the problem: >>> import pandas as pd >>> df = pd.DataFrame([ ...            ['green', 'M', 10.1, 'class1'], ...            ['red', 'L', 13.5, 'class2'], ...            ['blue', 'XL', 15.3, 'class1']]) >>> df.columns = ['color', 'size', 'price', 'classlabel'] >>> df color  size price  classlabel 0   green     M 10.1     class1 1     red   L 13.5      class2 2    blue   XL 15.3      class1 As we can see in the preceding output, the newly created DataFrame contains a nominal feature (color), an ordinal feature (size), and a numerical feature (price) column. The class labels (assuming that we created a dataset for a supervised learning task) are stored in the last column. Mapping ordinal features To make sure that the learning algorithm interprets the ordinal features correctly, we need to convert the categorical string values into integers. Unfortunately, there is no convenient function that can automatically derive the correct order of the labels of our size feature, so we have to define the mapping manually. In the following simple example, let's assume that we know the numerical difference between features, for example, XL = L + 1 = M + 2: >>> size_mapping = { ...                 'XL': 3, ...                 'L': 2, ...                 'M': 1} >>> df['size'] = df['size'].map(size_mapping) >>> df color  size price  classlabel 0   green     1 10.1     class1 1     red   2 13.5      class2 2    blue     3 15.3     class1 If we want to transform the integer values back to the original string representation at a later stage, we can simply define a reverse-mapping dictionary inv_size_mapping = {v: k for k, v in size_mapping.items()} that can then be used via the pandas map method on the transformed feature column, similar to the size_mapping dictionary that we used previously. 
We can use it as follows: >>> inv_size_mapping = {v: k for k, v in size_mapping.items()} >>> df['size'].map(inv_size_mapping) 0   M 1   L 2   XL Name: size, dtype: object Encoding class labels Many machine learning libraries require that class labels are encoded as integer values. Although most estimators for classification in scikit-learn convert class labels to integers internally, it is considered good practice to provide class labels as integer arrays to avoid technical glitches. To encode the class labels, we can use an approach similar to the mapping of ordinal features discussed previously. We need to remember that class labels are not ordinal, and it doesn't matter which integer number we assign to a particular string label. Thus, we can simply enumerate the class labels, starting at 0: >>> import numpy as np >>> class_mapping = {label:idx for idx,label in ...                  enumerate(np.unique(df['classlabel']))} >>> class_mapping {'class1': 0, 'class2': 1} Next, we can use the mapping dictionary to transform the class labels into integers: >>> df['classlabel'] = df['classlabel'].map(class_mapping) >>> df     color  size price  classlabel 0   green     1 10.1         0 1     red   2 13.5           1 2    blue     3 15.3           0 We can reverse the key-value pairs in the mapping dictionary as follows to map the converted class labels back to the original string representation: >>> inv_class_mapping = {v: k for k, v in class_mapping.items()} >>> df['classlabel'] = df['classlabel'].map(inv_class_mapping) >>> df     color  size price  classlabel 0   green     1 10.1     class1 1     red   2 13.5      class2 2    blue     3 15.3     class1 Alternatively, there is a convenient LabelEncoder class directly implemented in scikit-learn to achieve this: >>> from sklearn.preprocessing import LabelEncoder >>> class_le = LabelEncoder() >>> y = class_le.fit_transform(df['classlabel'].values) >>> y array([0, 1, 0]) Note that the fit_transform method is just a shortcut for calling fit and transform separately, and we can use the inverse_transform method to transform the integer class labels back into their original string representation: >>> class_le.inverse_transform(y) array(['class1', 'class2', 'class1'], dtype=object) Performing a technique ‘one-hot encoding’ on nominal features In the Mapping ordinal features section, we used a simple dictionary-mapping approach to convert the ordinal size feature into integers. Since scikit-learn's estimators for classification treat class labels as categorical data that does not imply any order (nominal), we used the convenient LabelEncoder to encode the string labels into integers. It may appear that we could use a similar approach to transform the nominal color column of our dataset, as follows: >>> X = df[['color', 'size', 'price']].values >>> color_le = LabelEncoder() >>> X[:, 0] = color_le.fit_transform(X[:, 0]) >>> X array([[1, 1, 10.1],        [2, 2, 13.5],        [0, 3, 15.3]], dtype=object) After executing the preceding code, the first column of the NumPy array X now holds the new color values, which are encoded as follows: blue = 0 green = 1 red = 2 If we stop at this point and feed the array to our classifier, we will make one of the most common mistakes in dealing with categorical data. Can you spot the problem? Although the color values don't come in any particular order, a learning algorithm will now assume that green is larger than blue, and red is larger than green. 
Although this assumption is incorrect, the algorithm could still produce useful results. However, those results would not be optimal. A common workaround for this problem is to use a technique called one-hot encoding. The idea behind this approach is to create a new dummy feature for each unique value in the nominal feature column. Here, we would convert the color feature into three new features: blue, green, and red. Binary values can then be used to indicate the particular color of an example; for example, a blue example can be encoded as blue=1, green=0, red=0. To perform this transformation, we can use the OneHotEncoder that is implemented in scikit-learn's preprocessing module: >>> from sklearn.preprocessing import OneHotEncoder >>> X = df[['color', 'size', 'price']].values >>> color_ohe = OneHotEncoder() >>> color_ohe.fit_transform(X[:, 0].reshape(-1, 1)).toarray()  array([[0., 1., 0.],            [0., 0., 1.],            [1., 0., 0.]]) Note that we applied the OneHotEncoder to a single column (X[:, 0].reshape(-1, 1))) only, to avoid modifying the other two columns in the array as well. If we want to selectively transform columns in a multi-feature array, we can use the ColumnTransformer that accepts a list of (name, transformer, column(s)) tuples as follows: >>> from sklearn.compose import ColumnTransformer >>> X = df[['color', 'size', 'price']].values >>> c_transf = ColumnTransformer([ ...     ('onehot', OneHotEncoder(), [0]), ...     ('nothing', 'passthrough', [1, 2]) ... ]) >>> c_transf.fit_transform(X) .astype(float)     array([[0.0, 1.0, 0.0, 1, 10.1],            [0.0, 0.0, 1.0, 2, 13.5],            [1.0, 0.0, 0.0, 3, 15.3]]) In the preceding code example, we specified that we only want to modify the first column and leave the other two columns untouched via the 'passthrough' argument. An even more convenient way to create those dummy features via one-hot encoding is to use the get_dummies method implemented in pandas. Applied to a DataFrame, the get_dummies method will only convert string columns and leave all other columns unchanged: >>> pd.get_dummies(df[['price', 'color', 'size']])     price  size color_blue  color_green color_red 0    10.1     1   0 1          0 1    13.5     2   0 0          1 2    15.3     3   1 0          0 When we are using one-hot encoding datasets, we have to keep in mind that it introduces multicollinearity, which can be an issue for certain methods (for instance, methods that require matrix inversion). If features are highly correlated, matrices are computationally difficult to invert, which can lead to numerically unstable estimates. To reduce the correlation among variables, we can simply remove one feature column from the one-hot encoded array. Note that we do not lose any important information by removing a feature column, though; for example, if we remove the column color_blue, the feature information is still preserved since if we observe color_green=0 and color_red=0, it implies that the observation must be blue. If we use the get_dummies function, we can drop the first column by passing a True argument to the drop_first parameter, as shown in the following code example: >>> pd.get_dummies(df[['price', 'color', 'size']], ...                
drop_first=True)     price  size color_green  color_red 0    10.1     1     1 0 1    13.5     2     0 1 2    15.3     3     0 0 In order to drop a redundant column via the OneHotEncoder , we need to set drop='first' and set categories='auto' as follows: >>> color_ohe = OneHotEncoder(categories='auto', drop='first') >>> c_transf = ColumnTransformer([  ...            ('onehot', color_ohe, [0]), ...            ('nothing', 'passthrough', [1, 2]) ... ]) >>> c_transf.fit_transform(X).astype(float) array([[  1. , 0. ,  1. , 10.1],        [  0. ,  1. , 2. ,  13.5],        [  0. ,  0. , 3. ,  15.3]]) In this article, we have gone through some of the methods to deal with categorical data in datasets. We distinguished between nominal and ordinal features, and with examples we explained how they can be handled. To harness the power of the latest Python open source libraries in machine learning check out this book Python Machine Learning - Third Edition, written by Sebastian Raschka and Vahid Mirjalili. Other interesting read in data! The best business intelligence tools 2019: when to use them and how much they cost Introducing Microsoft’s AirSim, an open-source simulator for autonomous vehicles built on Unreal Engine Media manipulation by Deepfakes and cheap fakes require both AI and social fixes, finds a Data & Society report

What is LSTM?

Richard Gall
11 Apr 2018
3 min read
What does LSTM stand for? LSTM stands for long short term memory. It is a model or architecture that extends the memory of recurrent neural networks. Typically, recurrent neural networks have 'short term memory' in that they use persistent previous information to be used in the current neural network. Essentially, the previous information is used in the present task. That means we do not have a list of all of the previous information available for the neural node. Find out how LSTM works alongside recurrent neural networks. Watch this short video tutorial. How does LSTM work? LSTM introduces long-term memory into recurrent neural networks. It mitigates the vanishing gradient problem, which is where the neural network stops learning because the updates to the various weights within a given neural network become smaller and smaller. It does this by using a series of 'gates'. These are contained in memory blocks which are connected through layers, like this: There are three types of gates within a unit: Input Gate: Scales input to cell (write) Output Gate: Scales output to cell (read) Forget Gate: Scales old cell value (reset) Each gate is like a switch that controls the read/write, thus incorporating the long-term memory function into the model. Applications of LSTM There are a huge range of ways that LSTM can be used, including: Handwriting recognition Time series anomaly detection Speech recognition Learning grammar Composing music The difference between LSTM and GRU There are many similarities between LSTM and GRU (Gated Recurrent Units). However, there are some important differences that are worth remembering: A GRU has two gates, whereas an LSTM has three gates. GRUs don't possess any internal memory that is different from the exposed hidden state. They don't have the output gate, which is present in LSTMs. There is no second nonlinearity applied when computing the output in GRU. GRU as a concept, is a little newer than LSTM. It is generally more efficient - it trains models at a quicker rate than LSTM. It is also easier to use. Any modifications you need to make to a model can be done fairly easily. However, LSTM should perform better than GRU where longer term memory is required. Ultimately, comparing performance is going to depend on the data set you are using. 4 ways to enable Continual learning into Neural Networks Build a generative chatbot using recurrent neural networks (LSTM RNNs) How Deep Neural Networks can improve Speech Recognition and generation
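The article describes LSTM and GRU gates conceptually; as a minimal hands-on illustration, here is a short PyTorch sketch (PyTorch is just one reasonable framework choice, not something the article prescribes) showing how the two layer types are constructed and what they return. The sizes are arbitrary.

# A minimal sketch comparing LSTM and GRU layers in PyTorch; sizes are arbitrary.
import torch
import torch.nn as nn

batch, seq_len, n_features, hidden = 4, 10, 8, 16
x = torch.randn(batch, seq_len, n_features)

# LSTM: three gates internally (input, forget, output) plus a cell state c.
lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
out_lstm, (h_n, c_n) = lstm(x)
print(out_lstm.shape, h_n.shape, c_n.shape)   # outputs, hidden state, cell state

# GRU: two gates (reset, update) and no separate internal cell state.
gru = nn.GRU(input_size=n_features, hidden_size=hidden, batch_first=True)
out_gru, h_n_gru = gru(x)
print(out_gru.shape, h_n_gru.shape)           # outputs, hidden state only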

Brett Lantz on implementing a decision tree using C5.0 algorithm in R

Packt Editorial Staff
29 Mar 2019
9 min read
Decision tree learners are powerful classifiers that utilize a tree structure to model the relationships among the features and the potential outcomes. This structure earned its name due to the fact that it mirrors the way a literal tree begins at a wide trunk and splits into narrower and narrower branches as it is followed upward. In much the same way, a decision tree classifier uses a structure of branching decisions that channel examples into a final predicted class value. In this article, we demonstrate the implementation of a decision tree using the C5.0 algorithm in R.

This article is taken from the book Machine Learning with R, Fourth Edition, written by Brett Lantz. This 10th Anniversary Edition of the classic R data science book is updated to R 4.0.0 with newer and better libraries. It features several new chapters that reflect the progress of machine learning in the last few years and help you build your data science skills and tackle more challenging problems.

There are numerous implementations of decision trees, but the most well-known is the C5.0 algorithm. This algorithm was developed by computer scientist J. Ross Quinlan as an improved version of his prior algorithm, C4.5 (C4.5 itself is an improvement over his Iterative Dichotomiser 3 (ID3) algorithm). Although Quinlan markets C5.0 to commercial clients (see http://www.rulequest.com/ for details), the source code for a single-threaded version of the algorithm was made public, and it has therefore been incorporated into programs such as R.

The C5.0 decision tree algorithm

The C5.0 algorithm has become the industry standard for producing decision trees because it does well for most types of problems directly out of the box. Compared to other advanced machine learning models, the decision trees built by C5.0 generally perform nearly as well but are much easier to understand and deploy. Additionally, as shown below, the algorithm's weaknesses are relatively minor and can be largely avoided.

Strengths:

An all-purpose classifier that does well on many types of problems.
Highly automatic learning process, which can handle numeric or nominal features, as well as missing data.
Excludes unimportant features.
Can be used on both small and large datasets.
Results in a model that can be interpreted without a mathematical background (for relatively small trees).
More efficient than other complex models.

Weaknesses:

Decision tree models are often biased toward splits on features having a large number of levels.
It is easy to overfit or underfit the model.
Can have trouble modeling some relationships due to reliance on axis-parallel splits.
Small changes in training data can result in large changes to decision logic.
Large trees can be difficult to interpret and the decisions they make may seem counterintuitive.

To keep things simple, our earlier decision tree example ignored the mathematics involved with how a machine would employ a divide and conquer strategy. Let's explore this in more detail to examine how this heuristic works in practice.

Choosing the best split

The first challenge that a decision tree will face is to identify which feature to split upon. In the previous example, we looked for a way to split the data such that the resulting partitions contained examples primarily of a single class. The degree to which a subset of examples contains only a single class is known as purity, and any subset composed of only a single class is called pure.
There are various measurements of purity that can be used to identify the best decision tree splitting candidate. C5.0 uses entropy, a concept borrowed from information theory that quantifies the randomness, or disorder, within a set of class values. Sets with high entropy are very diverse and provide little information about other items that may also belong in the set, as there is no apparent commonality. The decision tree hopes to find splits that reduce entropy, ultimately increasing homogeneity within the groups.

Typically, entropy is measured in bits. If there are only two possible classes, entropy values can range from 0 to 1. For n classes, entropy ranges from 0 to log2(n). In each case, the minimum value indicates that the sample is completely homogenous, while the maximum value indicates that the data are as diverse as possible, and no group has even a small plurality. In mathematical notation, entropy is specified as:

Entropy(S) = - Σ (i = 1 to c) pi log2(pi)

In this formula, for a given segment of data (S), the term c refers to the number of class levels, and pi refers to the proportion of values falling into class level i. For example, suppose we have a partition of data with two classes: red (60 percent) and white (40 percent). We can calculate the entropy as:

> -0.60 * log2(0.60) - 0.40 * log2(0.40)
[1] 0.9709506

We can visualize the entropy for all possible two-class arrangements. If we know the proportion of examples in one class is x, then the proportion in the other class is (1 - x). Using the curve() function, we can then plot the entropy for all possible values of x:

> curve(-x * log2(x) - (1 - x) * log2(1 - x),
        col = "red", xlab = "x", ylab = "Entropy", lwd = 4)

This results in the following figure:

Figure: The total entropy as the proportion of one class varies in a two-class outcome

As illustrated by the peak in entropy at x = 0.50, a 50-50 split results in the maximum entropy. As one class increasingly dominates the other, the entropy reduces to zero.

To use entropy to determine the optimal feature to split upon, the algorithm calculates the change in homogeneity that would result from a split on each possible feature, a measure known as information gain. The information gain for a feature F is calculated as the difference between the entropy in the segment before the split (S1) and the entropy in the partitions resulting from the split (S2):

InfoGain(F) = Entropy(S1) - Entropy(S2)

One complication is that after a split, the data is divided into more than one partition. Therefore, the function to calculate Entropy(S2) needs to consider the total entropy across all of the partitions. It does this by weighting each partition's entropy according to the proportion of all records falling into that partition. This can be stated in a formula as:

Entropy(S) = Σ (i = 1 to n) wi * Entropy(Pi)

In simple terms, the total entropy resulting from a split is the sum of the entropy of each of the n partitions, weighted by the proportion of examples falling into that partition (wi). The higher the information gain, the better a feature is at creating homogeneous groups after a split on that feature. If the information gain is zero, there is no reduction in entropy for splitting on this feature. On the other hand, the maximum information gain is equal to the entropy prior to the split, which would imply that the entropy after the split is zero and that the split results in completely homogeneous groups.

The previous formulas assume nominal features, but decision trees use information gain for splitting on numeric features as well.
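To make the entropy and information gain calculations concrete, here is a small R sketch. It is not from the book; the helper function and the partition proportions of the candidate split are made up purely for illustration.

# Entropy from a vector of class proportions (hypothetical helper, not from
# the book); zero proportions are dropped, since 0 * log2(0) is taken as 0
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Entropy of the parent segment: 60 percent red, 40 percent white
entropy(c(0.60, 0.40))          # 0.9709506

# An illustrative candidate split: half of the records land in a partition
# that is 80/20, the other half in a partition that is 40/60
e_before <- entropy(c(0.60, 0.40))
e_after  <- 0.5 * entropy(c(0.80, 0.20)) + 0.5 * entropy(c(0.40, 0.60))

# Information gain for this hypothetical split (roughly 0.125 bits)
e_before - e_after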
To do so, a common practice is to test various splits that divide the values into groups greater than or less than a threshold. This reduces the numeric feature into a two-level categorical feature that allows information gain to be calculated as usual. The numeric cut point yielding the largest information gain is chosen for the split.

Note: Though it is used by C5.0, information gain is not the only splitting criterion that can be used to build decision trees. Other commonly used criteria are the Gini index, the chi-squared statistic, and the gain ratio. For a review of these (and many more) criteria, refer to An Empirical Comparison of Selection Measures for Decision-Tree Induction, Mingers, J, Machine Learning, 1989, Vol. 3, pp. 319-342.

Pruning the decision tree

As mentioned earlier, a decision tree can continue to grow indefinitely, choosing splitting features and dividing into smaller and smaller partitions until each example is perfectly classified or the algorithm runs out of features to split on. However, if the tree grows overly large, many of the decisions it makes will be overly specific and the model will be overfitted to the training data. The process of pruning a decision tree involves reducing its size such that it generalizes better to unseen data.

One solution to this problem is to stop the tree from growing once it reaches a certain number of decisions or when the decision nodes contain only a small number of examples. This is called early stopping, or prepruning, the decision tree. As the tree avoids doing needless work, this is an appealing strategy. However, one downside to this approach is that there is no way to know whether the tree will miss subtle but important patterns that it would have learned had it grown to a larger size.

An alternative, called post-pruning, involves growing a tree that is intentionally too large and pruning leaf nodes to reduce the size of the tree to a more appropriate level. This is often a more effective approach than prepruning because it is quite difficult to determine the optimal depth of a decision tree without growing it first. Pruning the tree later on allows the algorithm to be certain that all of the important data structures were discovered.

Note: The implementation details of pruning operations are very technical and beyond the scope of this book. For a comparison of some of the available methods, see A Comparative Analysis of Methods for Pruning Decision Trees, Esposito, F, Malerba, D, Semeraro, G, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, Vol. 19, pp. 476-491.

One of the benefits of the C5.0 algorithm is that it is opinionated about pruning: it takes care of many of the decisions automatically using fairly reasonable defaults. Its overall strategy is to post-prune the tree. It first grows a large tree that overfits the training data. Later, the nodes and branches that have little effect on the classification errors are removed. In some cases, entire branches are moved further up the tree or replaced by simpler decisions. These processes of grafting branches are known as subtree raising and subtree replacement, respectively.

Getting the right balance of overfitting and underfitting is a bit of an art, but if model accuracy is vital, it may be worth investing some time with various pruning options to see if it improves the test dataset performance. To summarize, decision trees are widely used due to their high accuracy and ability to formulate a statistical model in plain language.
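As a rough illustration of how those pruning options are exposed in practice, the sketch below uses the C50 package's C5.0() function together with its C5.0Control() settings. This is not code from the book: the credit_train data frame and its default column are placeholders, and the particular CF and minCases values are arbitrary choices shown only to suggest the kind of tuning the text describes.

# Assumes the C50 package is installed and that 'credit_train' is a
# hypothetical data frame whose 'default' column holds the class labels
library(C50)

# Default behaviour: C5.0 grows a large tree and then post-prunes it
model <- C5.0(x = credit_train[, names(credit_train) != "default"],
              y = credit_train$default)

# More aggressive pruning: a lower confidence factor (CF) and a higher
# minCases both tend to produce a smaller, simpler tree
model_small <- C5.0(x = credit_train[, names(credit_train) != "default"],
                    y = credit_train$default,
                    control = C5.0Control(CF = 0.10, minCases = 10))

summary(model_small)   # prints the pruned tree and its training error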
Here, we looked at the highly popular and easily configurable C5.0 decision tree algorithm. The major strength of the C5.0 algorithm over other decision tree implementations is that it is very easy to adjust the training options.

Harness the power of R to build flexible, effective, and transparent machine learning models with Brett Lantz's latest book, Machine Learning with R, Fourth Edition.

Dr.Brandon explains Decision Trees to Jon
Building a classification system with Decision Trees in Apache Spark 2.0
Implementing Decision Trees