Data | 0 articles | Tech News, Tutorials & Expert Insights

article-image-understanding-the-fundamentals-of-analytics-teams-with-john-k-thompson

06 Apr 2021

6 min read

Understanding the Fundamentals of Analytics Teams with John K. Thompson

06 Apr 2021

Key-takeaways: Data scientists need a tailored portfolio of projects that they own and manage to have a sense of autonomy. The top skill or personality trait a successful data scientist can possess (and should possess) is curiosity. Managing a successful analytics team and individual analytics professionals is different than managing any other type of team. Data and analytics will be ubiquitous in the very near future. Analytics teams are different than any other team in the organization and analytics professionals are unique variant of creative professionals. Providing challenging, interesting and valuable work in the form of a personal project portfolio of work for a data scientist can be done and needs to be done to ensure productivity, job satisfaction, value delivery, and retention. We interviewed Analytics Leader, and bestselling author, John K Thompson on data analytics, the future of analytics and his recent book, Building Analytics Teams. The interview in detail: 1. What are the fundamental concepts of building and managing a high-performing analytics team? It is critically important to remember that data scientists are creative and intelligent people. They cannot be managed well in a command-and-control environment. Data scientists need a tailored portfolio of projects that they own and manage to have a sense of autonomy. If they have a portfolio of projects and can manage their time and effort, the productivity of the team will be much higher than what is typically seen in teams managed in a traditional manner. The relationship of the analytics leader with their peers and executives of the company is critically important to the success of the analytics team. It is very important to realize that most analytics project fail at the point of where analytical models are to be implemented in production systems. 2. Tell us about your book, Building Analytics Teams. How is your book new and/or different from other books on Data Analytics?  Building Analytics Teams is focused on the practical challenges faced by people who are building and managing high performance analytics teams and the staff members who make up those analytics teams. The book is different from other books in that it examines the process of building and managing a team from a holistic view. The book considers the organization framework, the required processes, the people, the projects, the problems, and pitfalls. The content of the book guides the reader through how to navigate these challenges and provides illustrations and examples of how to be successful. The book is a “how to” guide on how to successfully manage the analytics process in a large corporate environment. 3. What was the motivation behind writing this book? I have not seen a book like this, and I wish I had a book like this earlier in my career. I have built a number of analytics teams. While building and growing those teams, I noticed certain recurring patterns. I wanted to address the misconceptions and the misperceptions people hold about analytics teams. Analytics teams are unique. The team members who are successful have a different mindset and attitude toward project work and team work. I wanted to communicate the differences inherent in a high-performance analytics team when compared to other teams. Also, I wanted to communicate that managing a successful analytics team and individual analytics professionals is different than managing any other type of team. I wanted to write a guide for managers and analytics professionals to help them understand how the broader organization views them and how they can interface and interact with their peers in related organizational functions to increase the probability of joint success. 4. What should be the starting point for data analytics enthusiasts aiming to begin their journey in Data Analytics? How do you think your book will help them in their journey? It depends on where they are starting their journey. If they are in the process of completing their undergraduate or graduate studies, I would suggest that they take classes in programming, data science or analytics. If they are professionals, I would suggest that they take classes on Coursera, Udemy or any other on-line educational platform to see if they have a real interest in, and affinity for, analytics. If they do have an interest, then they should start working on analytics for themselves to test out analytical techniques, apply critical thinking and try to understand what they can see or cannot see in the data. If that works out and their interest remains, they should volunteer for projects at work that will enable them to work with data and analytics in a work setting. If they have the education, the affinity and the skill, then apply for a data science position. Grab some data and make a difference! 5. What are the key skills required for someone to be successful working in Data Analytics? What are the pain points/challenges one should know? The top skill or personality trait a successful data scientist can possess (and should possess) is curiosity. Without curiosity, you will find it difficult to be successful as a data scientist. It helps to be talented and well educated, but I have met many stellar data scientists that are neither. Beyond those traits, it is more important to be diligent and persistent. The most successful business analysts and data scientists I have ever worked with were all naturally and perpetually curious and had a level of diligence and persistence that was impressive. As for pain points and challenges; data scientists need to work on improving their listening skills, their written & verbal communication and presentation skills. All data scientists need improvement in these areas. 6. What is the future of analytics? What will we see next? I do believe that we are entering an era where data and analytics will be increasing in importance in all human endeavors. Certainly, corporate use of data and analytics will increase in importance, hence the focus of the book. But beyond corporations, the active and engaged use of data and analytics will increase in importance and daily use in managing multiple aspects of - people’s personal lives, academic pursuits, governmental policy, military operations, humanitarian aid, tailoring of products and services; building of roads, towns and cities, planning of traffic patterns, provisioning of local federal and state services, intergovernmental relationships and more. There will not be an element of human endeavor that will not be touched and changed by data and analytics. Data is ubiquitous today and data and analytics will be ubiquitous in the very near future. We will see more discussions on who owns data and who should be able to monetize data. We will experience increasing levels of AI and analytics across all systems that we interact with, and most of it will be unnoticed and operate in the background for our benefit. About: John K. Thompson is an international technology executive with over 30 years of experience in the business intelligence and advanced analytics fields. Currently, John is responsible for the global Advanced Analytics and Artificial Intelligence team and efforts at CSL Behring.

0
0
23686

article-image-understand-quickbooks-online-desktop-online-security-use-cases-and-more-with-crystalynn-shelton-a-certified-quickbooks-proadvisor

Vincy Davis

27 Dec 2019

8 min read

Understand Quickbooks online/desktop, online security, use cases, and more with Crystalynn Shelton, a certified QuickBooks ProAdvisor

Vincy Davis

27 Dec 2019

8 min read

Quickbooks, the accounting software package developed and marketed by Intuit is targeted towards small and medium-sized businesses. It offers on-premises accounting applications and cloud-based versions that can undertake remote access capabilities like remote payroll assistance and outsourcing, electronic payment functions, online banking, and reconciliation, mapping features, and more. To know more about Quickbooks’ latest features and its learning curve for beginners, we did a quick interview with Crystalynn Shelton, a certified QuickBooks ProAdvisor and author of the book ‘Mastering QuickBooks 2020’. With more than 10 years of experience in Quickbooks, Shelton says Quickbooks is not only user-friendly but also cost-effective. Further, when asked about her views on QuickBooks online, Shelton points out that its live unlimited technical support is one of its main features On Quickbooks, its benefits and use cases What are some of the advantages of Quickbooks that sets it apart from its competitors? QuickBooks has a number of advantages that set them apart from its competitors. First, it is affordable for most small businesses. Whether you purchase an Online subscription (starting at $20/month) or a desktop product (starting at a one-time fee of $199), there is something for every budget. Another benefit of using QuickBooks is the program is very user-friendly. Most small business owners purchase the software and are able to set it up without having an IT person on staff. In addition, there are a number of training videos, an extensive help menu within the program not to mention live tech support if you need it. Because QuickBooks is the most widely used accounting software program used by small businesses, most accountants and CPAs are familiar with the program. Some of these folks are certified ProAdvisors (like myself). They can offer consulting, training, and even bookkeeping services to small business owners who use QuickBooks. Can you elaborate on how small businesses can take benefit from Quickbooks? Also, how does Quickbooks simplifies tasks for them? While there are numerous reasons why small businesses decide to use QuickBooks, there are five that tend to be the most common reasons: small businesses who can’t afford to hire a bookkeeper, small businesses who have outgrown the use of Excel spreadsheets and need a more sophisticated way to track income and expenses, small businesses who need financial statements in order to apply for a line of credit or business loan, small businesses whose tax professional will no longer accept a shoebox of receipts to file taxes. QuickBooks simplifies bookkeeping by allowing you to track all aspects of the business in one place: accounts payable, accounts receivable, income, and expenses. It uses simple language such as “people who owe you” (aka accounts receivable) or “what you owe to others” (aka accounts payable) to help business owners without prior bookkeeping knowledge comprehend the program. QuickBooks allows you to accept credit card payments from customers so you can get paid faster and easily reconcile payments to open invoices. Not to mention you can reduce (if not eliminate) manual data entry by connecting all of your business bank and credit card accounts to QBO. Can you elaborate on how your book ‘Mastering QuickBooks 2020’ will prepare bookkeepers and accounting students in learning the ropes of QuickBooks? Also, how does the learning curve look like for users who have no bookkeeping knowledge and no experience with QuickBooks? This book was written with the assumption that the reader has no experience or knowledge of bookkeeping. We use simple language to explain how QuickBooks works and we have also provided screenshots to support the concepts being taught. Chapter 1 includes a section that covers bookkeeping basics which will help non-accountants gain a better understanding of the terminology used in the field of accounting as well as QuickBooks. This information will help aspiring accountants build on their existing bookkeeping knowledge. In addition, we have included the behind the scenes debits and credits for certain transactions to help accounting students prepare for the CPA exam or other academic tests. Shelton’s views on QuickBooks Online and Desktop What are your thoughts on QuickBooks Online and Quickbooks Desktop? What are the benefits of cloud accounting over Desktop? Do factors such as the size of an organization, or its maturity matter in choosing between the online and the desktop version? There are several benefits of using cloud accounting software over the desktop. Cloud accounting software allows you to manage your business from any device with an internet connection; whereas desktop limits you to a desktop computer. With cloud accounting software like QuickBooks Online, you can give anyone access to your QuickBooks data without them having to travel to your office. Cloud accounting software includes automatic real-time updates of your data. Unlike desktop software, you don’t have to worry about backing up your data with Online; its automatically done for you. Finally, QuickBooks Online includes unlimited live technical support. This is an invaluable feature for small business owners who are managing their own books and need the ability to get help when they need it. The size of an organization, structure, and length of time in business can definitely impact whether a business should choose QuickBooks Online or desktop. As a QuickBooks ProAdvisor, one of the first things I do is conduct an assessment to determine what the needs of my clients are. This involves documenting the details of their current processes (i.e. invoicing customers, paying bills, managing inventory, etc.) Once I have this information, I am able to determine whether QuickBooks desktop is right or if QuickBooks Online is the best fit. If both products are ideal, I provide my clients with the downsides (if any) of going with one product over the other. This gives my clients all of the information they need to make an informed decision. On how Quickbook secures online data How does Quickbooks help in securing payments? How does QuickBooks keep online data safe? To secure payments, intuit transmits, support, protect, and access all cardholder information in compliance with the Payment Card Industry’s (PCI) data security standards. Additional security precautions Intuit has implemented are as follows: All data between Intuit servers and their customers is encrypted with at least 128-bit TLS, and all copies of daily backup data are encrypted with 256-bit AES encryption. Data is kept secure with multiple servers housed in Tier-3 data centers that have strict access controls and real-time video monitoring of the data center. All servers are hardened Linux installations, which are monitored in real-time and kept up-to-date with security patches. Can you suggest some best practices (at least five) that will help Quickbook aspirants in saving time and becoming a Quickbook pro? There are several ways you can save time and become proficient in QuickBooks Online. First, I recommend that you use QuickBooks on a daily basis. The more hands-on experience with QuickBooks, the more proficient you will become. Second, take the time to properly set up your QuickBooks account before you start entering transactions. In Chapter 2, we provide you with a detailed checklist which includes what information you need to setup QuickBooks. By taking the time to set up customers, vendors, the chart of accounts, and your products and services upfront, the less time you will spend having to do it later on when you are trying to enter data. Third, all aspiring bookkeepers and accountants should get certified in QuickBooks Online. Certification is offered through Intuit and it is free. As a Certified QuickBooks ProAdvisor, you get access to product discounts, marketing materials to promote bookkeeping services to prospective clients, a certification badge and designation you can put on business cards, websites and email signature lines. Fourth, utilize keyboard shortcuts. They will save you time as you navigate the program. We have included a list of QBO keyboard shortcuts in the appendix of this book. Finally, connect as many bank and credit card accounts as you can to QBO. By doing so, you will reduce the amount of manual data entry required which will help you to keep your books up-to-date. If you want to learn how to build the perfect budget, simplify tax return preparation, manage inventory, track job costs, generate income statements and financial reports, check out Crystalynn’s book ‘Mastering QuickBooks 2020’. This book will work for a small business owner, bookkeeper, or accounting student who wants to learn how to make the most of QuickBooks Online. About the author Crystalynn Shelton is a licensed Certified Public Accountant, a certified QuickBooks ProAdvisor and has been certified in QuickBooks for more than 10 years. Crystalynn is currently a staff writer for Fit Small Business and an Adjunct Instructor at UCLA Extension where she teaches accounting, bookkeeping and QuickBooks to hundreds of small business owners and accounting students each year. Her previous experience includes working at Intuit (QuickBooks) as a Sr. Learning Specialist. MongoDB’s CTO Eliot Horowitz on what’s new in MongoDB 4.2, Ops Manager, Atlas, and more New QGIS 3D capabilities and future plans presented by Martin Dobias, a core QGIS developer Greg Walters on PyTorch and real-world implementations and future potential of GANs Elastic marks its entry in security analytics market with Elastic SIEM and Endgame acquisition “The challenge in Deep Learning is to sustain the current pace of innovation”, explains Ivan Vasilev, machine learning engineer

0
0
22810

article-image-greg-walters-on-pytorch-and-real-world-implementations-and-future-potential-of-gans

Vincy Davis

13 Dec 2019

10 min read

Greg Walters on PyTorch and real-world implementations and future potential of GANs

Vincy Davis

13 Dec 2019

10 min read

Introduced in 2014, GANs (Generative Adversarial Networks) was first presented by Ian Goodfellow and other researchers at the University of Montreal. It comprises of two deep networks, the generator which generates data instances, and the discriminator which evaluates the data for authenticity. GANs works not only as a form of generative model for unsupervised learning, but also has proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning. In this article, we are in conversation with Greg Walters, one of the authors of the book 'Hands-On Generative Adversarial Networks with PyTorch 1.x', where we discuss some of the real-world applications of GANs. According to Greg, facial recognition and age progression will one of the areas where GANs will shine in the future. He believes that with time GANs will soon be visible in more real-world applications, as with GANs the possibilities are unlimited. On why PyTorch for building GANs Why choose PyTorch for GANs? Is PyTorch better than other popular frameworks like Tensorflow? Both PyTorch and Tensorflow are good products. Tensorflow is based on code from Google and PyTorch is based on code from Facebook. I think that PyTorch is more pythonic and (in my opinion) is easier to learn. Tensorflow is two years older than PyTorch, which gives it a bit of an edge, and does have a few advantages over PyTorch like visualization and deploying trained models to the web. However, one of the biggest advantages that PyTorch has is the ability to handle distributed training. It’s much easier when using PyTorch. I’m sure that both groups are looking at trying to lessen the gaps that exist and that we will see big changes in both. Refer to Chapter 4 of my book to learn how to use PyTorch to train a GAN model. Have you had a chance to explore the recently released PyTorch 1.3 version? What are your thoughts on the experimental feature - named tensors? How do you think it will help developers in getting a more readable and maintainable code? What are your thoughts on other features like PyTorch Mobile and 8-bit model quantization for mobile-optimized AI? The book was originally written to introduce PyTorch 1.0 but quickly evolved to work with PyTorch 1.3.x. Things are moving very quickly for PyTorch, so it presents an evermoving target. Named tensors are very exciting to me. I haven’t had a chance to spend a tremendous amount of time on them yet, but I plan to continue working with them and explore them deeply. I believe that they will help make some of the concepts of manipulating tensors much easier for beginners to understand and read and understand the code created by others. This will help create more novel and useful GANs for the future. The same can be said for PyTorch Mobile. Expanding capabilities to more (and less expensive) processor types like ARM creates more opportunities for programmers and companies that don’t have the high-end capabilities. Consider the possibilities of running a heavy-duty AI on a $35 Raspberry Pi. The possibilities are endless. With PyTorch Mobile, both Android and iOS devices can benefit from the new advances in image recognition and other AI programs. The 8-bit model quantization allows tensor operations to be done using integers rather than floating-point values, allowing models to be more compact. I can’t begin to speculate on what this will bring us in the way of applications in the future. You can read Chapter 2 of my book to know more about the new features in PyTorch 1.3. On challenges and real-world applications of GANs GANs have found some very interesting implementations in the past year like a deepfake that can animate your face with just your voice, a neural GAN to fight fake news, a CycleGAN to visualize the effects of climate change, and more. Most of the GAN implementations are built for experimentation or research purposes. Do you think GANs can soon translate to solve real-world problems? What do you think are the current challenge that restrict GANs from being implemented in real-world scenarios? Yes. I do believe that we will see GANs starting to move to more real-world applications. Remember that in the grand scheme of things, GANs are still fairly new. 2014 wasn’t that long ago. We will see things start to pop in 2020 and move forward from there. As to the current challenges, I think that it’s simply a matter of getting the word out. Many people who are conversant with Machine Learning still haven’t heard of GANs, mainly due to the fact that they are so busy with what they know and are comfortable with, so they haven’t had the time and/or energy to explore GANs yet. That will change. Of course, things change on almost a daily basis, so who can guess where we will be in another two years? Some of the existing and future applications that GANs can help implement include new photo-realistic scenes for video games, movies, and television, taking sketches from designers and making realistic photographs in both the fashion industry and architecture, taking a partial facial image and making a rotated view for better facial recognition, age progression and regression and so much more. Pretty much anything with a pattern, be it image or text can be manipulated using GANs. There are a variety of GANs available out there. How should one approach them in terms of problem solving? What are the other possible ways to group GANs? That’s a very hard question to answer. You are correct, there are a large number of GANs in “the wild” and some work better for some things than others. That was one of the big challenges of writing the book. Add to that, new GANs are coming out all the time that continue to get better and better and extend the possibility matrix. The best suggestion that I could make here is to use the resources of the Internet and read, read and read. Try one or two to see what works best for your application. Also, create your own category list that you create based on your research. Continue to refine the categories as you go. Then share your findings so others can benefit from what you’ve learned. New GANs implementations and future potential In your book, 'Hands-On Generative Adversarial Networks with PyTorch 1.x', you have demonstrated how GANs can be used in image restoration problems, such as super-resolution image reconstruction and image inpainting. How do SRGAN help in improving the resolution of images and performing image inpainting? What other deep learning models can be used to address image restoration problems? What are other keep image related problems where GANs are useful and relevant? Well, that is sort of like asking “how long is a piece of string”. Picture a painting in a museum that has been damaged from fire or over time. Right now, we have to rely on very highly trained experts who spend hundreds of hours to bring the painting back to its original glory. However, it’s still an approximation of what the expert THINKS the original was to be. With things like SRGAN, we can see old photos “restored” to what they were originally. We already can see colorized versions of some black and white classic films and television shows. The possibilities are endless. Image restoration is not limited to GANs, but at the moment seems to be one of the most widely used methods. Fairly new methods like ARGAN (Artifact Reduction GAN) and FD-GAN (Face De-Morphing GAN or Feature Distilling GAN) are showing a lot of promise. By the time I’m finished with this interview, there could be three or more others that will surpass these. ARGAN is similar and can work with SRGAN to aid in image reconstruction. FD-GAN can be used to work with human position images, creating different poses from a totally different pose. This has any number of possibilities from simple fashion shots too, again, photo-realistic images for games, movies and television shows. Find more about image restoration from Chapter 7 of my book. GANs are labeled as innovative due to its ability to generate fake data that looks real. The latest developments in GANs allows it to generate high-dimensional fake data or image video that can easily go undetected. What is your take on the ethical issues surrounding GANs? Don’t you think developers should target creating GANs that will be good for humanity rather than developing scary AI capabilities? Good question. However, the same question has been asked about almost every advance in technology since rainbows were in black and white. Take, for example, the discussion in Chapter 6 where we use CycleGAN to create van Gogh like images. As I was running the code we present, I was constantly amazed by how well the Generator kept coming up with better fakes that looked more and more like they were done by the Master. Yes, there is always the potential for using the technology for “wrong” purposes. That has always been the case. We already have AI that can create images that can fool talent scouts and fake news stories. J. Hector Fezandie said back in 1894, "with great power comes great responsibility" and was repeated by Peter Parker’s Uncle Ben thanks to Stan Lee. It was very true then and is still just as true. How do you think GANs will be contributing to AI innovations in the future? Are you expecting/excited to see an implementation of GANs in a particular area/domain in the coming years? 5 years ago, GANs were pretty much unknown and were only in the very early stages of reality. At that point, no one knew the multitude of directions that GANs would head towards. I can’t begin to imagine where GANs will take us in the next two years, much let the far future. I can’t imagine any area that wouldn’t benefit from the use of GANs. One of the subjects we wanted to cover was facial recognition and age progression, but we couldn’t get permission to use the dataset. It’s a shame, but that will be one of the areas that GANs will shine in for the future. Things like biomedical research could be one area that might really be helped by GANs. I hate to keep using this phrase, but the possibilities are unlimited. If you want to learn how to build, train, and optimize next-generation GAN models and use them to solve a variety of real-world problems, read Greg’s book ‘Hands-On Generative Adversarial Networks with PyTorch 1.x’. This book highlights all the key improvements in GANs over generative models and will help guide you to make the GANs with the help of hands-on examples. What are generative adversarial networks (GANs) and how do they work? [Video] Generative Adversarial Networks: Generate images using Keras GAN [Tutorial] What you need to know about Generative Adversarial Networks ICLR 2019 Highlights: Algorithmic fairness, AI for social good, climate change, protein structures, GAN magic, adversarial ML and much more Interpretation of Functional APIs in Deep Neural Networks by Rowel Atienza

0
0
22788

article-image-industrial-internet-iiot-architects

Aaron Lazar

21 Nov 2017

8 min read

Why the Industrial Internet of Things (IIoT) needs Architects

Aaron Lazar

21 Nov 2017

8 min read

The Industrial Internet, the IIoT, the 4th Industrial Revolution or Industry 4.0, whatever you may call it, has gained a lot of traction in recent times. Many leading companies are driving this revolution, connecting smart edge devices to cloud-based analysis platforms and solving their business challenges in new and smarter ways. To ensure the smooth integration of such machines and devices, effective architectural strategies based on accepted principles, best practices, and lessons learned, must be applied. In this interview, Shyam throws light on his new book, Architecting the Industrial Internet, and shares expert insights into the world of IIoT, Big Data, Artificial Intelligence and more. Shyam Nath Shyam is the director of technology integrations for Industrial IoT at GE Digital. His area of focus is building go-to-market solutions. His technical expertise lies in big data and analytics architecture and solutions with focus on IoT. He joined GE in Sep 2013 prior to which he has worked in IBM, Deloitte, Oracle, and Halliburton. He is the Founder/President of the BIWA Group, a global community of professional in Big Data, analytics, and IoT. He has often been listed as one of the top social media influencers for Industrial IoT. You can follow him on Twitter @ShyamVaran. He talks about the IIoT, the various impacts that technologies like AI and Deep Learning will have on IIoT and he gives a futuristic direction to where IIoT is headed towards. He talks about the challenges that Architects face while architecting IIoT solutions and how his book will help them overcome such issues. Key Takeaways The fourth Industrial Revolution will break silos and bring IT and Ops teams together to function more smoothly. Choosing the right technology to work with involves taking risks and experimenting with custom solutions. The Predix platform and Predix.io allow developers and architects, quickly learn from others and build working prototypes that can be used to get quick feedback from the business users. Interoperability issues and a lack of understanding of all the security ramifications of the hyper-connected world could be a few challenges that adoption of IIoT must overcome Supporting technologies like AI, Deep Learning, AR and VR will have major impacts on the Industrial Internet In-depth Interview On the promise of a future with the Industrial Internet The 4th Industrial Revolution is evolving at a terrific pace. Can you highlight some of the most notable aspects of Industry 4.0? The Industrial Internet is the 4th Industrial Revolution. It will have a profound impact on both the industrial productivity as well as the future of work. Due to more reliable power, cleaner water, and Intelligent Cities, the standard of living will improve, at large for the world citizens. Industrial Internet will forge new collaborations between the IT and OT, in the organizations, and each side will develop a better appreciation of the problems and technologies of the other. They will work together to create smoother overall operations by breaking the silos. On Shyam’s IIoT toolbox that he uses on a day to day basis You have a solid track record of architecting IIoT applications in the Big Data space over the years. What tools do you use on a day-to-day basis? In order to build Industrial Internet applications, GE's Predix is my preferred IIoT platform. It is built for Digital Industrial solutions, with security and compliance baked into it. Customer IIoT solutions can be quickly built on Predix and extended with the services in the marketplace from the ecosystem. For Asset Health Monitoring and for reducing the downtime, Asset Performance Management (APM) can be used to get a jump start and its extensibility framework can be used to extend it. On how to begin one’s journey into building the Industry 4.0 For an IIoT architect, what would your recommended learning plan be? What aspects of architecting Industry 4.0 applications are tricky to master and how does your book Architecting the Industrial Internet, prepare its readers to be industry ready? An IIoT Architect can start with the book Architecting the Industrial Internet, to get a good grasp of the area broadly. This book provides a diverse set of perspectives and architectural principles, from authors who work in GE Digital, Oracle and Microsoft. The end-to-end IIoT applications involve an understanding of sensors, machines, control systems, connectivity and cloud or server systems, along with the understanding of associated enterprise data, the architect needs to focus on a limited solution or proof of concept first. The book provides coverage for the end-to-end requirements of the IIoT solutions for the architects, developers and business managers. The extensive set of use cases and case studies provides examples from many different industry domains to allow the readers to easily related to it. The book is written, in a style that would not overwhelm the reader, yet explain the workings of the architecture and the solutions. The book will be best suited for Enterprise Architects and Data Architects who are trying to understand how IIoT solutions differ from traditional IT solutions. The layer-by-layer description of the IIoT Architecture will provide a systematic approach to help develop a deep understanding, for Architects. IoT Developers who have some understanding of this area can learn the IIoT platform-based approach to building solutions quickly. On how to choose the best technology solution to optimize ROI There are so many IIoT technologies, that manufacturers are confused as to how to choose the best technology to obtain the best ROI. What would your advice to manufacturers be, in this regard? The manufacturers and operation leaders look for quick solutions to known issues, in a proven way. Hence, often they do not have the appetite to experiment with a custom solution, rather they like to know where the solution provider has solved similar problems and what was the outcome. The collection of use cases and case studies will help business leaders get an idea of the potential ROI while evaluating the solution. Getting to know Predix, GE’s IIoT platform, better Let's talk a bit about Predix, GE's IIoT platform. What advantages does Predix offer developers and architects? Do you foresee any major improvements coming to Predix in the near future? The GE's Predix platform has a growing developer community that is approaching 40,000 strong. Likewise, the ecosystem of Partners is approaching 1000. Coupled with the free access to create developer accounts on Predix.io, the developers and architects can quickly learn from others and build working prototypes that can be used to get quick feedback from the business users. The catalog of microservices at Predix.io will continue to expand. Likewise, applications written on top of Predix, such as APM and OPM (Operations Performance Management) will continue to become feature-rich, providing coverage to many common Digital Industrial challenges. On the impact of other emerging technologies like AI on IIoT What according to you will the impact be of AI and Deep Learning, on IIoT? AI and Deep Learning help to build robust Digtal Twins of the industrial assets. These Digital Twins, will make the job of predictive maintenance and optimization, much easier for the operators of these assets. Further, IIoT will benefit from many new advances in technologies like AI, AR/VR and make the job of Field Services Technicians easier. IIoT is already widely used in energy generation and distribution, Intelligent Cities for law enforcement and to ease traffic congestion. The field of healthcare is evolving, due to increasing use of wearables. Finally, precision agriculture is enabled by IoT as well. On likely barriers to IIoT adoption What are the roadblocks you expect in the adoption of IIoT? Today the challenges to rapid adoption of IoT, are interoperability issues and lack of understanding of all the security ramifications of the hyper-connected world. Finally, how to explain the business case of the IoT to the decision makers and different stakeholders is still evolving. On why Architecting the Industrial Internet is a must read for Architects Would you like to give architects 3 reasons on why they should pick up your book? It is written by IIoT practitioners from large companies who are building solutions for both internal and external consumption. The book captures the architectural best practices and advocates a platform based approach, to solutions. The theory is put to practice in the form of use cases and case studies, to provide a comprehensive guide to the architects. If you enjoyed this interview, do check out Shyam’s latest book, Architecting the Industrial Internet.

0
0
22467

article-image-prof-rowel-atienza-discusses-the-intuition-behind-deep-learning-techniques-advances-in-gans

Packt Editorial Staff

30 Sep 2019

6 min read

Prof. Rowel Atienza discusses the intuition behind deep learning, advances in GANs & techniques to create cutting-edge AI models

Packt Editorial Staff

30 Sep 2019

6 min read

In recent years, deep learning has made unprecedented progress in vision, speech, natural language processing and understanding, and other areas of data science. Developments in deep learning techniques, including GANs, variational autoencoders and deep reinforcement learning, are creating impressive AI results. For example, DeepMind's AlphaGo Zero became a game changer in AI research when it beat world champions in the game of Go. In this interview, Professor Rowel Atienza, author of the book Advanced Deep Learning with Keras talks about the recent developments in the field of deep learning. This book is a comprehensive guide to the advanced deep learning techniques available today, so you can create your own cutting-edge AI. This book strikes a balance between advanced concepts in deep learning and practical implementations with Keras. Key takeaways from the interview The intuition of deep learning is built on the fact that the deeper the network gets, the more feature representations the network learns in order to solve complex real-world problems. The objective of deep learning is to enable agents to be more robust to unforeseen events and to lessen the dependency on huge data. Advances in GANs enable us to generate high-dimensional fake data such as high-resolution images or videos that look very convincing. Deep learning tackles the curse of dimensionality by finding efficient data structures and layers that could represent complex data in the most efficient manner. The interview in detail What is the intuition behind deep learning? What are the recent developments in deep learning? Rowel Atienza: Deep learning is built on the intuition that the deeper the network gets, the more feature representations the network learns in order to solve complex real-world problems. Unlike machine learning, deep learning learns these features automatically from data in different degrees of supervision. There are many recent developments in deep learning. There are advances on graph neural networks because people are realizing the limits of NLP (Natural Language Processing), CNN (Convolution Neural Networks), and RNN (Recurrent Neural Networks) in representing more complex data structures such as social network, 3D shapes, molecular structures, etc. Implementing the causality in reasoning on data is another area of strong interest. Deep learning is strong on correlation not on discovering causality in data. Meta learning or learning to learn is also another area of interest. The objective is to enable agents to be more robust to unforeseen events and to lessen the dependency on huge data. What are different deep learning techniques to create successful AI? RA: A successful AI is dependent on two things: 1) deep domain knowledge and 2) deep understanding of state of the art techniques that will work on the domain problem. Domain knowledge comes from someone who is very familiar with the domain problem. This person is not necessarily knowledgeable in AI. This domain knowledge is then modelled in AI to automate the process of problem solving. How deep learning tackles the curse of dimensionality? RA: One of the goals of deep learning is to keep on finding efficient data structures and layers that could represent complex data in the most efficient manner. For example, geometric deep learning is able to circumvent the limitations of representing and learning from 3D data by avoiding inefficient 3D convolutions. There is still so much to be done in this space. What is autoencoders? What is the need of autoencoders in deep learning? How do you create an autoencoder? RA: Autoencoders compress high dimensionality data into low dimensionality code without losing important information. Low-dimensional code is suitable for further processing by other deep learning models such as in generative models like GANs and VAEs. Autoencoder can easily be implemented using two networks, an encoder and decoder. The depth, width, and type of layers are dependent on the original data to be encoded. Why are GANs so innovative? RA: GANs are innovative since they are good in generating fake data that look real. It is something that is hard to accomplish using other generative models. The advances in GANs enable us to generate high-dimensional fake data such as high resolution image or video that look very convincing. Tell us a little bit about this book? What makes this book necessary? What gap does it fill? RA: Advanced Deep Learning with Keras focuses on recent advances on deep learning It starts with a quick review of deep learning concepts (NLP, CNN, RNN). The discussions on deep neural networks, autoencoders, generative adversarial network (GAN), variational autoencoders (VAE), and deep reinforcement learning (DRL) follow. The book is important for everyone who would like to understand advanced concepts on deep learning and their corresponding implementation in Keras. The current version has in depth focus on generative models (autoencoders, GANs, VAEs) that could be used in-practical setting. The DRL explains the core concepts of value-based and policy-based methods in reinforcement learning and the corresponding working implementations in Keras which are difficult to make them right. About the Book Advanced Deep Learning with Keras is a comprehensive guide to the advanced deep learning techniques available today, so you can create your own cutting-edge AI. Using Keras as an open-source deep learning library, you'll find hands-on projects throughout that show you how to create more effective AI with the latest techniques. About the Author Rowel Atienza is an Associate Professor at the Electrical and Electronics Engineering Institute of the University of the Philippines, Diliman. He holds the Dado and Maria Banatao Institute Professorial Chair in Artificial Intelligence. Rowel has been fascinated with intelligent robots since he graduated from the University of the Philippines. He received his MEng from the National University of Singapore for his work on an AI-enhanced four-legged robot. He finished his Ph.D. at The Australian National University for his contribution to the field of active gaze tracking for human-robot interaction. Deep learning models have massive carbon footprints, can photonic chips help reduce power consumption? Machine learning experts on how we can use machine learning to mitigate and adapt to the changing climate Google launches beta version of Deep Learning Containers for developing, testing and deploying ML applications

0
0
22219

article-image-why-choose-ibm-spss-statistics-r

Amey Varangaonkar

22 Dec 2017

9 min read

Why choose IBM SPSS Statistics over R for your data analysis project

Amey Varangaonkar

22 Dec 2017

9 min read

Data analysis plays a vital role in organizations today. It enables effective decision-making by addressing fundamental business questions based on the understanding of the available data. While there are tons of open source and enterprise tools for conducting data analysis, IBM SPSS Statistics has emerged as a popular tool among statistical analysts and researchers. It offers them the perfect platform to quickly perform data exploration and analysis, and share their findings with ease. [author title=""] Dr. Kenneth Stehlik-Barry Kenneth joined SPSS as Manager of Training in 1980 after using SPSS for his own research for several years. He has used SPSS extensively to analyze and discover valuable patterns that can be used to address pertinent business issues. He received his PhD in Political Science from Northwestern University and currently teaches in the Masters of Science in Predictive Analytics program there. Anthony J. Babinec Anthony joined SPSS as a Statistician in 1978 after assisting Norman Nie, the founder of SPSS, at the University of Chicago. Anthony has led a business development effort to find products implementing technologies such as CHAID decision trees and neural networks. Anthony received his BA and MA in Sociology with a specialization in Advanced Statistics from the University of Chicago and is on the Board of Directors of the Chicago Chapter of the American Statistical Association, where he has served in different positions including the President. [/author] In this interview, we take a look at the world of statistical data analysis and see how IBM SPSS Statistics makes it easier to derive business sense from data. Kenneth and Anthony also walk us through their recently published book - Data Analysis with IBM SPSS Statistics - and tell us how it benefits aspiring data analysts and statistical researchers. Key Takeaways - IBM SPSS Statistics IBM SPSS Statistics is a key offering of IBM Analytics - providing an integrated interface for statistical analysis on-premise and on the cloud SPSS Statistics is a self-sufficient tool - it does not require you to have any knowledge of SQL or any other scripting language SPSS Statistics helps you avoid the 3 most common pitfalls in data analysis, i.e. handling missing data, choosing the best statistical method for analysis and understanding the results of the analysis R and Python are not direct competitors to SPSS Statistics - instead, you can create customized solutions by integrating SPSS Statistics with these tools for effective analyses and visualization Data Analysis with IBM SPSS Statistics highlights various popular statistical techniques to the readers, and how to use them in order to gather useful hidden insights from their data Full Interview IBM SPSS Statistics is a popular tool for efficient statistical analysis. What do you think are the 3 notable features of SPSS Statistics that make it stand apart from the other tools available out there? SPSS Statistics has a very short learning curve which makes it ideal for analysts to use efficiently. It also has a very comprehensive set of statistical capabilities so virtually everything a researcher would ever need is encompassed in a single application. Finally, SPSS Statistics provides a wealth of features for preparing and managing data so it is not necessary to master SQL or another database language to address data-related tasks. With over 20 years of experience in this field, you have a solid understanding of the subject and, equally, of SPSS Statistics. How do you use the tool in your work? How does it simplify your day to day tasks related to data analysis? I have used SPSS Statistics in my work with SPSS and IBM clients over the years. In addition, I use SPSS for my own research analysis. It allows me to make good use of my time whether I'm serving clients or doing my own analysis because of the breadth of capabilities available within this one program. The fact that SPSS produces presentation-ready output further simplifies things for me since I can collect key results as I work and put them into a draft report and share them as required. What are the prerequisites to use SPSS Statistics effectively? For someone who intends to use SPSS Statistics for their data analysis tasks, how steep is the curve when it comes to mastering the tool? It certainly helps to have a understanding of basic statistics when you begin to use SPSS Statistics but it can be a valuable tool even with a limited background in statistics. The learning curve is a very "gentle slope" when it comes to acquiring sufficient familiarity with SPSS Statistics to use it very effectively. Mastering the software does involve more time and effort but one can accomplish this over time as one builds on the initial knowledge that comes fairly easily. The good news is that one can obtain a lot of value from the software well before one truly masters it by discovering the many features. What are some of the common problems in data analysis? How does this book help the readers overcome them? Some of the most common pitfalls encountered when analyzing data involve handling missing/incomplete data, deciding which statistical method(s) to employ and understanding the results. In the book, we go into the details of detecting and addressing data issues including missing data. We also describe what each statistical technique provides and when it is most appropriate to use each of them. There are numerous examples of SPSS Statistics output and how the results can be used to assess whether a meaningful pattern exists. In the context of all the above, how does your book Data Analysis with IBM SPSS Statistics help readers in their statistical analysis journey? What, according to you, are the 3 key takeaways for the readers from this book? The approach we took with our book was to share with readers the most straightforward ways to use SPSS Statistics to quickly obtain the results needed to effectively conduct data analysis. We did this by showing the best way to proceed when it comes to analyzing data and then showing how this process can be done best in the software. The key takeaways from our book are the way to approach the discovery process when analyzing data, how to find hidden patterns present in the data and what to look for in the results provided by the statistical techniques covered in the book. IBM SPSS Statistics 25 was released recently. What are the major improvements or features introduced in this version? How do these features help the analysts and researchers? There are a lot of interesting new features introduced in SPSS Statistics 25. For starters, you can copy charts as Microsoft Graphic Objects, which allows you to manipulate charts in Microsoft Office. There are changes to the chart editor that make it easier to customize colors, borders, and grid line settings in charts. Most importantly, it allows the implementation of Bayesian statistical methods. Bayesian statistical methods enable the researcher to incorporate prior knowledge and assumptions about model parameters. This facility looks like a good teaching tool for Statistical Educators. Data visualization goes a long way in helping decision-makers get an accurate sense of their data. How does SPSS Statistics help them in this regard? Kenneth: Data visualization is very helpful when it comes to communicating findings to a broader audience and we spend time in the book describing when and how to create useful graphics to use for this purpose. Graphical examination of the data can also provide clues regarding data issues and hidden patterns that warrant deeper exploration. These topics are also covered in the book. Anthony: SPSS Statistics’ data visualizations capabilities are excellent. The menu system makes it easy to generate common chart types. You can develop customized looks and save them as a template to be applied to future charts. Underlying SPSS Graphics is an influential approach called the Grammar of Graphics. The SPSS graphics capabilities are embodied in a versatile syntax called Graphics Programming Language. Do you foresee SPSS Statistics facing stiff competition from open source alternatives in the near future? What is the current sentiment in the SPSS community regarding these topics? Kenneth: Open source tools based alternatives such as Python and R are potential competition for SPSS Statistics but I would argue otherwise. These tools, while powerful, have a much steeper learning curve and will prove difficult for subject matter experts that periodically need to analyze data. SPSS is ideally suited for these periodic analysts whose main expertise lies in their field which could be healthcare, law enforcement, education, human resources, marketing, etc. Anthony: The open source programs have a lot of capability but they are also fairly low-level languages, so you must learn to code. The learning curve is steep, and there are many maintainability issues. R has 2 major releases a year. You can have a situation where the data and commands remain the same, but the result changes when you update R. There are many dependencies among R packages. R has many contributors and is an avenue for getting your hands on new methods. However, there is a wide variance in the quality of the contributors and contributed packages. The occasional user of SPSS has an easier time jumping back in than does the occasional user of open source software. Most importantly, it is easier to employ SPSS in production settings. SPSS Statistics supports custom analytical solutions through integration with R and Python. Is this an intent from IBM to join hands with the open source community? This is a good follow-up question to the one asked before. Actually, the integration with R and Python allows SPSS Statistics to be extended to accommodate a situation in which an analyst wishes to try an algorithm or graphical technique not directly available in the software but which is supported in one of these languages. It also allows those familiar with R or Python to use SPSS Statistics as their platform and take advantage of all the built-in features it comes with, out of the box while still having the option to employ these other languages where they provide additional value. Lastly, this book is designed for analysts and researchers who want to get meaningful insights from their data as quickly as possible. How does this book help them in this regard? SPSS Statistics does make it possible to very quickly pull in data and get insightful results. This book is designed to streamline the steps involved in getting this done while also pointing out some of the less obvious "hidden gems" that we have discovered during the decades of using SPSS in virtually every possible situation.

0
0
22148

article-image-interview-tirthajyoti-sarkar-and-shubhadeep-roychowdhury-data-wrangling-with-python

Sugandha Lahoti

25 Oct 2018

7 min read

“Data is the new oil but it has to be refined through a complex processing network” - Tirthajyoti Sarkar and Shubhadeep Roychowdhury [Interview]

Sugandha Lahoti

25 Oct 2018

7 min read

Data is the new oil and is just as crude as unrefined oil. To do anything meaningful - modeling, visualization, machine learning, for predictive analysis – you first need to wrestle and wrangle with data. We recently interviewed Dr. Tirthajyoti Sarkar and Shubhadeep Roychowdhury, the authors of the course Data Wrangling with Python. They talked about their new course and discuss why do data wrangling and why use Python to do it. Key Takeaways Python boasts of a large, broad library equipped with a rich set of modules and functions, which you can use to your advantage and manipulate complex data structures NumPy, the Python library for fast numeric array computations and Pandas, a package with fast, flexible, and expressive data structures are helpful in working with “relational” or “labeled” data. Web scraping or data extraction becomes easy and intuitive with Python libraries, such as BeautifulSoup4 and html5lib. Regex, the tiny, highly specialized programming language inside Python can create patterns that help match, locate, and manage text for large data analysis and searching operations Present interesting, interactive visuals of your data with Matplotlib, the most popular graphing and data visualization module for Python. Easily and quickly separate information from a huge amount of random data using Pandas, the preferred Python tool for data wrangling and modeling. Full Interview Congratulations on your new course ‘Data wrangling with Python’. What this course is all about? Data science is the ‘sexiest job’ of 21st century’ (at least until Skynet takes over the world). But for all the emphasis on ‘Data’, it is the ‘Science’ that makes you - the practitioner - valuable. To practice high-quality science with data, first you need to make sure it is properly sourced, cleaned, formatted, and pre-processed. This course teaches you the most essential basics of this invaluable component of the data science pipeline – data wrangling. What is data wrangling and why should you learn it well? “Data is the new Oil” and it is ruling the modern way of life through incredibly smart tools and transformative technologies. But oil from the rig is far from being usable. It has to be refined through a complex processing network. Similarly, data needs to be curated, massaged and refined to become fit for use in intelligent algorithms and consumer products. This is called “wrangling” and (according to CrowdFlower) all good data scientists spend almost 60-80% of their time on this, each day, every project. It generally involves the following: Scraping the raw data from multiple sources (including web and database tables), Inputing, formatting, transforming – basically making it ready for use in the modeling process (e.g. advanced machine learning), Handling missing data gracefully, Detecting outliers, and Being able to perform quick visualizations (plotting) and basic statistical analysis to judge the quality of your formatted data This course aims to teach you all the core ideas behind this process and to equip you with the knowledge of the most popular tools and techniques in the domain. As the programming framework, we have chosen Python, the most widely used language for data science. We work through real-life examples and at the end of this course, you will be confident to handle a myriad array of sources to extract, clean, transform, and format your data for further analysis or exciting machine learning model building. Walk us through your thinking behind how you went about designing this course. What’s the flow like? How do you teach data wrangling in this course? The lessons start with a refresher on Python focusing mainly on advanced data structures, and then quickly jumping into NumPy and Panda libraries as fundamental tools for data wrangling. It emphasizes why you should stay away from traditional ways of data cleaning, as done in other languages, and take advantage of specialized pre-built routines in Python. Thereafter, it covers how using the same Python backend, one can extract and transform data from a diverse array of sources - internet, large database vaults, or Excel financial tables. Further lessons teach how to handle missing or wrong data, and reformat based on the requirement from a downstream analytics tool. The course emphasizes learning by real example and showcases the power of an inquisitive and imaginative mind primed for success. What other tools are out there? Why do data wrangling with Python? First, let us be clear that there is no substitute for the data wrangling process itself. There is no short-cut either. Data wrangling must be performed before the modeling task but there is always the debate of doing this process using an enterprise tool or by directly using a programming language and associated frameworks. There are many commercial, enterprise-level tools for data formatting and pre-processing, which does not involve coding on the part of the user. Common examples of such tools are: General purpose data analysis platforms such as Microsoft Excel (with add-ins) Statistical discovery package such as JMP (from SAS) Modeling platforms such as RapidMiner Analytics platforms from niche players focusing on data wrangling such as – Trifacta, Paxata, Alteryx At the end of the day, it really depends on the organizational approach whether to use any of these off-the-shelf tools or to have more flexibility, control, and power by using a programming language like Python to perform data wrangling. As the volume, velocity, and variety (three V’s of Big Data) of data undergo rapid changes, it is always a good idea to develop and nurture significant amount of in-house expertise in data wrangling. This is done using fundamental programming frameworks so that an organization is not betrothed to the whims and fancies of any particular enterprise platform as a basic task as data wrangling. Some of the obvious advantages of using an open-source, free programming paradigm like Python for data wrangling are: General purpose open-source paradigm putting no restriction on any of the methods you can develop for the specific problem at hand Great eco-system of fast, optimized, open-source libraries, focused on data analytics Growing support to connect Python for every conceivable data source types, Easy interface to basic statistical testing and quick visualization libraries to check data quality Seamless interface of the data wrangling output to advanced machine learning models – Python is the most popular language of choice of machine learning/artificial intelligence these days. What are some best practices to perform data wrangling with Python? Here are five best practices that will help you out in your data wrangling journey with Python. And in the end, all you’ll have is clean and ready to use data for your business needs. Learn the data structures in Python really well Learn and practice file and OS handling in Python Have a solid understanding of core data types and capabilities of Numpy and Pandas Build a good understanding of basic statistical tests and a panache for visualization Apart from Python, if you want to master one language, go for SQL What are some misconceptions about data wrangling? Though data wrangling is an important task, there are certain myths associated with data wrangling which developers should be cautious of. Myths such as: Data wrangling is all about writing SQL query Knowledge of statistics is not required for data wrangling You have to be a machine learning expert to do great data wrangling Deep knowledge of programming is not required for data wrangling Learn in detail about these misconceptions. We hope that these misconceptions would help you realize that data wrangling is not as difficult as it seems. Have fun wrangling data! About the authors Dr. Tirthajyoti Sarkar works as a Sr. Principal Engineer in the semiconductor technology domain where he applies cutting-edge data science/machine learning techniques for design automation and predictive analytics. Shubhadeep Roychowdhury works as a Sr. Software Engineer at a Paris based Cyber Security startup. He holds a Master Degree in Computer Science from West Bengal University Of Technology and certifications in Machine Learning from Stanford. 5 best practices to perform data wrangling with Python 4 misconceptions about data wrangling Data cleaning is the worst part of data analysis, say data scientists

0
0
22101

article-image-statistics-data-science-interview-james-miller

Amey Varangaonkar

09 Jan 2018

9 min read

Why You Need to Know Statistics To Be a Good Data Scientist

Amey Varangaonkar

09 Jan 2018

9 min read

Data Science has popularly been dubbed as the sexiest job of the 21st century. So much so that everyone wants to become a data scientist. But what do you need to get started with data science? Do you need to have a degree in statistics? Why is having sound knowledge of statistics so important to be a good data scientist? We seek answers to these questions and look at data science through a statistical lens, in an interesting conversation with James D. Miller. [author title="James D. Miller"]James is an IBM certified expert and a creative innovator. He has over 35 years of experience in applications and system design & development across multiple platforms and technologies. Jim has also been responsible for managing and directing multiple resources in various management roles including project and team leader, lead developer and applications development director. He is the author or several popular books such as Big Data Visualization, Learning IBM Watson Analytics, Mastering Splunk, and many more. In addition, Jim has written a number of whitepapers and continues to write on a number of relevant topics based upon his personal experiences and industry best practices.[/author] In this interview, we look at some of the key challenges faced by many while transitioning from a data developer role to a data scientist. Jim talks about his new book, Statistics for Data Science and discusses how statistics plays a key role when it comes to finding unique, actionable insights from data in order to make crucial business decisions. Key Takeaways - Statistics for Data Science Data science attempts to uncover the hidden context of data by going beyond answering generic questions such as ‘what is happening’, to tackling questions such as ‘what should be done next’. Statistics for data science cultivates 'structured thinking' in one. For most data developers who are transitioning to the role of data scientist, the biggest challenge often comes in calibrating their thought process - from being data design-driven to more insight-driven Having a sound knowledge of statistics differentiates good data scientists from mediocre ones - it helps them accurately identify patterns in data that can potentially cause changes in outcomes Statistics for Data Science attempts to bridge the learning gap between database development and data science by implementing the statistical concepts and methodologies in R to build intuitive and accurate data models. These methodologies and their implementations are easily transferable to other popular programming languages such as Python. While many data science tasks are being automated these days using different tools and platforms, the statistical concepts and methodologies will continue to form their backbone. Investing in statistics for data science is worth every penny! Full Interview Everyone wants to learn data science today as it is one of the most in-demand skills out there. In order to be a good data scientist, having a strong foundation in statistics has become a necessity. Why do you think is this the case? What importance does statistics have in data science? With Statistics, it has always been about "explaining" (data). With data science, the objective is going beyond questions such as "what happened?" and the "what is happening?" to try to determine "what should be done next?". Understanding the fundamentals of statistics allows one to apply "structured thinking" to interpret knowledge and insights sourced from statistics. You are a seasoned professional in the field of Data Science with over 30 years of experience. We would like to know how your journey in Data Science began, and what changes you have observed in this domain over the 3 decades. I have been fortunate to have had a career that has traversed many platforms and technological trends (in fact over 37 years of diversified projects). Starting as a business applications and database developer, I have almost always worked for the office of finance. Typically, these experiences started with the collection - and then management of - data to be able to report results or assess performance. Over time, the industry has evolved and this work as becoming a “commodity” – with many mature tool options available and plenty of seasoned professionals available to perform the work. Businesses have now become keen to “do something more” with their data assets and are looking to move into the world of data science. The world before us offers enormous opportunities for those not only with a statistical background but someone with a business background that understands and can apply the statistical data sciences to identify new opportunities or competitive advantages. What are the key challenges involved in the transition from being a data developer to becoming a data scientist? How does the knowledge of statistics affect this transition? Does one need a degree in statistics before jumping into Data Science? Someone who has been working actively with data already has a “head start” in that they have experience with managing and manipulating data and data sources. They would also most likely have programming experience and possess the ability to apply logic to data. The challenge will be to “retool” their thinking from data developer to data scientist – for example, going from data querying to data mining. Happily, there is much that the data developer “already knows” about data science and my book Statistics for Data Science attempts to “point out” the skills and experiences that the data developer will recognize as the same or at least have significant similarities. You will find that the field of data science is still evolving and the definition of “data scientist” depends upon the industry, project or organization you are referring to. This means that there are many roles that may involve data science with each having perhaps quite different prerequisites (such as a statistical degree). You have authored a lot of books such as Big Data Visualization, Learning IBM Watson Analytics, etc. with the latest being Statistics for Data Science. Please tell us something about your latest book. The latest book, “Statistics for Data Science”, looks to point out the synergies between a data developer and data scientist and hopes to evolve the data developers thinking “beyond database structures”, but also introduces key concepts and terminologies such as probability, statistical inference, model fitting, classification, regression and more, that can be used to journey into statistics and data science. How is statistics used when it comes to cleaning and pre-processing the data? How does it help the analysis? What other tasks can these statistical techniques be used for? Simple examples of the use of statistics when cleaning and/or pre-processing of data (by a data developer) include data-typing, Min/Max limitation, addressing missing values and so on. A really good opportunity for the use of statistics in data or database development is while modeling data to design appropriate storage structures. Using statistics in data development applies a methodical, structured approach to the process. The use of statistics can be a competitive advantage to any data development project. In the book, for practical purposes, you have shown the implementation of the different statistical techniques using the popular R programming language. Why do you think R is favored by the statisticians so much? What advantages does it offer? R is a powerful, feature-rich, extendable free language with many, many easy to use packages free for download. In addition, R has “a history” within the data science industry. R is also quite easy to learn and be productive with quickly. It also includes many graphics and other abilities “built-in”. Do you foresee a change in the way statistics for data science is used in the near future? In other words, will the dependency on statistical techniques for performing different data science tasks reduce? Statistics will continue to be important to data science. I do see more “automation” of more and more data science tasks through the availability of “off the shelf” packages that can be downloaded and installed and used. Also, the more popular tools will continue to incorporate statistical functions over time. This will allow for the main-streaming of statistics and data science into even more areas of life. The key will be for the user to have an understanding of the key statistical concepts and uses. What advice would you like to give to - 1 Those transitioning from the developer to the data scientist role, and 2. Absolute beginners, who want to take up statistics and data science as a career option? Buy my book! But seriously, keep reading and researching. Expose yourself to as much statistics and data science use cases and projects a possible. Most importantly, as you read about the topic, look for similarities between what you do today and what you are reading about. How does it relate? Always look for opportunities to use something that is new to you to do something you do routinely today. Your book 'Statistics for Data Science' highlights different statistical techniques for data analysis and finding unique insights from data. What are the three key takeaways for the readers, from this book? Again, I see (and point out in the book) key synergies between data or database development and data science. I would urge the reader – or anyone looking to move from data developer to data scientist - to learn through these and perhaps additional examples he or she may be able to find and leverage on their own. Using this technique, one can perhaps navigate laterally, rather than losing the time it would take to “start over” at the beginning (or bottom?) of the data science learning curve. Additionally, I would suggest to the reader that time taken to get acquainted with the R programs and the logic used for statistical computations (this book should be a good start) is time well spent.

0
0
20812

Amey Varangaonkar

13 Sep 2017

5 min read

Why you should use Keras for deep learning

Amey Varangaonkar

13 Sep 2017

5 min read

A lot of people rave about TensorFlow and Theano, but there are is one complaint you hear fairly regularly: that they can be a little challenging to use if you're directly building deep learning models. That’s where Keras comes to the rescue. It's a high-level deep learning library written in Python that can be used as a wrapper on top of TensorFlow or Theano, to simplify the model training process and to make the models more efficient. Sujit Pal is Technology Research Director at Elsevier Labs. He has been working with Keras for some time. He is an expert in Semantic Search, Natural Language Processing and Machine Learning. He's also the co-author of Deep Learning with Keras, which is why we spoke to him about why you should use start using Keras (he's very convincing). 5 reasons you should start using Keras Keras is easy to get started with if you’ve worked with Python before and have some basic knowledge of neural networks. It works on top of Theano and TensorFlow seamlessly to create efficient deep learning models. It offers just the right amount of abstraction - allowing you to focus on the problem at hand rather than worry about the complexity of using the framework. It is a handy tool to use if you’re looking to build models related to Computer Vision or Natural Language Processing. Keras is a very expressive framework that allows for rapid prototyping of models. Why I started using Keras Packt: Why did you start using using Keras? Sujit Pal: My first deep learning toolkit was actually Caffe, then TensorFlow, both for work related projects. I learned Keras for a personal project and I was impressed by the Goldilocks (i.e. just right) quality of the abstraction. Thinking at the layer level was far more convenient than having to think in terms of matrix multiplication that TensorFlow makes you do, and at the same time I liked the control I got from using a programming language (Python) as opposed to using JSON in Caffe. I've used Keras for multiple projects now. Packt: How has this experience been different from other frameworks and tools? What problems does it solve exclusively? Sujit: I think Keras has the right combination of simplicity and power. In addition, it allows you to run against either TensorFlow or Theano backends. I understand that it is being extended to support two other backends - CNTK and MXNet. The documentation on the Keras site is extremely good and the API itself (both the Sequential and Functional ones) are very intuitive. I personally took to it like a fish to water, and I have heard from quite a few other people that their experiences were very similar. What you need to know to start using Keras Packt: What are the prerequisites to learning Keras? And what aspects are tricky to learn? Sujit: I think you need to know some basic Python and have some idea about Neural Networks. I started with Neural Networks from the Google/edX course taught by Vincent Van Houke. It’s pretty basic (and taught using TensorFlow) but you can start building networks with Keras even with that kind of basic background. Also, if you have used numpy or scikit-learn, some of the API is easier to pick up because of the similarities. I think the one aspect I have had a few problems with is building custom layers. While there is some documentation that is just enough to get you started, I think Keras would be usable in many more situations if the documentation for the custom layers was better, maybe more in line with the rest of Keras. Things like how to signal that a layer supports masking or multiple tensors, debugging layers, etc. Packt: Why do you use Keras in your day-to-day programming and data science tasks? Sujit: I have spent most of last year working with Image classification and similarity, and I've used Keras to build most of my more recent models. This year I am hoping to do some work with NLP as it relates to images, such as generating image captions, etc. On the personal projects side, I have used Keras for building question answering and disease prediction models, both with data from Kaggle competitions. How Keras could be improved Packt: As a developer, what do you think are the areas of development for Keras as a library? Where do you struggle the most? Sujit: As I mentioned before, the Keras API is quite comprehensive and most of the time Keras is all you need to build networks, but occasionally you do hit its limits. So I think the biggest area of Keras that could be improved would be extensibility, using its backend interface. Another thing I am excited about is the contrib.keras package in TensorFlow, I think it might open up even more opportunity for customization, or at least the potential to maybe mix and match TensorFlow with Keras.

0
0
20496

article-image-pandas-answers-data-analysis-problems-interview

Amey Varangaonkar

24 Apr 2018

9 min read

“Pandas is an effective tool to explore and analyze data”: An interview with Theodore Petrou

Amey Varangaonkar

24 Apr 2018

9 min read

It comes as no surprise to many developers, Python has grown to become the preferred language of choice for data science. One of the reasons for its staggering adoption in the data science community is the rich suite of libraries for effective data analysis and visualization - allowing you to extract useful, actionable insights from your data. Pandas is one such Python-based library, that provides a solid platform to carry out high-performance data analysis. Ted Petrou is a data scientist and the founder of Dunder Data, a professional educational company focusing on exploratory data analysis. Before founding Dunder Data, Ted was a data scientist at Schlumberger, a large oil services company, where he spent the vast majority of his time exploring data. Ted received his Master’s degree in statistics from Rice University and has used his analytical skills to play poker professionally. He taught math before becoming a data scientist. He is a strong supporter of learning through practice and can often be found answering questions about pandas on Stack Overflow. In this exciting interview, Ted takes us through an insightful journey into pandas - Python’s premier library for exploratory data analysis, and tells us why it is the go-to library for many data scientists to discover new insights from their data. Key Takeaways Data scientists are in the business of making predictions. To make the right predictions you must know how to analyse your data. to perform data analysis efficiently, you must have a good understanding of the concepts as well be proficient using the tools like pandas. Pandas Cookbook contains step by step solutions to the master the pandas syntax while going through the data exploration journey (missteps et al) to solve the most common and not-so-common problems in data analysis. Unlike R which has several different packages for different data science tasks, pandas offers all data analysis capabilities as a single large Python library. Pandas has good time-series capabilities, making it well-suited for building financial applications. That said, its best use is in data exploration - to find interesting discoveries within the data. Ted says beginners in data science should focus on learning one data science concept at a time and master it thoroughly, rather than getting an overview of multiple concepts at once. Let us start with a very fundamental question - Why is data crucial to businesses these days? What problems does it solve? All businesses, from a child’s lemonade stand to the largest corporations, must account for all their operations in order to be successful. This accounting of supplies, transactions, people, etc., is what we call ‘data’ and gives us historical records of what has transpired in a business. Without this data, we would be reduced to oral history or what humans used for accounting before the advent of writing systems. By collecting and analyzing data, we gain a deeper understanding of how the business is progressing. In the most basic instances, such as with a child’s lemonade stand, we know how many glasses of lemonade have been sold, how much was spent on supplies, and importantly whether the business is profitable. This example is incredibly trivial, but it should be noted that such simple data collection is not something that comes naturally to humans. For instance, many people have a desire to lose weight at some point in their life, but fail to accurately record their daily weight or calorie intake in any regular manner, despite the large number of free services available to help with this. There are so many Python-based libraries out there which can be used for a variety of data science tasks. Where does pandas fit into this picture? pandas is the most popular library to perform the most fundamental tasks of a data analysis. Not many libraries can claim to provide the power and flexibility of pandas for working with tabular data. How does pandas help data scientists in overcoming different challenges in data analysis? What advantages does it offer over domain-specific languages such as R? One of the best reasons to use pandas is because it is so popular. There are a tremendous amount of resources available for it, and an excellent database of questions and answers on StackOverflow. Because the community is so large, you can almost always get an immediate answer to your problem. Comparing pandas to R is difficult as R is an entire language that provides tools for a wide variety of tasks. Pandas is a single large Python library. Nearly all the tasks capable in pandas can be replicated with the right library in R. We would love to hear your journey as a data scientist. Did having a master's degree in statistics help you in choosing this profession? Also tell us something about how you leveraged analytics in professional Poker! My journey to becoming a “data scientist” began long before the term even existed. As a math undergrad, I found out about the actuarial profession, which appealed to me because of its meritocratic pathway to success. Because I wasn’t certain that I wanted to become an actuary, I entered a Ph.D. program in statistics in 2004, the same year that an online poker boom began. After a couple of unmotivating and half-hearted attempts at learning probability theory, I left the program with a masters degree to play poker professionally. Playing poker has been by far the most influential and beneficial resource for understanding real-world risk. Data scientists are in the business of making predictions and there’s no better way to understand the outcomes of predictions you make than by exposing yourself to risk. Your recently published 'pandas Cookbook' has received a very positive response from the readers. What problems in data analysis do you think this book solves? I worked extremely hard to make pandas Cookbook the best available book on the fundamentals of data analysis. The material was formulated by teaching dozens of classes and hundreds of students with my company Dunder Data and my meetup group Houston Data Science. Before getting to what makes a good data analysis, it’s important to understand the difference between the tools available to you and the theoretical concepts. Pandas is a tool and is not much different than a big toolbox in your garage. It is possible to master the syntax of pandas without actually knowing how to complete a thorough data analysis. This is like knowing how to use all the individual tools in your toolbox without knowing how to build anything useful, such as a house. Similarly, understanding theoretical concepts such as ‘split-apply-combine’ or ‘tidy data’ without knowing how to implement them with a specific tool will not get you very far. Thus, in order to make a good data analysis, you need to understand both the tools and the concepts. This is what pandas Cookbook attempts to provide. The syntax of pandas is learned together with common theoretical concepts using real-world datasets. Your readers loved the way you have structured the book and the kind of datasets, examples and functions you have chosen to showcase pandas in all its glory. Was is experience, intuition, or observations that led to this fantastic writing insight? The official pandas documentation is very thorough (well over 1,000 pages) but does not present the features as you would see them in a real data analysis. Most of the operations are shown in isolation on contrived or randomly generated data. In a typical data analysis, it is common for many pandas operations to be called one after another. The recipes in pandas Cookbook expose this pattern to the reader, which will help them when they are completing an actual data analysis. This is not meant to disparage the documentation as I have read it multiple times myself and recommend reading it along with pandas Cookbook. Quantitative finance is one domain where pandas finds major application. How does pandas help in developing better financial applications? In what other domains does pandas find important applications and how? Pandas has good time-series capabilities which makes it well-suited for financial applications. It’s ability to group by specific time periods is a very useful feature. In my opinion, pandas most important application is with exploratory data analysis. It is possible for an analyst to quickly use pandas to find interesting discoveries within the data and visualize the results with either matplotlib or Seaborn. This tight integration, coupled with the Jupyter Notebook interface make for an excellent ecosystem for generating and reporting results to others. Please tell us more about 'pandas Cookbook'. What in your opinion are the 3 major takeaways from it? Are there any prerequisites needed to get the most out of the book? The only prerequisite for pandas Cookbook is a fundamental understanding of the Python programming language. The recipes progress in difficulty from chapter to chapter and for those with no pandas experience, I would recommend reading it cover to cover. One of the major takeaways from the book is to be able to write modern and idiomatic pandas code. Pandas is a huge library and there are always multiple ways of completing each task. This is more of a negative than a positive as beginners notoriously write poorly written and inefficient code. Another takeaway is the ability to probe and investigate data until you find something interesting. Many of the recipes are written as if the reader is experiencing the discovery process alongside the author. There are occasional (and purposeful) missteps in some recipes to show how often the right course of action is not always known. Lastly, I wanted to teach common theoretical concepts of doing a data analysis while simultaneously learning pandas syntax. Finally, what advice would you have for beginners in data science? What things should they keep in mind while designing and developing their data science workflow? Are there any specific resources which they could refer to, apart from this book of course? For those just beginning their data science journey, I would suggest keeping their ‘universe small’. This means concentrating on as few things as possible. It is easy to get caught up with a feeling that you need to keep learning as much as possible. Mastering a few subjects is much better than having a cursory knowledge of many. If you found this interview to be intriguing, make sure you check out Ted’s pandas Cookbook which presents more than 90 unique recipes for effective scientific computation and data analysis.

0
1
20493

article-image-deep-meta-reinforcement-learning-will-be-the-future-of-ai-where-we-will-be-so-close-to-achieving-artificial-general-intelligence-agi-sudharsan-ravichandiran

Sunith Shetty

13 Sep 2018

9 min read

“Deep meta reinforcement learning will be the future of AI where we will be so close to achieving artificial general intelligence (AGI)”, Sudharsan Ravichandiran

Sunith Shetty

13 Sep 2018

9 min read

0
0
20343

article-image-microsoft-power-bi-interview-part1-brett-powell

Amey Varangaonkar

09 Oct 2017

8 min read

Ride the third wave of BI with Microsoft Power BI

Amey Varangaonkar

09 Oct 2017

8 min read

[dropcap]S[/dropcap]elf-service Business Intelligence is the buzzword everyone's talking about today. It gives modern business users the ability to find unique insights from their data without any hassle. Amidst a myriad of BI tools and platforms out there in the market, Microsoft’s Power BI has emerged as a powerful, all-encompassing BI solution - empowering users to tailor and manage Business Intelligence to suit their unique needs and scenarios. [author title="Brett Powell"]A Microsoft Power BI partner, and the founder and owner of Frontline Analytics LLC., a BI and analytics research and consulting firm. Brett has contributed to the design and development of Microsoft BI stack and Power BI solutions of diverse scale and complexity across the retail, manufacturing, financial, and services industries. He regularly blogs about the latest happenings in Microsoft BI and Power BI features at Insight Quest. He is also an organizer of the Boston BI User Group.[/author] In this two part interview Brett talks about his new book, Microsoft Power BI Cookbook, and shares his insights and expertise in the area of BI and data analytics with a particular focus on Power BI. In part one, Brett shares his views on topics ranging from what it takes to be successful in the field of BI & data analytics to why he thinks Microsoft is going to lead the way in shaping the future of the BI landscape. In part two of the interview, he shares his expertise with us on the unique features that differentiate Power BI from other tools and platforms in the BI space. Key Takeaways Ease of deployment across multiple platforms, efficient data-driven insights, ease of use and support for a data-driven corporate culture are factors to consider while choosing a Business Intelligence solution for enterprises. Power BI leads in self-service BI because it’s the first Software-as-a-Service (SaaS) platform to offer ‘End User BI’ where anyone, not just a business analyst, can leverage powerful tools to obtain greater value from data. Microsoft Power BI has been identified as a leader in Gartner’s Magic Quadrant for BI and Analytics platforms, and provides a visually rich and easy to access interface that modern business users require. You can isolate report authoring from dataset development in Power BI, or quickly scale up or down a Power BI dataset as per your needs. Power BI is much more than just a tool for reports and dashboards. With a thorough understanding of the query and analytical engines of Power BI, users can customize more powerful and sustainable BI solutions. Part One Interview Excerpts - Power BI from a Bird’s Eye View On choosing the right BI solution for your enterprise needs What are some key criteria one must evaluate while choosing a BI solution for enterprises? How does Power BI fare against these criteria as compared with other leading solutions from IBM, Oracle and Qlikview? Enterprises require a platform which can be implemented on their terms and adapted to their evolving needs. For example, the platform must support on-premises, cloud, and hybrid deployments with seamless integration allowing organizations to both leverage on-premises assets as well as fully manage their cloud solution. Additionally, the platform must fully support both corporate business intelligence processes such as staged deployments across development and production environments as well as self-service tools which empower business teams to contribute to BI projects and a data driven corporate culture. Furthermore, enterprises must consider the commitment of the vendor to BI and analytics, the full cost of scaling and managing the solution, as well as the vendors’ vision for delivering emerging capabilities such as artificial intelligence and natural language. Microsoft Power BI has been identified as a leader in Gartner’s Magic Quadrant for BI and Analytics platforms based on both its currently ability to execute as well as its vision. Particularly now with Power BI Premium, the Power BI Report Server, and Power BI embedded offerings, Power BI truly offers organizations the ability to tailor and manage BI to their unique needs and scenarios. Power BI’s mobile application, available on all common platforms (iOS, Android) in addition to continued user experience improvements in the Power BI service provides a visually rich and common interface for the ‘anytime access’ that modern business users require. Additionally, since Power BI’s self-service authoring tool of Power BI Desktop shares the same engine as SQL Server Analysis Services, Power BI has a distinct advantage in enabling organizations to derive value from both self-service and corporate BI. The BI landscape is very competitive and other vendors such as Tableau and Qlikview have obtained significant market share. However, as organizations fully consider the features distinguishing the products in addition to the licensing structures and the integration with Microsoft Azure, Office 365, and common existing BI assets such as Excel and SQL Server Reporting Services and Analysis Services, they will (and are) increasingly concluding that Power BI provides a compelling value. On the future of BI and why Brett is betting on Microsoft to lead the way Self-service BI as a trend has become mainstream. How does Microsoft Power BI lead this trend? Where do you foresee the BI market heading next i.e., are there other trends we should watch out for? Power BI leads in self-service BI because it’s the first software as a service (SaaS) platform to offer ‘End User BI’ in which anyone, not just a business analyst, can leverage powerful tools to obtain greater value from data. This ‘third wave’ of BI, as Microsoft suggests, further follows and supplements the first and second waves of BI in Corporate and self-service BI, respectively. For example, Power BI’s Q & A experience with natural language queries and integration with Cortana goes far beyond the traditional self-service process of an analyst finding field names and dragging and dropping items on a canvas to build a report. Additionally, an end user has the power of machine learning algorithms at their fingertips with features such as Quick Insights now built into Power BI Desktop. Furthermore, it’s critical to understand that Microsoft has a much larger vision for self-service BI than other vendors. Self-service BI is not exclusively the visualization layer over a corporate IT controlled data model – it’s also the ability for self-service solutions to be extended and migrated to corporate solutions as part of a complete BI strategy. Given their common underlying technologies, Microsoft is able to remove friction between corporate and self-service BI and allows organizations to manage modern, iterative BI project lifecycles. On staying ahead of the curve in the data analytics & BI industry For someone just starting out in the data analytics and BI fields, what would your advice be? How can one keep up with the changes in this industry? I would focus on building a foundation in the areas which don’t change frequently such as math, statistics, and dimensional modeling. You don’t need to become a data scientist or a data warehouse architect to deliver great value to organizations but you do need to know the basic tools of storing and analysing data to answer business questions. To succeed in this industry over time you need to consistently invest in your skills in the areas and technologies relevant to your chosen path. You need to hold yourself accountable for becoming a better data professional and this can be accomplished by certification exams, authoring technical blogs, giving presentations, or simply taking notes from technical books and testing out tools and code on your machine. For hard skills I’d recommend standard SQL, relational database fundamentals, data warehouse architecture and dimensional model design, and at least a core knowledge of common data transformation processes and/or tools such as SQL Server Integration Services (SSIS) and SQL stored procedures. You’ll need to master an analytical language as well and for Microsoft BI projects that language is increasingly DAX. For soft skills, you need to move beyond simply looking for a list of requirements for your projects. You need to learn to become flexible and active – you need to become someone who offers ideas and looks to show value and consistently improve projects rather than just ‘deliver requirements’. You need to be able to have both a deeply technical conversation but also have a very practical conversation with business stakeholders. You need to able to build relationships with both business and IT. You don’t ever want to dominate or try to impress anyone but if you’re truly passionate about your work then this will be visible in how you speak about your projects and the positive energy you bring to work every day and to your ongoing personal development. If you enjoyed this interview, check out Brett’s latest book, Microsoft Power BI Cookbook. In part two of the interview, Brett shares 5 Power BI features to watch out for, 7 reasons to choose Power BI to build enterprise solutions and more. Visit us tomorrow to read part two of the interview.

0
0
19160

article-image-tableau-powerful-analytics-platform-interview-joshua-milligan

Sunith Shetty

22 May 2018

9 min read

“Tableau is the most powerful and secure end-to-end analytics platform”: An interview with Joshua Milligan

Sunith Shetty

22 May 2018

9 min read

Tableau is one of the leading BI tools used by data science and business intelligence professionals today. You can not only use it to create powerful data visualizations but also use it to extract actionable insights for quality decision making thanks to the plethora of tools and features it offers. We recently interviewed Joshua Milligan, a Tableau Zen Master and the author of the book, Learning Tableau. Joshua takes us on an insightful journey into Tableau explaining why it is the Google of data visualization. He tells us all about its current and future focus areas such as Geospatial analysis and automating workflows, the exciting new features and tools such as Hyper, Tableau Prep among other topics. He also gives us a preview of things to come in his upcoming book. Author’s Bio Joshua Milligan, author of the bestselling book, Learning Tableau, has been with Teknion Data Solutions since 2004 and currently serves as a principal consultant. With a strong background in software development and custom .NET solutions, he brings a blend of analytical and creative thinking to BI solutions. Joshua has been named Tableau Zen Master, the highest recognition of excellence from Tableau Software not once but thrice. In 2017, Joshua competed as one of three finalists in the prestigious Tableau Iron Viz competition. As a Tableau trainer, mentor, and leader in the online Tableau community, he is passionate about helping others gain insights from their data. His work has been featured multiple times on Tableau Public’s Viz of the Day and Tableau’s website. He also shares frequent Tableau (and Maestro) tips, tricks, and advice on his blog VizPainter.com. Key Takeaways Tableau is perfectly tailored for business intelligence professionals given its extensive list of offerings from data exploration to powerful data storytelling. The drag-and-drop interface allows you to understand data visually thus enabling anyone to perform and share self service data analytics with colleagues in seconds. Hyper is new in-memory data engine designed for powerful query analytical processing on complex datasets. Tableau Prep, a new data preparation tool released with Tableau 2018.1, allows users to easily combine, shape, analyze and clean the data for compelling analytics. Tableau 2018.1 is expected to bring new geospatial tools, enterprise enhancements to Tableau Server, and new extensions and plugins to create interactive dashboards. Tableau users can expect to see artificial intelligence and machine learning becoming major features in both Tableau and Tableau Prep - thus deriving insights based on users behavior across the enterprise. Full Interview There are many enterprise software for business intelligence, how does Tableau compare against the others? What are the main reasons for Tableau's popularity? Tableau's paradigm is what sets it apart from others. It's not just about creating a chart or dashboard. It's about truly having a conversation with the data: asking questions and seeing instant results as you drag and drop to get new answers that raise deeper questions and then iterating. Tableau allows for a flow of thought through the entire cycle of analytics from data exploration through analysis to data storytelling. Once you understand this paradigm, you will flow with Tableau and do amazing things! There's a buzz in the developer's community that Tableau is the Google of data visualization. Can you list the top 3-5 features in Tableau 10.5 that are most appreciated by the community? How do you use Tableau in your day-to-day work? Tableau 10.5 introduced Hyper - a next-generation data engine that really lays a foundation for enterprise scaling as well as a host of exciting new features and Tableau 2018.1 builds on this foundation. One of the most exciting new features is a completely new data preparation tool - Tableau Prep. Tableau Prep complements Tableau Desktop and allows users to very easily clean, shape, and integrate their data from multiple sources. It’s intuitive and gives you a hands-on, instant feedback paradigm for data preparation in a similar way to what Tableau Desktop enables with data visualization. Tableau 2018.1 also includes new geospatial features that make all kinds of analytics possible. I’m particularly excited about support for the geospatial data types and functions in SQL Server which have allowed me to dynamically draw distances and curves on maps. Additionally, web authoring in Tableau Server is now at parity with Tableau Desktop. I use Tableau every day to help my clients see and understand their data and to make key decisions that drive new business, avoid risk, and find hidden opportunities. Tableau Prep makes it easier to access the data I need and shape it according to the analysis I’ll be doing. Tableau offers a wide range of products to suit their users' needs. How does one choose the right product from their data analytics or visualization need? For example, what are the key differences between Tableau Desktop, Server and Public? Are there any plans for a unified product for the Tableau newbie in the near future? As a consultant at Teknion Data Solutions (a Tableau Gold Partner), I work with clients all the time to help them make the best decisions around which Tableau offering best meets their needs. Tableau Desktop is the go-to authoring tool for designing visualizations and dashboards. Tableau Server, which can be hosted on premises or in the cloud, gives enterprises and organizations the ability to share and scale Tableau. It is now at near parity with Tableau Desktop in terms of authoring. Tableau Online is the cloud-based, Tableau managed solution. Tableau Public allows for sharing public visualizations and dashboards with a world-wide audience. How good is Tableau for Self-Service Analytics / automating workflows? What are the key challenges and limitations? Tableau is amazing for this. Combined with the new data prep tool - Tableau Prep - Tableau really does offer users, across the spectrum (from business users to data scientists), the ability to quickly and easily perform self-service analytics. As with any tool, there are definitely cases which require some expertise to reach a solution. Pulling data from an API or web-based source or even sometimes structuring the data in just the right way for the desired analysis are examples that might require some know-how. But even there, Tableau has the tools that make it possible (for example, the web data connector) and partners (like Teknion Data Solutions) to help put it all together. In the third edition of Learning Tableau, I expand the scope of the book to show the full cycle of analytics from data prep and exploration to analysis and data storytelling. Expect updates on new features and concepts (such as the changes Hyper brings), a new chapter focused on Tableau Prep and strategies for shaping data to perform analytics, and new examples throughout that span multiple industries and common analytics questions. What is the development roadmap for Tableau 2018.1? Are we expecting major feature releases this year to overcome some of the common pain areas in business intelligence? I'm particularly excited about Tableau 2018.1. Tableau hasn't revealed everything yet, but things such as new geospatial tools and features, enterprise enhancements to Tableau Server, the new extensions API, new dashboard tools, and even a new visualization type or two look to be amazing! Tableau is working a lot in the geospatial domain coming up with new plugins/connectors and features. Can we expect Tableau to further strengthen their support for spatial data? What are the other areas/domains that Tableau is currently focused on? I couldn't say what the top 3-5 areas are - but you are absolutely correct that Tableau is really putting some emphasis on geospatial analytics. I think the speed and power of the Hyper data engine makes a lot of things like this possible. Although I don't have any specific knowledge beyond what Tableau has publicly shared, I wouldn't be surprised to see some new predictive and statistical models and expansion of data preparation abilities. What's driving Tableau to Cloud? Can we expect more organizations adopting Tableau on Cloud? There has been a major shift to the cloud by organizations. The ability to manage, scale, ensure up-time, and save costs are driving this move and that in turn makes Tableau's cloud-based offerings very attractive. What does Tableau's future hold, according to you? For example, do you see machine learning and AI-powered analytics platform transformation? Or can we expect Tableau entering the IoT and IIoT domain? Tableau demonstrated a concept around NLQ at the Tableau Conference and has already started building in a few machine learning features. For example, Tableau now recommends joins based on what is learns from behavior of users across the enterprise. Tableau Prep has been designed from the ground-up with machine learning in mind. I fully expect to see AI and machine learning become major features in both Tableau and Tableau Prep – but true to Tableau’s paradigm, they will complement the work of the analyst and allow for deeper insight without obscuring the role that humans play in reaching that insight. I'm excited to see what is announced next! Give us a sneak peek into the book you are currently writing "Learning Tableau 2018.1, Third Edition", expected to be released in the 3rd Quarter this year. What should our readers get most excited about as they wait for this book? Although the foundational concepts behind learning Tableau remain the same, I'm excited about the new features that have been released or will be as I write. Among these are a couple of game-changers such as the new geospatial features and the new data prep tool: Tableau Prep. In addition to updating the existing material, I'll definitely have a new chapter or two covering those topics! If you found this interview to be interesting, make sure you check out other insightful articles on business intelligence: Top 5 free Business Intelligence tools [Opinion] Tableau 2018.1 brings new features to help organizations easily scale analytics [News] Ride the third wave of BI with Microsoft Power BI [Interview - Part 1] Unlocking the secrets of Microsoft Power BI [Interview - Part 2] How Qlik Sense is driving self-service Business Intelligence [Interview]

0
0
18745

article-image-qlik-sense-driving-self-service-business-intelligence

Amey Varangaonkar

12 Dec 2017

11 min read

How Qlik Sense is driving self-service Business Intelligence

Amey Varangaonkar

12 Dec 2017

11 min read

Delivering Business Intelligence solutions to over 40000 customers worldwide, there is no doubt that Qlik has established a strong foothold in the analytics market for many years now. With the self-service capabilities of Qlik Sense, you can take better and more informed decisions than ever before. From simple data exploration to complex dashboarding and cloud-ready, multi-platform analytics, Qlik Sense gives you the power to find crucial, hidden insights from the depths of your data. We got some fascinating insights from our interview with two leading Qlik community members, Ganapati Hegde and Kaushik Solanki, on what Qlik Sense offers to its users and what the future looks like for the BI landscape. [box type="shadow" align="" class="" width=""] Ganapati Hegde Ganapati is an engineer by background and carries an overall IT experience of over 16 years. He is currently working with Predoole Analytics, an award-winning Qlik partner in India, in the presales role. He has worked on BI projects in several industry verticals and works closely with customers, helping them with their BI strategies. His experience in other aspects of IT, like application design and development, cloud computing, networking, and IT Security - helps him design perfect BI solutions. He also conducts workshops on various technologies to increase user awareness and drive their adoption. Kaushik Solanki Kaushik has been a Qlik MVP (Most Valuable Player) for the years 2016 and 2017 and has been working with the Qlik technology for more than 7 years now. An Information technology engineer by profession, he also holds a master’s degree in finance. Having started his career as a Qlik developer, Kaushik currently works with Predoole Analytics as the Qlik Project Delivery Manager and is also a certified QlikView administrator. An active member of Qlik community, his great understanding of project delivery - right from business requirement to final implementation, has helped many businesses take valuable business decisions.[/box] In this exciting interview, Ganapati and Kaushik take us through a compelling journey in self-service analytics, by talking about the rich features and functionalities offered by Qlik Sense. They also talk about their recently published book ‘Implementing Qlik Sense’ and what the readers can learn from it. Key Takeaways With many self-service and guided analytics features, Qlik Sense is perfectly tailored to business users Qlik Sense allows you to build customized BI solutions with an easy interface, good mobility, collaboration, focus on high performance and very good enterprise governance Built-in capabilities for creating its own warehouse, a strong ETL layer and a visualization layer for creating intuitive Business Intelligence solutions are some of the strengths of Qlik Sense With support for open APIs, the BI solutions built using Qlik Sense can be customized and integrated with other applications without any hassle. Qlik Sense is not a rival to Open Source technologies such as R and Python. Qlik Sense can be integrated with R or Python to perform effective predictive analytics ‘Implementing Qlik Sense’ allows you to upgrade your skill-set from a Qlik developer to a Qlik Consultant. The end goal of the book is to empower the readers to implement successful Business Intelligence solutions using Qlik Sense. Complete Interview There has been a significant rise in the adoption of Self-service Business Intelligence across many industries. What role do you think visualization plays in self-service BI? In a vast ocean of self-service tools, where do you think Qlik stands out from the others? As Qlik says visualization alone is not the answer. A strong backend engine is needed which is capable of strong data integration and associations. This then enables businesses to perform self-service and get answers to all their questions. Self-service plays an important role in the choice of visualization tools, as business users today no longer want to go to IT every time they need changes. Self service enable business users to quickly build their own visualization with simple drag and drop. Qlik stands out from the rest in its capability to bring in multiple data sources, enabling users to easily answers questions. Its unique associative engine allows users to find hidden insights. The open API allows easy customization and integrations which is a must for enterprises. Data security and governance is one of the best in Qlik. What are the key differences between QlikView and Qlik Sense? What are the factors crucial to building powerful Business Intelligence solutions with Qlik Sense? QlikView and Qlik Sense are similar yet different. Both share the same engine. On one hand, QlikView is a developer’s delight with the options it offers, and on the other hand, Qlik Sense with its self-service is more suited for business users. Qlik Sense has better mobility and open API as compared to QlikView, making Qlik Sense more customizable and extensible. The beauty of Qlik Sense lies in its ability to help business get answers to their questions. It helps correlate the data between different data sources and making it very meaningful to users. Powerful data visualizations do not necessarily mean beautiful visualizations and Qlik Sense lays special emphasis on this. Finally what the users need is performance, easy interface, good mobility, collaboration and good enterprise governance - something which Qlik Sense provides. Ganapati, you have over 15 years of experience in IT, and have extensively worked in the BI domain for many years. Please tell us something about your journey. How does your daily schedule look like? I have been fortunate in my career to be able to work on multiple technologies ranging from programming, databases, information security, integrations and cloud solutions. All this knowledge is helping me propose the best solutions for my Qlik customers. It’s a pleasure helping customers in their analytical journey and working for a services company helps in meeting customers from multiple domains. The daily schedule involves doing Proof of Concepts/Demos for customers, designing optimum solutions on Qlik, and conducting requirement gathering workshops. It’s a pleasure facing new challenges every day and this helps me increase my knowledge base. Qlik open API opens up amazing new possibilities and lets me come up with out of the box solutions. Kaushik, you have been awarded the Qlik MVP for 2016 and 2017, and have experience of using Qlik's tools for over 7 years. Please tell us something about your journey in this field. How do you use the tool in your day to day work? I started my career by working with the Qlik technology. My hunger for learning Qlik made me addicted to the Qlik community. I learned lot many things from the community by asking questions and solving real-world problems of community members. This helped me to get awarded by Qlik as MVP for consecutively 2 years. MVP award motivated me to help Qlik customers and users and that is one of the reasons why I thought about writing a book on Qlik Sense. I have implemented Qlik not only for clients but also for my personal use cases. There are many ways in which Qlik helps me in my day-to-day work and makes my life much easier. It’s safe to say that I absolutely love Qlik. Your book 'Implementing Qlik Sense' is primarily divided into 4 sections - with each section catering to a specific need when it comes to building a solid BI solution. Could you please talk more about how you have structured the book, and why? BI projects are challenging, and it really hurts when a project doesn’t succeed. The purpose of the book is to enable Qlik Sense developers to get to implement successful Qlik Projects. There is often a lot of focus on development and thereby Qlik developers miss several other crucial factors which contribute to project success. To make the journey from a Qlik developer to a Qlik consultant the book is divided into 4 sections. The first section focuses on the initial preparation and intended to help consultant to get their groundwork done. The second section focuses on the execution of the project and intended to help consultants play a key role in rest of phases involving requirement gathering, architecture, design, development UAT. The third section is intended to make consultant familiar with some industry domains. This section is intended to help consultant in engaging better with business users and suggesting value-additions to project. The last section is to use the knowledge gained in the three sections and approaching a project with a case study which we come across routinely. Who is the primary target audience for this book? Are there any prerequisites they need to know before they start reading this book? The primary target audience is the Qlik Developers who are looking to progress in their career and are looking to wear the hat of a Qlik consultant. The book is also for existing consultants who would like to sharpen their skills and use Qlik Sense more efficiently. The book will help them become trusted advisors to their clients. Those who are already familiar with some Qlik development will be able to get the most out of this book. Qlik Sense is primarily an enterprise tool. With the rise of open source languages such as R and Python, why do you think people would still prefer enterprise tools for their data visualization? Qlik Sense is not a competition to R and Python but there are lots of synergies. The customer gets the best value when Qlik co-exists with R/Python and can leverage the capabilities of both Qlik and R/Python. Qlik Sense does not have the predictive capability which is easily fulfilled by R/Python. For the customer, the tight integration ensures he/she doesn’t have to leave the Qlik screen. There can be other use cases for using them jointly such as analyzing unstructured data and using machine learning. The reports and visualizations built using Qlik Sense can be viewed and ported across multiple platforms. Can you please share your views on this? How does it help the users? Qlik has opened all gates to integrate its reporting and visualization with most of the technologies through APIs. This has empowered customers to integrate Qlik with their existing portals and provide easy access to end users. Qlik provides APIs for almost all its products, which makes Qlik the first choice for many CIOs because with those APIs they get a variety of options to integrate and automate their work. What are the other key functionalities of Qlik Sense that help the users build better BI solutions? Qlik Sense is not just a pure play data visualization tool. It has capabilities for creating its own warehouse, having an ETL layer and then of course there’s the visualization layer. For the customers, it’s all about getting all the relevant components required for their BI project in a single solution. Qlik is investing heavily in R&D and with its recent acquisitions and a strong portfolio, it is a complete solution enabling users to get all their use cases fulfilled. The open API has enabled opening newer avenues with custom visualizations, amazing concepts such as chatbots, augmented intelligence and much more. The core strength of strong data association, enterprise scalability, governance combined with all other aspects make Qlik one of the best in overall customer satisfaction. Do you foresee Qlik Sense competing strongly with major players such as Tableau and Power BI in the near future? Also, how do you think Qlik plans to tackle the rising popularity of the Open Source alternatives? Qlik has been classified as a Leader in the Gartner’s Magic Quadrant for several years now. We often come across Tableau and Microsoft Power BI as competition. We suggest our customers do a thorough evaluation and more often than not they choose Qlik for its features and the simplicity it offers. With recent acquisitions, Qlik Sense has now become an end-to-end solution for BI, covering uses cases ranging from report distributions, data-as-a-service, and geoanalytics as well. Open source alternatives have their own market and it makes more sense to leverage their capability rather than compete with them. An example, of course, is the strong integration of many BI tools with R or Python which makes life so much easier when it comes to finding useful insights from data. Lastly, what are the 3 key takeaways from your book 'Implementing Qlik Sense'? How will this book help the readers? The book is all about meeting your client’s expectations. The key takeaways are: Understand the role and importance of Qlik consultant and why it’s crucial to be a trusted advisor to your clients Successfully navigating through all aspects which enable successful implementation of your Qlik BI Project. Focus on mitigating risks, driving adoption and avoiding common mistakes while using Qlik Sense. The book is ideal for Qlik developers who aspire to become Qlik consultants. The book uses simple language and gives examples to make the learning journey as simple as possible. It helps the consultants to give equal importance to certain phases of project development that often neglected. Ultimately, the book will enable Qlik consultants to deliver quality Qlik projects. If this interview has nudged you to explore Qlik Sense, make sure you check out our book Implementing Qlik Sense right away!

0
0
18344

article-image-unlocking-the-secrets-of-microsoft-power-bi-interview-part-2-of-2-with-brett-powell-founder-of-frontline-analytics-llc

Amey Varangaonkar

10 Oct 2017

12 min read

Unlocking the secrets of Microsoft Power BI

Amey Varangaonkar

10 Oct 2017

12 min read

0
0
18100

Author Posts - Data

Understanding the Fundamentals of Analytics Teams with John K. Thompson

Understand Quickbooks online/desktop, online security, use cases, and more with Crystalynn Shelton, a certified QuickBooks ProAdvisor

Greg Walters on PyTorch and real-world implementations and future potential of GANs

Why the Industrial Internet of Things (IIoT) needs Architects

Prof. Rowel Atienza discusses the intuition behind deep learning, advances in GANs & techniques to create cutting-edge AI models

Why choose IBM SPSS Statistics over R for your data analysis project

“Data is the new oil but it has to be refined through a complex processing network” - Tirthajyoti Sarkar and Shubhadeep Roychowdhury [Interview]

Why You Need to Know Statistics To Be a Good Data Scientist

Why you should use Keras for deep learning

“Pandas is an effective tool to explore and analyze data”: An interview with Theodore Petrou

Trending Topics

“Deep meta reinforcement learning will be the future of AI where we will be so close to achieving artificial general intelligence (AGI)”, Sudharsan Ravichandiran

Ride the third wave of BI with Microsoft Power BI

“Tableau is the most powerful and secure end-to-end analytics platform”: An interview with Joshua Milligan

How Qlik Sense is driving self-service Business Intelligence

Unlocking the secrets of Microsoft Power BI

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access