Tech Guides - Data

Cyber Monday Special: Can Walmart beat Amazon in its own game? Probably, if they go full-throttle AI

Savia Lobo
24 Nov 2017
7 min read
Long gone are the days when people would trek from one store to another in search of that beautiful pink dress or a particular pair of shoes. The e-commerce revolution has driven a drastic surge in online shopping. How many times have we heard that physical stores are dying? And yet they seem to have a cat-like hold on their lifespan. Wonder why? Because not everyone likes shopping online. Plenty of people still prefer to buy from brick-and-mortar stores; like doubting Thomas, they want to touch before they believe. For customers who love shopping in a physical store, retailers strive to create a fascinating experience that online platforms cannot offer. This is especially important for increasing sales and generating profits during peak festive seasons such as Black Friday or New Year's. A lot has been said about the wonders retail analytics can do for e-commerce sites, but far less about traditional stores. So here are 10 retail analytics options for offline retailers looking to maximize customer attention and retention.

1. Location analytics and proximity marketing

A large number of retail stores collect data to analyze the volume of customers buying online and offline. They use this data for internal retail analytics, which helps them track merchandise, adjust staffing levels, monitor promotions, and so on. Location analytics lets retailers detect the sections of the store with high customer traffic. Proximity marketing uses location-based technology to communicate with customers through their smartphones: customers receive targeted offers and discounts based on their proximity to a product, for instance 20% off the floral dress to your right. Such on-the-go deals have a higher likelihood of converting into a sale. Euclid Analytics provides solutions that track the buying experience of every visitor in the store, helping retailers retarget and rethink their strategies to influence sales at the individual customer level.

2. Music systems

Nowadays, most large retail formats have music systems set up for customers. A playlist with a mix of genres suits the variety of customers visiting the store. Used well, with the right tempo, volume, and genre, music lifts the customer's mood and nudges them towards a purchase. Retailers also have to choose playlists carefully, as music preferences differ with generation, behavior, and mood. Store owners therefore opt for music services with customized playlists. Atmos select provides such a customized music service to retailers: it takes customer demographics into consideration to draft an audio branding strategy, which is then used to design audio solutions for retail outlets and stores.

3. Guest WiFi

Guest WiFi benefits customers by giving them a free internet connection while they shop. Who would not want that? But retailers benefit too: in-store WiFi provides detailed customer analytics and enables them to track various shopping patterns. Cloud4Wi offers Volare, a guest WiFi service that gives customers free WiFi within the retail store, provides a faster and easier login option, and collects customer data so retailers can build targeted marketing lists.
4. Workforce tools

Unison among staff members creates positivity in the store. To improve communication between staff, retailers turn to workforce tools: messaging applications and work-planner platforms that help maintain rapport among staff members. They empower employees to manage their work lives, check overtime details, attendance, and more. Branch, a tool for improving workforce productivity, provides internal messaging and also notifies employees about their shift timings and other details.

5. Omnichannel retail analytics

Omnichannel retail gives customers an interactive, seamless shopping experience across platforms. With data collected from different digital channels, retailers get an overview of a customer's shopping journey and the choices they have made over time. Omnichannel analytics also lets them show personalized shopping ads based on a customer's social media habits. Intel offers omnichannel analytics solutions that help retailers increase customer loyalty and generate substantial revenue growth.

6. Dressing room technology

The mirror in the trial room knows it all! Retailers can attract customer traffic with mirror technology: an interactive, touch-screen mirror that lets customers request new items and adjust the lighting in the trial room. Using RFID, the mirror can also sense the products a customer brings in and recommend similar ones. It lets shoppers save products to their online accounts, in case they decide to purchase later, or digitally request assistance from a store associate. Oak Labs has created one such mirror, transforming the trial room experience while bridging the gap between technology and retail.

7. Pop-ups and kiosks

Pop-ups are mini-outlets that large retail formats set up to sell seasonal products, whereas kiosks are temporary alternatives retailers use to attract footfall in store. Both benefit shoppers with the option of self-service: customers can shop from the store's physical as well as online product range, purchase securely, and have orders delivered to their doorstep. Such techniques encourage customers to choose in-store shopping over online shopping. Withme is a startup that offers a platform for setting up pop-ups for retail outlets and brands.

8. Inventory management

Managing inventory is a major task for a store manager: placing the right product in the right place at the right time. Predictive analytics helps optimize inventory allocation and replenishment, and also equips retailers to mark down inventory for clearance before loading a new batch. Celect, an inventory management startup, helps retailers analyze customer preferences and simultaneously map future demand for a product. It also helps extract insights from existing inventory data, which can then be used to sell inventory faster and produce detailed, retail-analytics-based sales reports.

9. Smart receipts and ratings

Retailers continuously aim to provide better service, and receiving a 5-star rating in return is the cherry on the cake. For higher customer engagement, retailers offer smart receipts, which let them collect customer email addresses to send promotional offers or festive sale discounts.
Retailers also provide customers with personalized offers and incentives to encourage repeat visits. To learn how well they have fared, they set up digital kiosks at the checkout, where in-store customers can rate their shopping experience. Startups such as TruRating give retailers a rating mechanism for shoppers at the checkout, while FlexReceipts helps retailers set up smart receipt applications for customers.

10. Shopping cart tech

Retailers can now offer customers a next-generation shopping cart: a tablet-equipped cart that guides the in-store shopping journey. The tablet uses machine vision to keep track of the shelves as the cart moves through the store, and displays digital ads promoting the products the cart passes. Focal Systems builds this kind of technical assistance for retailers, giving them a way to compete with their online counterparts.

Online shopping is convenient, but more often than not we still crave the look and feel of a product and an immersive shopping experience, especially during holidays and festive occasions. That is the USP of a brick-and-mortar shop. Offline retailers who know their data, and know how to leverage retail analytics using advances in machine learning and retail tech, stand a chance of giving their customers a shopping experience superior to that of their online counterparts.

What we learnt from IBM Research’s ‘5 in 5’ predictions presented at Think 2018

Amey Varangaonkar
16 Apr 2018
4 min read
IBM's mission has always been to innovate and, in the process, change the way the world operates. With this objective in mind, IBM Research started a conversation called '5 in 5' back in 2012, presenting its top five predictions each year on how technology will change the world; the 2018 edition was presented at IBM Think 2018. These predictions usually drive IBM's research and innovation, which eventually aims to solve the problems they describe with efficient solutions. Here are the five predictions IBM Research made for 2018:

More secure blockchain products: To avoid counterfeit blockchain products, the technology will be coupled with cryptographic solutions to develop decentralized products. Digital transactions are often subject to fraud, and securing them with crypto-anchors is seen as the way forward. Want to know how this can be achieved? You might want to check out IBM's blog on crypto-anchors and their real-world applications. If you are like me, you'd rather watch IBM researcher Andres Kind explain what crypto-anchors are in a fast-paced science slam session.

Sophisticated cyber attacks will continue to happen: Cyber attacks resulting in data leaks or the theft of confidential data are not news to us. The bigger worry is that current methods of preventing these attacks are not proving good enough. IBM predicts this will only get worse, with more advanced and sophisticated cyber attacks breaking into today's secure systems with ease. IBM Research also predicted the rise of 'lattice cryptography', a new security mechanism offering a more sophisticated layer of protection. You can read more about lattice cryptography on IBM's official blog, or watch IBM researcher Cecilia Boschini explain it in five minutes in one of IBM's famous science slam sessions.

Artificial Intelligence-powered bots will help clean the oceans: Our marine ecosystem seems to be going from bad to worse, mainly due to the pollution and toxic waste being dumped into it. IBM predicts that AI-powered autonomous bots, deployed and controlled from the cloud, can help relieve the situation by monitoring water bodies for water quality and pollution levels. You can learn more about how these autonomous bots will help save the seas in this interesting talk by Tom Zimmerman.

An unbiased AI system: Artificially designed systems are only as good as the data used to build them. That data may be impure, or may contain flaws or biases pertaining to color, race, gender, and so on. Going forward, new models that mitigate these biases and ensure more standard, bias-free predictions will be designed. With these models, certain human values and principles will be considered for effective decision-making. IBM researcher Francesca Rossi talks about bias in AI and the importance of building fair systems that help us make better decisions.

Quantum computing will go mainstream: IBM predicts that quantum computing will move out of research labs and gain mainstream adoption in the next five years. Problems considered difficult or unsolvable today because of their sheer scale or complexity could be tackled with the help of quantum computing. To know more, let IBM researcher Talia Gershon take you through the different aspects of quantum computing and why it is expected to be a massive hit.

Amazingly, most of IBM's past predictions have turned out to be true.
For instance, IBM predicted the rise of Computer Vision technology in 2012, where computers would be able to not only process images, but also understand their ‘features’. It remains to be seen how true this year’s predictions will turn out to be. However, considering the rate at which the research on AI and other tech domains is progressing and being put to practical use, we won’t be surprised if they all become a reality soon. What do you think?

The Rise of Data Science

Akram Hussain
30 Jun 2014
5 min read
The rise of big data and business intelligence has been one of the hottest topics to hit the tech world. Everybody who's anybody has heard of the term business intelligence, yet very few can actually articulate what it means. Nonetheless, it's something all organizations are demanding. But you must be wondering why, and how do you develop business intelligence? Enter data scientists! The concept of data science was developed to work with large sets of structured and unstructured data. So what does this mean? Let me explain. Data science was introduced to explore and give meaning to the random sets of data floating around (we are talking about huge quantities here, that is, terabytes and petabytes), which are then analyzed to help identify areas of poor performance, areas of improvement, and areas to capitalize on. The concept was introduced for large data-driven organisations that required consultants and specialists to deal with complex sets of data. However, data science has been adopted very quickly by organizations of all shapes and sizes, so naturally an element of flexibility is required to fit data scientists into the modern workflow.

There seems to be a shortage of data scientists and an increase in the amount of data out there. The modern data scientist is one who can apply the necessary analytical skills to any organization, with or without large sets of data available. They are required to carry out data mining tasks to discover relevant, meaningful data. Yet smaller organizations don't have the capital to pay for someone experienced enough to derive such results. Because of the need for information, they might instead turn to a general data analyst, help them move towards data science, and provide them with tools, processes, and frameworks that allow for the rapid prototyping of models. The natural flow of work suggests that data analysis comes after data mining, and in my opinion analysis is at the heart of data science. Languages like R and Python are fundamental to a good data scientist's toolkit. But would a data scientist with a background in mathematics and statistics, and little to no knowledge of R and Python, still be as efficient?

Now, the way I see it, data science is composed of four key topic areas crucial to achieving business intelligence: data mining, data analysis, data visualization, and machine learning. Data analysis can be carried out in many forms; it's essentially looking at data and understanding it to draw a factual conclusion from it (in simple terms). A data scientist may choose to use Microsoft Excel and VBA to analyze their data. It wouldn't be as accurate, clean, or as in-depth as using Python or R, but it sure is useful as a quick win with smaller sets of data. Starting with something like Excel doesn't mean it doesn't count as data science; it's just a different form of it, and more importantly it gives a good foundation to progress to tools like MySQL, R, Julia, and Python as business needs, and expectations of the level of analysis, grow with time. In my opinion, a good data scientist is not one who knows just one or two languages or tools, but one who is well-versed in the majority of them and knows which language and skill set are best suited to the task in hand.
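To make the Excel-versus-Python point above concrete, here is a minimal pandas sketch of the kind of quick aggregation a data scientist might run. The file name and column names are hypothetical, purely for illustration.

```python
# A minimal pandas sketch of a quick analysis step.
# "monthly_sales.csv", "region" and "sales" are hypothetical names.
import pandas as pd

df = pd.read_csv("monthly_sales.csv")  # load the raw data
summary = df.groupby("region")["sales"].agg(["count", "mean", "sum"])
print(summary.sort_values("sum", ascending=False))
```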
Data visualization is hugely important: numbers tell a story on their own, but when it comes to presenting data to customers or investors, they're going to want to view all the different aspects of that data as quickly and easily as possible. Graphically representing complex data is one of the most desirable methods, though how the data is represented varies depending on the tool used, for example R's ggplot2 or Python's Matplotlib. Whether you're working for a small organization or a huge data-driven company, data visualization is crucial.

The world of artificial intelligence introduced the concept of machine learning, which has exploded onto the scene and is now, to an extent, fundamental to large organizations. The opportunity for organizations to move forward by understanding consumer behaviour and matching their expectations has never been so valuable. Data scientists are required to learn complex algorithms and core concepts such as classification, recommenders, neural networks, and supervised and unsupervised learning techniques. This is just touching the edges of an exciting field that goes into much more depth, especially with emerging concepts such as deep learning.

To conclude, we covered the basic fundamentals of data science and what it means to be a data scientist. For all you R and Python developers (not forgetting any mathematical wizards out there), data science has been described as the 'sexiest job of the 21st century', as well as being handsomely rewarded. The rise in jobs for data scientists has without question exploded and will continue to do so; according to global management firm McKinsey & Company, there will be a shortage of 140,000 to 190,000 data scientists due to the continued rise of 'big data'.
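Picking up the ggplot2/Matplotlib comparison from the visualization paragraph above, here is a minimal Matplotlib sketch; the numbers plotted are invented, purely for illustration.

```python
# A minimal Matplotlib sketch of a simple comparison chart.
# The data here is made up for illustration only.
import matplotlib.pyplot as plt

regions = ["North", "South", "East", "West"]
revenue = [120, 95, 143, 87]

plt.bar(regions, revenue)
plt.title("Revenue by region")
plt.ylabel("Revenue (in thousands)")
plt.tight_layout()
plt.show()
```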

Handpicked for your Weekend Reading - 15th Dec, 2017

Aarthi Kumaraswamy
16 Dec 2017
2 min read
As you gear up for the holiday season and the year-end celebrations, make a resolution to spend a fraction of your weekends in self-reflection and in honing your skills for the coming year. Here is the best of the DataHub for your reading this weekend. Watch out for our year-end special edition in the last week of 2017!

NIPS Special Coverage
A deep dive into Deep Bayesian and Bayesian Deep Learning with Yee Whye Teh
How machine learning for genomics is bridging the gap between research and clinical trial success by Brendan Frey
6 Key Challenges in Deep Learning for Robotics by Pieter Abbeel
For the complete coverage, visit here.

Experts in Focus
Ganapati Hegde and Kaushik Solanki, Qlik experts from Predoole Analytics, on how Qlik Sense is driving self-service Business Intelligence

3 things you should know that happened this week
Generative Adversarial Networks: Google open sources TensorFlow-GAN (TFGAN)
“The future is quantum” — Are you excited to write your first quantum computing code using Microsoft’s Q#?
“The Blockchain to Fix All Blockchains” — Overledger, the meta blockchain, will connect all existing blockchains

Try learning/exploring these tutorials this weekend
Implementing a simple Generative Adversarial Network (GAN)
How Google’s MapReduce works and why it matters for Big Data projects
How to write effective Stored Procedures in PostgreSQL
How to build a cold-start friendly content-based recommender using Apache Spark SQL

Do you agree with these insights/opinions?
Deep Learning is all set to revolutionize the music industry
5 reasons to learn Generative Adversarial Networks (GANs) in 2018
CapsNet: Are Capsule networks the antidote for CNNs kryptonite?
How AI is transforming the manufacturing industry

What the Future Holds for IT Support Companies

Guest Contributor
16 Sep 2018
6 min read
In the technological era, many industries are finding new ways to carve out space for themselves. From healthcare to hospitality, rapid developments in tech have radically changed the way we do business, and adaptation is a must. The information technology industry holds a unique position in this world: as the market changes, IT support companies must adapt to new technology more quickly than anyone else to stay competitive.

Decreased Individual Support, Increased Organizational Support

Individual tech users require less IT support than ever before. Every day, the human race produces 2.5 quintillion bytes of data - a staggering amount of information that no individual could possibly consume within a lifetime - and that rate is increasing every day. With such widespread access to information, individuals can find solutions with unrivaled ease and speed, and the need for live, person-to-person support has decreased. Adding to this is the growing presence of younger generations who have grown up in a world saturated with technology. Children born in the 2000s and later have never seen a world without smartphones, Bluetooth, and the World Wide Web. Growing up alongside technology instills not only a level of comfort with using it but also with adapting to its constant changes. For the newest cohort of young adults, troubleshooting is no longer a professional task but a household one.

Businesses, on the other hand, require just as much support as ever. The accelerating pace of software development has opened up a new world of opportunity for organizations to optimize data management, customer support, marketing, finance, and more. But it has also created a new, highly competitive market where late adopters run the risk of falling hard. Adapting to and using new information technology systems takes a highly organized and knowledgeable support team, and that role is increasingly being outsourced beyond organization walls. Companies like CITC are stepping up to provide more intensive expertise to businesses that may, in the past, have used in-house teams to manage IT systems.

Improving Customer Service

While individual tech users may need less and less company-provided support, they continue to expect better and better service. Likewise, organizations expect more from their IT support companies: faster incident response, simplified solutions, and the ability to troubleshoot on the spot. In response to these demands, IT support organizations are experimenting with new models of support.

Automation and Prediction-Based Support with Artificial Intelligence

Artificial intelligence has become a part of everyday life for much of the world. From Google Assistant and Siri to self-driving cars, AI offers a myriad of tools for streamlining daily tasks and simplifying our lives. It is no surprise that developers are looking for ways to integrate artificial intelligence into IT support as well. Automated responses and prediction-based support offer a quick, cost-effective option that allows IT professionals to spend their time on tasks that require a more nuanced approach. However, automation of IT support comes with its own set of problems. First, AI lacks a human touch. Trying to discuss a potentially imminent problem with a bot can be a recipe for frustration, and the unreliability of problem self-reporting poses the additional risk of ineffective or inappropriate solutions.
Automation also poses security risks, which can be especially hazardous to industries with valuable trade secrets.

Community-Based Support

A second shift taking place in the IT support industry is a move towards community-based support. Crowdsourced solutions, like AI, can carry some of the burden of smaller problems and allow staff to focus their energy on more pressing tasks. Unlike automated solutions, however, community-based support allows for human-to-human interaction and more seamless, collaborative, feedback-based troubleshooting. But community-based support has limited applications. Crowdsourced solutions, contrary to the name, are often the work of only a few highly qualified individuals. The turnaround for answers from a qualified source can be long, which is unacceptable in many circumstances. Companies offering a community-based support platform must also moderate contributions and fact-check solutions, which can end up being nearly as resource-intensive as traditional support services. While automation and community-based support offer new alternatives to traditional IT support for individual tech users, organizational support requires a much different approach. Organizations that hire IT support companies expect expert solutions, and dedicated staffing is a must.

The Shift: Management to Adoption

Software is advancing at a rapid pace. In the first quarter of 2018, there were 3,800,000 apps available for Android users and 2,000,000 for Apple. The breadth of enterprise software is just as large, with daily development in all industry sectors. Businesses can no longer adopt a system and stick with it - they must constantly adapt to new, better technology and seek out the most innovative solutions to compete in their industries. IT support companies must increasingly dedicate themselves to this role as well: IT support is no longer a matter of technology management, but of adaptation and adoption. New IT companies bring a lot to the table here. Much like the newer generations of individual users, new companies can offer a fresh perspective on software developments and a more fluid ability to adapt to changes in the market. Older IT companies also have plenty to offer, however; years of traditional support experience can be a priceless asset that inspires confidence in clients. All IT support organizations should focus on being able to adapt to the future of technology: keeping pace with rapid changes in software, an increasing reliance on digital technology, and the transition to digital resource management. Additionally, support companies must be able to provide innovative solutions that strike an effective balance between automation and interaction. Ultimately, companies must realize that the future is already here and that traditional methods of service are no longer adequate for the changing landscape of tech.

In Conclusion

The world of technology is changing and, with it, the world of IT support. Support needs are rapidly shifting from customer service to enterprise support, and IT support companies must adapt to serve the needs of all industry sectors. Companies will need to find innovative solutions that not only manage technology but also allow their clients to adopt new technological advancements seamlessly. This article is written by a student of New York City College of Technology.
New York City College of Technology is a baccalaureate and associate degree-granting institution committed to providing broad access to high quality technological and professional education for a diverse urban population.

Top 5 misconceptions about data science

Erik Kappelman
02 Oct 2017
6 min read
Data science is a well-defined, serious field of study and work. But the term ‘data science’ has become a bit of a buzzword. Yes, 'data scientists’ have become increasingly important to many different types of organizations, but it has also become a trend term in tech recruitment. The fact that these words are thrown around so casually has led to a lot of confusion about what data science and data scientists actually is and are. I would formerly include myself in this group. When I first heard the word data scientist, I assumed that data science was actually just statistics in a fancy hat. Turns out I was quite wrong. So here are the top 5 misconceptions about data science. Data science is statistics and vice versa I fell prey to this particular misconception myself. What I have come to find out is that statistical methods are used in data science, but conflating the two is really inaccurate. This would be somewhat like saying psychology is statistics because research psychologists use statistical tools in studies and experiments. So what's the difference? I am of the mind that the primary difference lies in the level of understanding of computing required to succeed in each discipline. While many statisticians have an excellent understanding of things like database design, one could be a statistician and actually know nothing about database design. To succeed as a statistician, all the way up to the doctoral level, you really only need to master basic modeling tools like R, Python, and MatLab. A data scientist needs to be able to mine data from the Internet, create machine learning algorithms, design, build and query databases and so on. Data science is really computer science This is the other half of the first misconception. While it is tempting to lump data science in with computer science, the two are quite different. For one thing, computer science is technically a field of mathematics focused on algorithms and optimization, and data science is definitely not that. Data science requires many skills that overlap with those of computer scientists, but data scientists aren’t going to need to know anything about computer hardware, kernels, and the like. A data scientist ought to have some understanding of network protocols, but even here, the level of understanding required for data science is nothing like the understanding held by the average computer scientist. Data scientists are here to replace statisticians In this case, nothing could be further from the truth. One way to keep this straight is that statisticians are in the business of researching existing statistical tools as well as trying to develop new statistical tools. These tools are then turned around and used by data scientists and many others. Data scientists are usually more focused on applied solutions to real problems and less interested in what many might regard as pure research. Data science is primarily focused on big data This is an understandable misconception. Just so we’re clear, Wikipedia defines big data as “a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them.” Then big data is really just the study of how to deal with, well, big datasets. Data science absolutely has a lot to contribute in this area. Data scientists usually have skills that work really well when it comes to analyzing big data. 
Skills related to databases, machine learning, and how data is transferred around a local network or the internet, are skills most data scientists have, and are very helpful when dealing with big data. But data science is actually very broad in scope. big data is a hot topic right now and receiving a lot of attention. Research into the field is receiving a lot private and public funding. In any situation like this, many different types of people working in a diverse range of areas are going to try to get in on the action. As a result, talking up data science's connection to big data makes sense if you're a data scientist - it's really about effective marketing. So, you might work with big data if you're a data scientist - but data science is also much, much more than just big data. Data scientists can easily find a job I thought I would include this one to add a different perspective. While there are many more misconceptions about what data science is or what data scientists do, I think this is actually a really damaging misconception and should be discussed. I hear a lot of complaints these days from people with some skill set that is sought after not being able to find gainful employment. Data science is like any other field, and there is always going to be a whole bunch of people that are better at it than you. Don’t become a data scientist because you’re sure to get a job - you’re not. The industries related to data science are absolutely growing right now, and will continue to do so for the foreseeable future. But that doesn’t mean people who can call themselves data scientists just automatically get jobs. You have to have the talent, but you also need to network and do all the same things you need to do to get on in any other industry. The point is, it's not easy to get a job no matter what your field is; study and practice data science because it's awesome, don’t do it because you heard it’s a sure way to get a job. Misconceptions abound, but data science is a wonderful field of research, study, and practice. If you are interested in pursuing a career or degree related to data science, I encourage you to do so, however, make sure you have the right idea about what you’re getting yourself into. Erik Kappelman wears many hats including blogger, developer, data consultant, economist, and transportation planner. He lives in Helena, Montana and works for theDepartment of Transportation as a transportation demand modeler.

Stitch Fix: Full Stack Data Science and other winning strategies

Aaron Lazar
05 Dec 2017
8 min read
Last week, a company in San Francisco was popping bottles of champagne for their achievements. And trust me, they’re not at all small. Not even a couple of weeks gone by, since it was listed on the stock market and it has soared to over 50%. Stitch Fix is an apparel company run by co-founder and CEO, Katrina Lake. In just a span of 6 years, she’s been able to build the company with an annual revenue of a whopping $977 odd million. The company has been disrupting traditional retail and aims to bridge the gap of personalised shopping, that the former can’t accomplish. Stitch Fix is more of a personalized stylist, rather than a traditional apparel company. It works in 3 basic steps: Filling a Style Profile: Clients are prompted to fill out a style profile, where they share their style, price and size preferences. Setting a Delivery Date: The clients set a delivery date as per their availability. Stitch Fix mixes and matches various clothes from their warehouses and comes up with the top 5 clothes that they feel would best suit the clients, based on the initial style profile, as well as years of experience in styling. Keep or Send Back: The clothes reach the customer on the selected date and the customer can try on the clothes, keep whatever they like or send back what they don’t. The aim of Stitch Fix is to bring a personal touch to clothes shopping. According to Lake, “There are millions and millions of products out there. You can look at eBay and Amazon. You can look at every product on the planet, but trying to figure out which one is best for you is really the challenge” and that’s the tear Stitch Fix aims to sew up. In an interview with eMarketer, Julie Bornstein, COO of Stitch Fix said “Over a third of our customers now spend more than half of their apparel wallet share with Stitch Fix. They are replacing their former shopping habits with our service.” So what makes Stitch Fix stand out among its competitors? How do they do it? You see, Stitch Fix is not just any apparel company. It has created the perfect formula by blending human expertise with just the right amount of Data Science to enable it to serve its customers. When we’re talking about the kind of Data Science that Stitch Fix does, we’re talking about a relatively new and exciting term that’s on the rise - Full Stack Data Science. Hello Full Stack Data Science! For those of you who’ve heard of this before, cheers! I hope you’ve had the opportunity to experience its benefits. For those of you who haven’t heard of the term, Full Stack Data Science basically means a single data scientist does their own work, which is mining data, cleans it, writes an algorithm to model it and then visualizes the results, while also stepping into the shoes of an engineer, implementing the model, as well as a Project Manager, tracking the entire process and ensuring it’s on track. Now while this might sound like a lot for one person to do, it’s quite possible and practical. It’s practical because of the fact that when these roles are performed by different individuals, they induce a lot of latency into the project. Moreover, a synchronization of priorities of each individual is close to impossible, thus creating differences within the team. The Data (Science) team at Stitch Fix is broadly categorized based on what area they work on: Because most of the team focuses on full stack, there are over 80 Data Scientists on board. That’s a lot of smart people in one company! 
On a serious note, although unique, this kind of team structure has been doing well for them, mainly because it gives each one the freedom to work independently. Tech Treasure Trove When you open up Stitch Fix’s tech toolbox, you won’t find Aladdin’s lamp glowing before you. Their magic lies in having a simple tech stack that works wonders when implemented the right way. They work with Ruby on Rails and Bootstrap for their web applications that are hosted on Heroku. Their data platform relies on a robust Postgres implementation. Among programming languages, we found Python, Go, Java and JavaScript also being used. For an ML Framework, we’re pretty sure they’re playing with TensorFlow. But just working with these tools isn’t enough to get to the level they’re at. There’s something more under the hood. And believe it or not, it’s not some gigantic artificial intelligent system running on a zillion cores! Rather, it’s all about the smaller, simpler things in life. For example, if you have 3 different kinds of data and you need to find a relationship between them, instead of bringing in the big guns (read deep learning frameworks), a simple tensor decomposition using word vectors would do the deed quite well. Advantages galore: Food for the algorithms One of the main advantages Stitch Fix has, is that they have almost 5 years’ worth client data. This data is obtained from clients in several ways like through a Client Profile, After-Delivery Feedback, Pinterest photos, etc. All this data is put through algorithms that learn more about the likes and dislikes of clients. Some interesting algorithms that feed on this sumptuous data are on the likes of collaborative filtering recommenders to group clients based on their likes, mixed-effects modeling to learn about a client’s interests over time, neural networks to derive vector descriptions of the Pinterest images and to compare them with in-house designs, NLP to process customer feedback, Markov chain models to predict demand, among several others. A human Touch: When science meets art While the machines do all the calculations and come up with recommendations on what designs customers would appreciate, they still lack the human touch involved. Stitch Fix employs over 3000 stylists. Each client is assigned a stylist who knows the entire preference of the client at the glance of a custom-built interface. The stylist finalizes the selections from the inventory list also adding in a personal note that describes how the client can accessorize the purchased items for a particular occasion and how they can pair them with any other piece of clothing in their closet. This truly advocates “Humans are much better with the machines, and the machines are much better with the humans”. Cool, ain't it? Data Platform Apart from the Heroku platform, Stitch Fix seems to have internal SaaS platforms where the data scientists effectively carry out analysis, write algorithms and put them into production. The platforms exhibit properties like data distribution, parallelization, auto-scaling, failover, etc. This lets the data scientists focus on the science aspect while still enjoying the benefits of a scalable system. The good, the bad and the ugly: Microservices, Monoliths and Scalability Scalability is one of the most important aspects a new company needs to take into account before taking the plunge. Using a microservice architecture helps with this, by allowing small independent services/mini applications to run on their own. 
Stitch Fix uses this architecture to improve scalability, although their database is a monolith; they are now breaking that monolith up into microservices too. This is a takeaway for all entrepreneurs just starting out with their apps.

Data Driven Applications

Data-driven applications ensure that the right solutions are built for customers. If you're a customer-centric organisation, there's something you can learn from Stitch Fix: data-driven apps seamlessly combine the operational and analytic capabilities of the organisation, breaking down the traditional silos.

TDD + CD = DevOps Simplified

Test Driven Development and Continuous Delivery go hand in hand, and it's always better to imbibe this culture right from the very start. In the end, it's great to see such creative, technologically driven start-ups succeed and sail to the top. If you're on the journey to building that dream startup of yours and you need resources for your team, here are a few books you'll want to pick up to get started:

Hands-On Data Science and Python Machine Learning by Frank Kane
Data Science Algorithms in a Week by Dávid Natingga
Continuous Delivery and DevOps: A Quickstart Guide - Second Edition by Paul Swartout
Practical DevOps by Joakim Verona
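As a companion to the "Food for the algorithms" section above, here is a toy item-based collaborative-filtering sketch in NumPy. The ratings matrix and the prediction rule are invented for illustration; this is not Stitch Fix's implementation.

```python
# A toy item-based collaborative-filtering recommender.
# The ratings matrix is invented; 0 means "not rated".
import numpy as np

# rows = clients, columns = items
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# cosine similarity between item columns
norms = np.linalg.norm(ratings, axis=0)
item_sim = (ratings.T @ ratings) / np.outer(norms, norms)

# predict a score for client 0 on item 2 from that client's other ratings
client = ratings[0]
rated = client > 0
pred = item_sim[2, rated] @ client[rated] / item_sim[2, rated].sum()
print(f"Predicted rating of client 0 for item 2: {pred:.2f}")
```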

5 data science tools that will matter in 2018

Richard Gall
12 Dec 2017
3 min read
We know your time is valuable. That's why what matters is important. We've written about the trends and issues that are going to matter in data science, but here you can find 5 data science tools that you need to pay attention to in 2018. Read our 5 things that matter in data science in 2018 here.

1. TensorFlow

Google's TensorFlow has been one of the biggest hits of 2017 when it comes to libraries. It has arguably done a lot to make machine learning more accessible than ever before. That means more people actually building machine learning and deep learning algorithms, and the technology moving beyond the domain of data professionals and into other fields. So, if TensorFlow has passed you by, we recommend you spend some time exploring it. It might just give your skill set the boost you're looking for. Explore TensorFlow content here.

2. Jupyter

Jupyter isn't a new tool, sure. But it's so crucial to the way data science is done that its importance can't be overstated, and as pressure grows on data scientists and analysts to communicate and share data in ways that empower stakeholders across a diverse range of roles and departments, it will only become more so. It's also worth mentioning its relationship with Python: we've seen Python go from strength to strength throughout 2017, with no signs of letting up, and the close relationship between the two will only make Jupyter more popular across the data science world. Discover Jupyter eBooks and videos here.

3. Keras

In a year when deep learning has captured the imagination, it makes sense to include both libraries helping to power it. It's a close call between Keras and TensorFlow as to which deep learning framework is 'better' - ultimately, like everything, it depends on what you're trying to do. This post explores the difference between Keras and TensorFlow very well; the conclusion is that while TensorFlow offers more 'control', Keras is the library you want if you simply need to get up and running. Both libraries have had a huge impact in 2017, and we're only going to see more of them in 2018. Learn Keras. Read Deep Learning with Keras.

4. Auto SkLearn

Automated machine learning is going to become incredibly important in 2018. As pressure mounts on engineers and analysts to do more with less, tools like Auto SkLearn will be vital in reducing some of the 'manual labour' of algorithm selection and tuning.

5. Dask

This one might be a little unexpected. We know just how popular Apache Spark is when it comes to distributed and parallel computing, but Dask represents an interesting competitor that's worth watching throughout 2018. Its high-level API integrates exceptionally well with Python libraries like NumPy and pandas; it's also much more lightweight than Spark, so it could be a good option if you want to avoid building out a weighty big data tech stack. Explore Dask in the latest edition of Python High Performance.
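For a feel of why Dask's API is so approachable, here is a minimal sketch of its pandas-like interface; the CSV paths and column names are hypothetical.

```python
# A minimal sketch of Dask's pandas-like API.
# "logs-2017-*.csv", "status" and "bytes" are hypothetical names.
import dask.dataframe as dd

df = dd.read_csv("logs-2017-*.csv")           # lazily points at many files
mean_bytes = df.groupby("status")["bytes"].mean()
print(mean_bytes.compute())                   # .compute() triggers the parallel work
```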

Paper in Two minutes: i-RevNet, a deep invertible convolutional network

Sugandha Lahoti
02 Apr 2018
4 min read
The ICLR 2018 accepted paper, i-RevNet: Deep Invertible Networks, introduces i-RevNet, an invertible convolutional network that does not discard any information about the input while classifying images. The paper is authored by Jörn-Henrik Jacobsen, Arnold W.M. Smeulders, and Edouard Oyallon. The 6th annual ICLR conference is scheduled to happen between April 30 - May 03, 2018.

What problem is the paper attempting to solve?

A CNN is generally composed of a cascade of linear and nonlinear operators. These operators are very effective at classifying images of all sorts, but reveal little about the contribution of the internal representation to the classification. The learning process of a CNN works by progressively reducing large amounts of uninformative variability in the images to reveal the essence of the visual class. However, the extent to which information is discarded is lost somewhere in the intermediate nonlinear processing steps. There is also a widespread belief that discarding information is essential for learning representations that generalize well to unseen data. The authors of this paper show that discarding information is not necessary, and support this claim with empirical evidence. The paper also sheds light on the variability-reduction process by proposing an invertible convolutional network. The i-RevNet does not discard any information about the input while classifying images. It has a built-in pseudo-inverse, allowing for easy inversion, and it uses linear and invertible operators for downsampling instead of non-invertible variants like spatial pooling.

Paper summary

i-RevNet is an invertible deep network that builds upon the recently introduced RevNet, where the non-invertible components of the original RevNets are replaced by invertible ones. i-RevNets retain all information about the input signal in any of their intermediate representations up until the last layer, and achieve the same performance on ImageNet as similar non-invertible RevNet and ResNet architectures. A figure in the paper illustrates the blocks of an i-RevNet. The strategy implemented by an i-RevNet consists of an alternation between additions and nonlinear operators, while progressively downsampling the signal. The pair of outputs from the final layer is concatenated through a merging operator. With this architecture, the authors avoid the non-invertible modules of a RevNet (e.g. max-pooling or strides), which are necessary to train RevNets in a reasonable time and are designed to build invariance w.r.t. translation variability. Their method replaces the non-invertible modules with linear and invertible modules Sj that can reduce the spatial resolution while maintaining the layer's size by increasing the number of channels.

Key Takeaways

This work provides solid empirical evidence that invertible representations, which do not discard any information about their input, can be learned on large-scale supervised problems. i-RevNet, the invertible network proposed, is a class of CNN which is fully invertible and permits the exact recovery of the input from its last convolutional layer. i-RevNets achieve the same classification accuracy on complex datasets, as illustrated on ILSVRC-2012, as RevNet and ResNet architectures with a similar number of layers.
The inverse network is obtained for free when training an i-RevNet, requiring only minimal adaptation to recover inputs from the hidden representations.

Reviewer feedback summary

Overall Score: 25/30. Average Score: 8.3. Reviewers agreed the paper is a strong contribution, despite some comments about the significance of the result, i.e. why invertibility is a "surprising" property for learnability, in the sense that F(x) = {x, phi(x)}, where phi is a standard CNN, satisfies both properties: it is invertible, and linear measurements of F produce good classification. Having said that, the reviews agreed that the paper is well written and easy to follow, and considered it a great contribution to the ICLR conference.
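To make the invertibility idea concrete, here is a toy NumPy illustration of the reversible coupling that RevNet-style blocks (and, by extension, i-RevNet) build on. The functions F and G stand in for learned residual blocks and are arbitrary here, so this is a sketch of the principle rather than the authors' code.

```python
# A toy illustration of reversible coupling: the input can be recovered
# exactly from the output, so no information is discarded.
import numpy as np

def F(x): return np.tanh(x)          # placeholder "residual" functions
def G(x): return 0.5 * np.sin(x)

def forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = np.random.randn(4), np.random.randn(4)
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(np.allclose(x1, r1) and np.allclose(x2, r2))   # True: inputs recovered exactly
```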

What Did Big Data Deliver In 2014?

Akram Hussein
30 Dec 2014
5 min read
Big Data has always been a hot topic, and in 2014 it came into its own. 'Big Data' developed, evolved and matured to give significant value to 'Business Intelligence'. However, there is so much more to big data than meets the eye. Understanding enormous amounts of unstructured data is not easy by any means; yet once that data is analysed and understood, organisations have started to value its importance and need. 'Big data' has helped create a number of opportunities, ranging from new platforms, tools and technologies, to improved economic performance in different industries, through the development of specialist skills, job creation and business growth. Let's do a quick recap of 2014 and what Big Data has offered the tech world, from the perspective of a tech publisher.

Data Science

The term 'Data Science' has admittedly been around for some time, yet in 2014 it received a lot more attention thanks to the demands created by 'Big Data'. Looking at Data Science from a tech publisher's point of view, it's a concept which has rapidly been adopted, with potential for greater levels of investment and growth. To address the needs of Big Data, Data Science has been split into four key categories: data mining, data analysis, data visualization and machine learning. Equally, we have important topics which fit in between those, such as data cleaning (munging), which I believe takes up the majority of a data scientist's time. The rise in jobs for data scientists has exploded in recent times and will continue to do so; according to global management firm McKinsey & Company, there will be a shortage of 140,000 to 190,000 data scientists due to the continued rise of 'big data', and the role has been described as the 'sexiest job of the 21st century'.

Real-time Analytics

The competitive battle in Big Data throughout 2014 was focused on how fast data could be streamed to achieve real-time performance. Real-time analytics' most important feature is gaining instant access to data and querying it as soon as it comes through. The concept is applicable to different industries and supports the growth of new technologies and ideas. Live analytics are most valuable to social media sites and marketers, providing actionable intelligence. Likewise, real-time data is becoming increasingly important with the phenomenon known as 'the Internet of Things'. The ability to make decisions and plan outcomes instantly is more feasible now than ever before, thanks to the development of technologies like Spark and Storm, and NoSQL databases like Apache Cassandra, which enable organisations to rapidly retrieve data with fault-tolerant performance.

Deep Learning

Machine learning (ML) became the new black and is in constant demand by many organisations, especially new startups. However, even though machine learning is gaining adoption and an improved appreciation of its value, the concept of Deep Learning seems to be the one that really pushed on in 2014. Now, granted, both ML and deep learning have been around for some time; we are looking at the topics in terms of current popularity levels and adoption in tech publishing. Deep learning is a subset of machine learning which refers to the use of artificial neural networks composed of many layers. The idea is based around a complex set of techniques for finding information to generate greater accuracy of data and results.
The value gained from deep learning is that the information (from hierarchical data models) helps AI systems move towards greater efficiency and accuracy, learning to recognize and extract information by themselves, unsupervised! The popularity of deep learning has seen large organisations invest heavily: Google's acquisition of DeepMind for $400 million and Twitter's purchase of Madbits are just a few of the high-profile investments amongst many. Watch this space in 2015!

New Hadoop and Data Platforms

Hadoop, the technology best associated with big data, changed its batch processing approach from MapReduce to what's better known as YARN towards the end of 2013 with Hadoop v2. MapReduce demonstrated the value and benefits of large-scale, distributed processing. However, as big data demands increased and more flexibility, multiple data models and visual tools became a requirement, Hadoop introduced YARN to address these problems. YARN stands for 'Yet-Another-Resource-Negotiator'. In 2014, the emergence and adoption of YARN allowed users to carry out multiple workloads such as streaming, real-time, and generic distributed applications of any kind (YARN handles and supervises their execution!) alongside the MapReduce models. The biggest trend I've seen with the change in Hadoop in 2014 was the transition from MapReduce to YARN. The real value in big data and data platforms is the analytics, and in my opinion that will be the primary point of focus and improvement in 2015.

Rise of NoSQL

NoSQL, also interpreted as 'Not Only SQL', exploded in 2014, with a wide variety of databases coming to maturity. NoSQL databases have grown in popularity thanks to big data. There are many ways to store data, but it is very difficult to process, manage, store and query huge sets of messy, complex and unstructured data. Traditional SQL systems just wouldn't allow that, so NoSQL was created to offer a way to look at data with no restrictive schemas. The emergence of 'graph', 'document', 'wide column' and 'key-value store' databases has shown no slowdown, and their growth continues to attract a higher level of adoption. However, NoSQL seems to be taking shape and settling on a few major players such as Neo4j, MongoDB and Cassandra. Whatever 2015 brings, I am sure it will be faster, bigger and better!
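As a reminder of the programming model that started it all, here is a single-machine toy version of the MapReduce word-count pattern discussed above; it only illustrates the map and reduce phases and is not actual Hadoop or YARN code.

```python
# A single-machine toy version of the MapReduce word-count pattern.
from collections import defaultdict

documents = ["big data in 2014", "big data and yarn", "yarn replaces mapreduce"]

# map phase: emit (word, 1) pairs
mapped = [(word, 1) for doc in documents for word in doc.split()]

# shuffle + reduce phase: sum the counts per key
counts = defaultdict(int)
for word, one in mapped:
    counts[word] += one

print(dict(counts))
```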

Why Enterprises love the Elastic Stack

Pravin Dhandre
31 May 2018
2 min read
Business insight has always been a hotspot for companies, and with data that keeps flowing, growing and getting fatter by the day, analytics need to be quicker, real-time and reliable. Analytics that can't keep up with today's data provide insights that are almost lifeless against market dynamics. The question, then, is: is there an analytics solution that can tackle the data hydra? The Elastic Stack is your answer. It is packed with tools like Elasticsearch, Kibana, Logstash, X-Pack and Beats that take data from any source, in any format, and provide instant search, analysis and visualization in real time. With over 225 million downloads, it is a clear crowd favorite. Enterprises get the added benefit of using it as a single analytical suite or integrating it with other products, delivering real-time actionable insights and decisions every time.

Why Enterprises love the Elastic Stack?

One of the things enterprises love about the Elastic Stack is that it is an open source platform. The next thing IT companies enjoy is its super-fast distributed search mechanism, which makes queries run faster and more efficiently. Its bundling with Kibana and Logstash also makes it a great fit for IT infrastructure and DevOps teams, who can aggregate and analyze billions of logs with ease. Its simple and robust analysis platform provides a distinct advantage over Splunk, Solr, Sphinx, Ambar and many other alternative product suites. Also, its SaaS option allows customers to perform log analytics, full-text search and application monitoring in the cloud with ease and at reasonable pricing. Companies like Amazon, Bloomberg, eBay, SAP, Citibank, Sony, Mozilla, WordPress and Salesforce are already using the Elastic Stack to power their search and analytics and combat their daily business challenges. Whether it is an educational institution, a travel agency, an e-commerce business or a financial institution, the Elastic Stack is empowering millions of companies with real-time metrics, strong analytics, a better search experience and high customer satisfaction.

How to install Elasticsearch in Ubuntu and Windows
How to perform Numeric Metric Aggregations with Elasticsearch
CRUD (Create, Read, Update and Delete) Operations with Elasticsearch
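For a sense of how simple querying Elasticsearch can be, here is a minimal sketch that hits the REST search API with Python's requests library; the host, index name and field names are hypothetical.

```python
# A minimal sketch of an Elasticsearch search-plus-aggregation call.
# "localhost:9200", the "orders" index and its fields are hypothetical.
import requests

query = {
    "query": {"match": {"status": "shipped"}},
    "aggs": {"avg_total": {"avg": {"field": "order_total"}}},
    "size": 3,
}

resp = requests.get("http://localhost:9200/orders/_search", json=query)
body = resp.json()
print(body["hits"]["total"], body["aggregations"]["avg_total"]["value"])
```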


Level Up Your Company's Big Data With Resource Management

Timothy Chen
24 Dec 2015
4 min read
Big data was once one of the biggest technology hypes: tons of presentations and posts talked about how new systems and tools allow large and complex data to be processed in ways that traditional tools could not. While big data was at the peak of its hype, most companies were still getting familiar with the new data processing frameworks such as Hadoop and new databases such as HBase and Cassandra. Fast forward to now, where big data is still a popular topic, lots of companies have already jumped on the big data bandwagon and are moving past first-generation Hadoop to evaluate newer tools such as Spark and newer databases such as Firebase, NuoDB or MemSQL. But most companies also learn from running all of these tools that deploying, operating and planning capacity for them is very hard and complicated. Although over time many of these tools have become more mature, they usually still run in their own independent clusters. It is also not rare to find multiple Hadoop clusters in the same company, since multi-tenancy isn't built into many of these tools, and you run the risk of overloading a cluster with a few non-critical big data jobs.

Problems running independent big data clusters

There are a lot of problems when you run many of these independent clusters. One of them is monitoring and visibility: all of these clusters come with their own management tools, and integrating them with the company's shared monitoring and management tools is a huge challenge, especially when onboarding yet another framework with yet another cluster. Another problem is multi-tenancy. Although having independent clusters addresses part of this, another org's job can still take over a whole cluster, and it doesn't solve the problem of a bug in a Hadoop application using up all the available resources; the pain of debugging this is horrific. Yet another problem is utilization: a cluster is usually not 100% utilized, and all of those instances running in Amazon or in your datacenter are just racking up bills while doing no work. There are more major pain points that I don't have time to get into.

Hadoop v2

The Hadoop developers and operators saw this problem, and in the second generation of Hadoop they developed a separate resource management tool called YARN: a single management framework that manages all of the resources in the cluster, enforces the resource limits of jobs, integrates security into the workload, and even optimizes the workload by automatically placing jobs closer to the data. This solves a huge problem when operating a Hadoop cluster, and also makes it possible to consolidate all of a company's Hadoop clusters into one, since it allows finer-grained control over the workload and improves the efficiency of the cluster.

Beyond Hadoop

Now, with the vast number of big data technologies growing in the ecosystem, there is a need for a common resource management layer across all of the tools; without a single resource management system across all the frameworks, we run straight back into the problems mentioned before. And when all these frameworks run under one resource management platform, a lot of options for optimization and resource scheduling become possible. Here are some examples of what could be possible with one resource management platform. With a single platform, the scheduler can understand the entire cluster's workload and available resources, and can automatically resize and scale up and down based on workloads across all these tools. It can also resize jobs according to priority: the cluster can detect under-utilization by other jobs and offer the slack resources to Spark batch jobs without impacting your most important workloads from other frameworks, maintaining the same business deadlines while saving a lot more cost.
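As a toy, hedged illustration of the scheduling ideas above (not YARN or Mesos code, just a Python sketch with invented framework names and numbers): a single scheduler that sees every framework's usage can hand spare capacity to low-priority batch jobs without touching high-priority workloads.

```python
# Toy sketch: a single scheduler view over every framework in a shared cluster.
# Capacities, usage numbers and framework names are invented for illustration.
CLUSTER_CPUS = 100

usage = {"hbase": 30, "kafka-streams": 20, "spark-batch": 10}      # CPUs in use
priority = {"hbase": "high", "kafka-streams": "high", "spark-batch": "low"}

def offer_slack(usage, priority, headroom=10):
    """Offer idle CPUs to low-priority frameworks, keeping headroom for spikes."""
    slack = max(CLUSTER_CPUS - sum(usage.values()) - headroom, 0)
    low = [name for name, level in priority.items() if level == "low"]
    return {name: slack // len(low) for name in low} if low else {}

print(offer_slack(usage, priority))  # {'spark-batch': 30}
```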
In the next post I'll cover Mesos, one such resource management system, and how upcoming features in Mesos make the optimizations I mentioned possible. For more big data tutorials and analysis, visit our dedicated Hadoop and Spark pages.

About the author

Timothy Chen is a distributed systems engineer and entrepreneur. He works at Mesosphere and can be found on GitHub @tnachen.


Tech Titans, Acquisitions and Regulation - Trick or Treat?

Sugandha Lahoti
29 Oct 2018
5 min read
In probably the biggest open source acquisition ever, IBM announced on Sunday that it has acquired Red Hat for $34 billion. This is consistent with the Silicon Valley giants' increasing appetite for growth. The past few months also saw the emergence of trillion-dollar tech titans that have mesmerised even Wall Street. Apple and Amazon rose high in their stocks in the race to a $1 trillion market cap, with Google and Microsoft continuing to relentlessly chase that goal. Even though Facebook and Twitter stocks took heavy blows thanks to the controversies surrounding their platforms, they continue to be valued a lot higher than solid stocks in other industries.

Silicon Valley giants also acquired new companies and startups with the aim of capturing the market and coveted users. Microsoft acquired GitHub and the AI startup Lobe; Alphabet, Google's parent company, helped GitLab raise $100 million in funding; Apple bought Shazam for an estimated $400 million; and Cloudera and Hortonworks merged to advance hybrid cloud development, edge computing and artificial intelligence. These investments and acquisitions are a clear indication that companies are collaborating to further technical advancement. Microsoft's acquisition is also a signal that the attitude of mature Silicon Valley giants towards open source has changed significantly in recent years. However, people fear that this embrace of open source is more about business than about values. Billion-dollar acquisitions don't exactly scream "free and open software". Some also say that such acquisitions give access to the acquired company's user base, which is what big companies are most interested in. This issue was raised again when EU regulators started an investigation over the concern that Apple's acquisition of Shazam could give Apple an unfair advantage over rivals such as Spotify.

This year has also been the year of questionable data harvesting practices and frequent, massive data breaches across firms, each affecting millions of users, even as tech titans raced to the $1 trillion club. 2018 opened with Facebook's Cambridge Analytica scandal, which used Facebook user data to influence votes in the UK and US. Moreover, 50 million Facebook user accounts were compromised, a multimillion-dollar ad fraud scheme secretly tracked Android phones, and 500,000 Google+ accounts were compromised by an undisclosed bug. In July, Timehop, a social media application, also suffered a data breach, with 21 million users' data compromised. Just a few days ago, Cathay Pacific, a major Hong Kong-based airline, suffered a data breach affecting 9.4 million passengers. In September, Uber paid $148 million over a data breach cover-up. Two weeks back, the Pentagon revealed a cybersecurity breach in which hackers stole the personal data of tens of thousands of military and civilian US Defense Department personnel.

All of these events have left many users and even developers jaded. This has led to a growing "techlash" that is throwing its weight behind the need for tech regulation. Tech regulation, in its simplest sense, means the tech industry cannot be trusted to regulate itself and there must be an independent entity that oversees how tech companies behave. This regulatory body would have the power to formulate and implement policies and to penalize those that don't comply. Supporters of tech regulation argue that regulation can restore accountability and rebuild trust in tech. It will also make the conversation around the uses and abuses of technology more public while protecting citizens and software engineers. Tech regulation supporters also believe that regulation can bridge the gap between entrepreneurs, engineers and lawmakers.

Read more: 5 reasons government should regulate technology

However, tech regulation is not without pitfalls. It may come at the cost of innovation. For example, user privacy and tech innovation are interlinked: machine learning systems need more data to get better at their jobs. If more users choose not to share their data, the recommendations they get are likely to be generic at best, or even irrelevant. Also, advertising revenue for tech companies might be hit by reduced opportunities to profile users. This could have an adverse impact on companies' ability to continue to innovate and provide free products for their users. There is a need to strike a delicate balance to make privacy work practically. This is the conclusion the US Senate has come to as it continues to meet with industry leaders and privacy experts to understand how to protect consumer data privacy without crippling tech innovation. Moreover, companies may game tech regulation policies by giving users little choice; for example, they could simply deprive users of their services should they choose not to share their data with the company. This should also be kept in mind while formulating both regulatory bodies and policy frameworks.

Although data and security breaches are nasty tricks, they have been instrumental in opening the conversation around tech regulation and privacy policies, which, if done right, may eventually turn into a treat for users. As for tech acquisitions, they are never what they seem to be. Not only do they vary from company to company, they also have complex factors at play: people, culture, market and timing, among others. It would be unfair or naive to label tech acquisitions as purely tricks or treats; the truth lies somewhere in shades of gray. One thing is clear though: funding does make the world go round!

Sir Tim Berners-Lee on digital ethics and socio-technical systems at ICDPPC 2018
Gartner lists 'Digital Ethics and Privacy' as one of the top 10 strategic technology trends for 2019
Is Mozilla the most progressive tech organization on the planet right now?

Neuroevolution: A step towards the Thinking Machine

Amarabha Banerjee
16 Oct 2017
9 min read
“I propose to consider the question - Can machines think?” - Alan Turing

The goal of AI research has always remained the same: create a machine that has human-like decision-making capabilities based on available information. This includes the machine's ability to analyze and process huge amounts of data and then draw a meaningful inference from it. Machine learning, deep learning and other old and new paradigms in AI research are all attempts at imparting complex decision-making capabilities to machines or systems. Alan Turing's famous test for AI has set the standard over the years for what qualifies as a smart AI, i.e. a thinking machine. The imitation game has an AI or bot interacting with a human anonymously, in such a way that the human cannot tell it is a machine. This not-so-trivial test has seen many adaptations over the years, like the modern-day Tokyo test. These tests set challenging boundaries that machines must cross to be considered capable of possessing intelligence. Neuroevolution, a decades-old theory remodelled in a modern format with the help of neural and deep neural networks, promises to challenge these boundaries and even break them. With neuroevolution, machines aim to solve complex problems on their own, with satisfactory levels of accuracy, even though they do not know how to achieve those results.

Neuroevolution: The Essence

“If a wild animal habitually performs some useless activity, natural selection will favor rival individuals who instead devote time to surviving and reproducing...Ruthless utilitarianism trumps, even if it doesn’t always seem that way.” - Richard Dawkins

This is the essence of neuroevolution, but the process itself is not as simple. Just like human evolution, in the beginning a set of algorithms works on a problem. The algorithms that show an inclination to solve the problem in the right way are selected for the next stage. They then undergo random minor mutations, i.e. small logical changes in the inherent algorithm structure. Next, we check whether these changes enable the algorithms to achieve the same result with better accuracy or efficiency. The successful ones then move to the next stage, with further mutations introduced. This is similar to how nature did the sorting for us, as humans evolved from a natural need to survive in unfamiliar situations. Since the concept uses neural networks, it has come to be known as neuroevolution. Neuroevolution, in the simplest terms, is the process of "descent with modification" by which machines and systems evolve and get better at solving the problems they were built for.

Backpropagation to DNN: The Evolution

Neural networks are made up of nodes. These nodes function like neurons in the human brain, which receive a set of inputs and generate a response based on the type, intensity, frequency, etc. of the stimuli. An algorithm can be viewed as a node. With backpropagation, the algorithm is modified in an iterative manner: the error generated after each pass is fed back to the system. The algorithms (nodes) responsible for a higher error contribution are identified and assigned less weight in the next pass. Thus, backpropagation is a way to assign appropriate weights to nodes by calculating the error contributions of individual nodes. These nodes, when combined in different layers, form the structure of deep neural networks.
Deep neural networks have separate input and output layers, plus a middle layer of hidden nodes that forms the core of a DNN. In the case of DNNs, as before, the weights of the nodes are adjusted in each iteration based on their accuracy; the number of iterations varies for each DNN. As explained earlier, the system continues to improve on its own without any external stimuli. Now, where have we seen this before? Of course, this looks a lot like a simplified, miniature version of evolution! Unfit nodes are culled by reducing the weight they have in the overall output, and the ones with favorable results are encouraged, just like natural selection. However, the one thing missing from this is mutation, and the ability to process mutation. This is where we introduce mutations into the successful algorithms and let them evolve on their own. Backpropagation in DNNs doesn't change the algorithm or its approach; it merely increases or decreases the algorithm's overall contribution to the desired result. Forcing random mutations of neural and deep neural networks, and then letting these mutations take shape as the networks together try to solve a given problem, seems pretty straightforward. The point where everything starts getting messy is when different layers or neural networks start solving the given problem in their own pre-defined way. One of two things may then happen:

1. The neural networks contradict one another and stall the overall problem-solving process; the system cannot take any decision and becomes dormant.
2. The neural networks reach some sort of agreement on a decision, which may itself be correct or incorrect.

Both scenarios present us with dilemmas: how to restart a stalled process, and how to achieve better decision-making capability. The solution to both situations lies in enabling the DNNs to rectify themselves, first by choosing the correct algorithms, and then by mutating them with the intention of letting them evolve and reach decisions of greater accuracy. Here's a look at some popular implementations of this idea.

Neuroevolution in flesh and blood

Cutting-edge AI research giants like OpenAI, backed by Elon Musk, and Google DeepMind have taken the concept of neuroevolution and applied it to a bunch of deep neural networks. Both aim to evolve these algorithms in a way that the smarter ones survive and eventually create better and faster models and systems. Their approaches are, however, starkly different.

The Google implementation

Google's way is simple: it takes a number of algorithms, divides them into groups and assigns one particular task to all. The algorithms that fare better at solving these problems are then chosen for the next stage, much like the reward and punishment system in reinforcement learning. The difference here is that the faster algorithms are not just chosen for the next step; their models and parameters are also tweaked slightly, which is our way of introducing a mutation into the successful algorithms. These minor mutations then play out as the modified algorithms try to solve the given problem. Again, the better ones remain and the rest are culled. This way, the algorithms themselves find a way to perform better and better until they are reasonably close to the desired result.
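A minimal, hedged sketch of this select-and-mutate loop is shown below: a toy evolution strategy over weight vectors with a made-up fitness function, not Google's or OpenAI's actual systems.

```python
import random

def fitness(weights):
    # Toy objective: the closer the weights are to the target vector, the fitter.
    target = [0.5, -1.0, 2.0]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def mutate(weights, scale=0.1):
    # Small random tweak to each parameter: the "mutation" step.
    return [w + random.gauss(0, scale) for w in weights]

# Start with a random population of candidate "networks" (here, weight vectors).
population = [[random.uniform(-3, 3) for _ in range(3)] for _ in range(50)]

for generation in range(100):
    # Selection: keep the top 10 candidates by fitness.
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    # Reproduction with mutation: each generation is rebuilt from mutated survivors.
    population = survivors + [mutate(random.choice(survivors)) for _ in range(40)]

best = max(population, key=fitness)
print("best candidate:", [round(w, 2) for w in best])
```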
The most important advantage of this process is that the algorithms keep track of their evolution as they get smarter. A major limitation of Google's approach is that these complex computations take a long time, so results are slow to appear. Also, once the mutation kicks in, the algorithms' behavior is not controlled externally: quite literally, they can go berserk because of the resulting mutation, which means the process can fail even at an advanced stage.

The OpenAI implementation

Let's contrast this with OpenAI's master-worker approach to neuroevolution. OpenAI used a set of nearly 1,440 algorithms to play the game of Atari and submit their scores to a master algorithm. The algorithms with better scores were then chosen, given a mutation, and put back into the same process. In more abstract terms, the OpenAI method looks like this: a set of worker algorithms is given a certain complex problem to solve; the best scores are passed on to the master algorithm; the better algorithms are then mutated and set to perform the same tasks; the scores are again recorded and passed to the master algorithm; and this repeats over multiple iterations. The master algorithm progressively eliminates the chance of failure, since it knows which algorithms to employ for a given problem. However, it does not know the road to success, as it has access only to the final scores and not to how those scores were achieved. The advantage of this approach is that better results are guaranteed, and there are no cases of decision conflict or of the system stalling. The flip side is that the system only knows its way through the given problem; all the effort spent evolving it has to be repeated for a similar but different problem. The process is therefore cumbersome and lengthy.

The Future with Neuroevolution

Human evolution has taken millions of years to reach where we are today. Evolving AIs and enabling them to pass the Turing test, or making them smart enough to pass a university entrance exam, will require significant improvement over the current crop of AI. Amazon's Alexa and Apple's Siri are mere digital assistants. If we want smart, AI-driven systems with seamless integration of AI into our everyday life, algorithms with evolutionary characteristics are a must. Neuroevolution might hold the secret to inventing smart AIs that can ultimately propel human civilization to greater heights of development and advancement.

“It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers...They would be able to converse with each other to sharpen their wits. At some stage, therefore, we should have to expect the machines to take control." - Alan Turing


What you missed at last week’s ICML 2018 conference

Sugandha Lahoti
18 Jul 2018
6 min read
The 35th International Conference on Machine Learning (ICML) 2018 took place from July 10 to July 15, 2018 in Stockholm, Sweden. ICML is one of the most anticipated conferences for every data scientist and ML practitioner, and it features some of the best ML researchers who come to talk about their research and discuss new ideas. It won't be wrong to say that deep learning and its subsets were the showstoppers of this conference, with a large number of research papers and AI professionals implementing them in their methods. These included sessions and paper presentations on Gaussian Processes, Networks and Relational Learning, Time-Series Analysis, Deep Bayesian Non-parametric Tracking, Generative Models and more. Other deep learning subsets, such as Representation Learning, Ranking and Preference Learning, Supervised Learning, and Transfer and Multi-Task Learning, were also heavily featured. The conference consisted of one day of tutorials (July 10), followed by three days of main conference sessions (July 11-13), followed by two days of workshops (July 14-15).

Best Talks and Seminars of ICML 2018

ICML 2018 featured two informative talks dealing with the applications of artificial intelligence in other domains. Day 1 was inaugurated by an invited talk from Prof. Dawn Song on "AI and Security: Lessons, Challenges and Future Directions". She talked about the impact of AI on computer security, differential privacy techniques, and the synergy between AI, computer security and blockchain. She also gave an overview of challenges and new techniques to enable privacy-preserving machine learning. Day 3 featured a talk by Max Welling on "Intelligence per Kilowatt hour", focusing on the connection between physics and AI. According to Max, in the coming future companies will find it too expensive to run the energy-hungry ML tools that power their AI engines, or the heat dissipation in edge devices will be too high to be safe. So the next frontier of AI is going to be finding the most energy-efficient combination of hardware and algorithms. There were also two plenary talks: Language to Action: towards Interactive Task Learning with Physical Agents, by Joyce Chai, and Building Machines that Learn and Think Like People, by Josh Tenenbaum.

Best Research Papers of ICML 2018

Among the many interesting research papers that were submitted to ICML 2018, here are the winners. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples by Anish Athalye, Nicholas Carlini and David Wagner received a Best Paper award. The paper identifies obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. The authors identify three different types of obfuscated gradients and develop attack techniques to overcome them. Delayed Impact of Fair Machine Learning by Lydia T. Liu, Sarah Dean, Esther Rolf and Max Simchowitz also received a Best Paper award. This paper examines the circumstances in which fairness criteria promote the long-term well-being of disadvantaged groups, measured in terms of a temporal variable of interest. The paper also introduces a one-step feedback model of decision-making that exposes how decisions change the underlying population over time.
Bonus: The Test of Time award

Day 4 saw Facebook researchers Ronan Collobert and Jason Weston receive the honorary Test of Time award for their 2008 ICML paper, A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. The paper proposed a single convolutional neural network that takes a sentence and outputs its language processing predictions. The network can identify and distinguish part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words, and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. At the time the paper was published, there was almost no neural network research in Natural Language Processing. The paper's use of word embeddings and how they are trained, its use of auxiliary tasks and multitasking, and its use of convolutional neural nets in NLP really inspired the neural networks of today. For instance, Facebook's recent machine translation and summarization tool Fairseq uses CNNs for language. AllenNLP's ELMo learns improved word embeddings via a neural net language model and applies them to a large number of NLP tasks.
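For readers curious what a convolutional network over word embeddings looks like in today's tooling, here is a hedged PyTorch sketch of the general idea: a toy single-task text classifier, not the multitask architecture from the Collobert and Weston paper, with the vocabulary size, dimensions and label count invented for illustration.

```python
import torch
import torch.nn as nn

class ToySentenceCNN(nn.Module):
    """Toy CNN over word embeddings; all sizes here are invented."""
    def __init__(self, vocab_size=10_000, emb_dim=50, n_filters=64, n_labels=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.classify = nn.Linear(n_filters, n_labels)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        x = self.embed(token_ids)                 # (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)                     # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))              # (batch, n_filters, seq_len)
        x = x.max(dim=2).values                   # max-over-time pooling
        return self.classify(x)                   # (batch, n_labels)

# A batch of two "sentences" of eight random token ids, for illustration.
model = ToySentenceCNN()
fake_batch = torch.randint(0, 10_000, (2, 8))
print(model(fake_batch).shape)  # torch.Size([2, 5])
```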
Featured Tutorials at ICML 2018

ICML 2018 featured a total of nine tutorials, in sets of three. All the tutorials took place on Day 1. They included:

Imitation Learning, by Yisong Yue and Hoang M Le, who gave a broad overview of imitation learning techniques and their recent applications.
Learning with Temporal Point Processes, by Manuel Gomez Rodriguez and Isabel Valera, who covered temporal point processes in machine learning from the basics to advanced concepts such as marks and dynamical systems with jumps.
Machine Learning in Automated Mechanism Design for Pricing and Auctions, by Nina Balcan, Tuomas Sandholm and Ellen Vitercik. This tutorial covered automated mechanism design for revenue maximization.
Toward Theoretical Understanding of Deep Learning, by Sanjeev Arora, who explained, with examples, what kind of theory may ultimately arise for deep learning.
Defining and Designing Fair Algorithms, by Sam Corbett-Davies and Sharad Goel, who illustrated the problems that lie at the foundation of algorithmic fairness, drawing on ideas from machine learning, economics and legal theory.
Understanding your Neighbors: Practical Perspectives From Modern Analysis, by Sanjoy Dasgupta and Samory Kpotufe. This tutorial aimed to cover new perspectives on k-NN and translate new theoretical insights to a broader audience.
Variational Bayes and Beyond: Bayesian Inference for Big Data, by Tamara Broderick, who covered modern tools for fast, approximate Bayesian inference at scale.
Machine Learning for Personalised Health, by Danielle Belgrave and Konstantina Palla. This tutorial evaluated the current drivers of machine learning in healthcare and presented machine learning strategies for personalised health.
Optimization Perspectives on Learning to Control, by Benjamin Recht, who showed how to learn models of dynamical systems, how to use data to achieve objectives in a timely fashion, how to balance model specification, and more.

Workshops at ICML 2018

Days 5 and 6 of the conference were dedicated entirely to workshops, on topics ranging from AI in health to AI in computational psychology, humanizing AI, and AI for wildlife conservation. Some other workshops included:

Bridging the Gap between Human and Automated Reasoning
Data Science meets Optimization
Domain Adaptation for Visual Understanding
Eighth International Workshop on Statistical Relational AI
Enabling Reproducibility in Machine Learning MLTrain@RML
Engineering Multi-Agent Systems
Exploration in Reinforcement Learning
Federated AI for Robotics Workshop (F-Rob-2018)

This is just a brief overview of the ICML conference, where we have handpicked a select few paper presentations and invited talks. You can see the full schedule, along with the list of selected research papers, on the ICML website.

7 of the best machine learning conferences for the rest of 2018
Microsoft start AI School to teach Machine Learning and Artificial Intelligence
Google introduces Machine Learning courses for AI beginners