
Tech Guides


Top 5 misconceptions about data science

Erik Kappelman
02 Oct 2017
6 min read
Data science is a well-defined, serious field of study and work. But the term 'data science' has also become a bit of a buzzword. Yes, data scientists have become increasingly important to many different types of organizations, but 'data science' has also become a trendy term in tech recruitment. The fact that these words are thrown around so casually has led to a lot of confusion about what data science and data scientists actually are. I used to be part of this confused group: when I first heard the term, I assumed that data science was just statistics in a fancy hat. It turns out I was quite wrong. So here are the top 5 misconceptions about data science.

1. Data science is statistics, and vice versa

I fell prey to this particular misconception myself. What I have since learned is that statistical methods are used in data science, but conflating the two is inaccurate. That would be somewhat like saying psychology is statistics because research psychologists use statistical tools in studies and experiments. So what's the difference? To my mind, the primary difference lies in the level of understanding of computing required to succeed in each discipline. While many statisticians have an excellent understanding of things like database design, you could be a statistician and know nothing about it. To succeed as a statistician, all the way up to the doctoral level, you really only need to master basic modeling tools like R, Python, and MATLAB. A data scientist needs to be able to mine data from the Internet, create machine learning algorithms, and design, build, and query databases, among other things.

2. Data science is really computer science

This is the other half of the first misconception. While it is tempting to lump data science in with computer science, the two are quite different. For one thing, computer science is technically a field of mathematics focused on algorithms and optimization, and data science is definitely not that. Data science requires many skills that overlap with those of computer scientists, but data scientists don't need to know anything about computer hardware, kernels, and the like. A data scientist ought to have some understanding of network protocols, but even here, the level of understanding required is nothing like that held by the average computer scientist.

3. Data scientists are here to replace statisticians

Nothing could be further from the truth. One way to keep this straight is that statisticians are in the business of researching existing statistical tools and trying to develop new ones. Those tools are then used by data scientists and many others. Data scientists are usually more focused on applied solutions to real problems and less interested in what many might regard as pure research.

4. Data science is primarily focused on big data

This is an understandable misconception. Just so we're clear, Wikipedia defines big data as "a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them." Big data, then, is really the study of how to deal with, well, big datasets, and data science absolutely has a lot to contribute in this area. Data scientists usually have skills that work really well when it comes to analyzing big data: skills related to databases, machine learning, and how data moves around a local network or the Internet are skills most data scientists have, and they are very helpful when dealing with big data. But data science is actually very broad in scope. Big data is a hot topic right now and is receiving a lot of attention, and research in the field is receiving a lot of private and public funding. In any situation like this, many different types of people working in a diverse range of areas are going to try to get in on the action. As a result, talking up data science's connection to big data makes sense if you're a data scientist - it's really about effective marketing. So, you might work with big data if you're a data scientist, but data science is also much, much more than just big data.

5. Data scientists can easily find a job

I thought I would include this one to add a different perspective. While there are many more misconceptions about what data science is or what data scientists do, I think this one is actually really damaging and should be discussed. I hear a lot of complaints these days from people with a sought-after skill set who still can't find gainful employment. Data science is like any other field: there is always going to be a whole bunch of people who are better at it than you. Don't become a data scientist because you're sure to get a job - you're not. The industries related to data science are absolutely growing right now, and will continue to do so for the foreseeable future. But that doesn't mean people who can call themselves data scientists automatically get jobs. You have to have the talent, but you also need to network and do all the same things you need to do to get on in any other industry. The point is, it's not easy to get a job no matter what your field is; study and practice data science because it's awesome, not because you heard it's a sure way to get a job.

Misconceptions abound, but data science is a wonderful field of research, study, and practice. If you are interested in pursuing a career or degree related to data science, I encourage you to do so - just make sure you have the right idea about what you're getting yourself into.

Erik Kappelman wears many hats, including blogger, developer, data consultant, economist, and transportation planner. He lives in Helena, Montana and works for the Department of Transportation as a transportation demand modeler.


Stitch Fix: Full Stack Data Science and other winning strategies

Aaron Lazar
05 Dec 2017
8 min read
Last week, a company in San Francisco was popping bottles of champagne for its achievements - and trust me, they are not small ones. Barely a couple of weeks after being listed on the stock market, its stock has soared by over 50%. Stitch Fix is an apparel company run by co-founder and CEO Katrina Lake. In a span of just six years, she has built the company to an annual revenue of a whopping $977-odd million. The company has been disrupting traditional retail and aims to bridge the gap in personalised shopping that traditional retail can't close. Stitch Fix is more of a personalized stylist than a traditional apparel company. It works in three basic steps:

Filling a style profile: Clients are prompted to fill out a style profile, where they share their style, price, and size preferences.

Setting a delivery date: Clients set a delivery date as per their availability. Stitch Fix mixes and matches various clothes from their warehouses and comes up with the top five pieces they feel would best suit the client, based on the initial style profile as well as years of experience in styling.

Keep or send back: The clothes reach the customer on the selected date; the customer can try them on, keep whatever they like, and send back what they don't.

The aim of Stitch Fix is to bring a personal touch to clothes shopping. According to Lake, "There are millions and millions of products out there. You can look at eBay and Amazon. You can look at every product on the planet, but trying to figure out which one is best for you is really the challenge" - and that's the tear Stitch Fix aims to sew up. In an interview with eMarketer, Julie Bornstein, COO of Stitch Fix, said, "Over a third of our customers now spend more than half of their apparel wallet share with Stitch Fix. They are replacing their former shopping habits with our service."

So what makes Stitch Fix stand out among its competitors? How do they do it? Stitch Fix is not just any apparel company. It has created the perfect formula by blending human expertise with just the right amount of data science to serve its customers. The kind of data science Stitch Fix does falls under a relatively new and exciting term that's on the rise: Full Stack Data Science.

Hello Full Stack Data Science!

For those of you who've heard of this before, cheers - I hope you've had the opportunity to experience its benefits. For those who haven't, Full Stack Data Science basically means a single data scientist handles the entire workflow: mining the data, cleaning it, writing an algorithm to model it, and visualizing the results, while also stepping into the shoes of an engineer (implementing the model) and a project manager (tracking the entire process and ensuring it stays on track). While this might sound like a lot for one person to do, it's quite possible and practical. It's practical because when these roles are performed by different individuals, they introduce a lot of latency into the project. Moreover, synchronizing the priorities of each individual is close to impossible, which creates friction within the team. The data science team at Stitch Fix is broadly organized by the area each member works on, and because most of the team works full stack, there are over 80 data scientists on board. That's a lot of smart people in one company! On a serious note, although unique, this kind of team structure has been working well for them, mainly because it gives each person the freedom to work independently.

Tech Treasure Trove

When you open up Stitch Fix's tech toolbox, you won't find Aladdin's lamp glowing before you. Their magic lies in having a simple tech stack that works wonders when implemented the right way. They work with Ruby on Rails and Bootstrap for their web applications, which are hosted on Heroku. Their data platform relies on a robust Postgres implementation. Among programming languages, we found Python, Go, Java, and JavaScript also being used, and for an ML framework, we're pretty sure they're playing with TensorFlow. But just working with these tools isn't enough to get to the level they're at. There's something more under the hood - and believe it or not, it's not some gigantic artificially intelligent system running on a zillion cores. Rather, it's all about the smaller, simpler things. For example, if you have three different kinds of data and you need to find a relationship between them, instead of bringing in the big guns (read: deep learning frameworks), a simple tensor decomposition using word vectors can do the deed quite well.

Advantages galore: food for the algorithms

One of the main advantages Stitch Fix has is almost five years' worth of client data, obtained in several ways, such as the client profile, after-delivery feedback, and Pinterest photos. All this data is put through algorithms that learn more about the likes and dislikes of clients. Interesting algorithms that feed on this sumptuous data include collaborative filtering recommenders to group clients based on their likes, mixed-effects models to learn about a client's interests over time, neural networks to derive vector descriptions of the Pinterest images and compare them with in-house designs, NLP to process customer feedback, and Markov chain models to predict demand, among several others.
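To make the collaborative-filtering idea concrete, here is a toy, item-based sketch in Python. The ratings matrix, clients, and items are invented for illustration - this is not Stitch Fix's actual code, just a minimal example of recommending items similar to what a client has already liked.

    import numpy as np

    # rows = clients, columns = items; 0 means "not yet rated"
    ratings = np.array([
        [5, 4, 0, 0, 1],
        [4, 0, 0, 1, 1],
        [1, 1, 5, 4, 0],
        [0, 1, 4, 5, 4],
    ], dtype=float)

    # cosine similarity between item columns
    norms = np.linalg.norm(ratings, axis=0)
    item_similarity = (ratings.T @ ratings) / (np.outer(norms, norms) + 1e-9)

    def recommend(client, top_n=2):
        # score unseen items by their similarity to items the client liked
        scores = ratings[client] @ item_similarity
        scores[ratings[client] > 0] = -np.inf   # don't re-recommend rated items
        return np.argsort(scores)[::-1][:top_n]

    print(recommend(0))   # the two unseen items closest to client 0's tastes

Real recommenders layer far more on top of this (implicit feedback, temporal effects, stylist overrides), but the core intuition - infer taste from the overlap between clients and items - is the same.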
A human touch: when science meets art

While the machines do all the calculations and come up with recommendations on which designs customers would appreciate, they still lack the human touch. Stitch Fix employs over 3,000 stylists. Each client is assigned a stylist who can see the client's entire preference profile at a glance through a custom-built interface. The stylist finalizes the selections from the inventory list and adds a personal note describing how the client can accessorize the purchased items for a particular occasion and how to pair them with other pieces in their closet. This truly lives up to the idea that "humans are much better with the machines, and the machines are much better with the humans". Cool, ain't it?

Data Platform

Apart from Heroku, Stitch Fix appears to have internal SaaS platforms where the data scientists carry out analysis, write algorithms, and put them into production. The platforms provide data distribution, parallelization, auto-scaling, failover, and so on. This lets the data scientists focus on the science while still enjoying the benefits of a scalable system.

The good, the bad and the ugly: microservices, monoliths and scalability

Scalability is one of the most important aspects a new company needs to take into account before taking the plunge. A microservice architecture helps with this by allowing small, independent services or mini-applications to run on their own. Stitch Fix uses this architecture to improve scalability, although their database is a monolith; they are now breaking that monolithic database into microservices. This is a takeaway for any entrepreneur just starting out with their app.

Data-driven applications

Data-driven applications ensure that the right solutions are built for customers. If you're a customer-centric organisation, there's something you can learn from Stitch Fix. Data-driven apps seamlessly combine the operational and analytic capabilities of the organisation, breaking down the traditional silos.

TDD + CD = DevOps simplified

Test-driven development and continuous delivery go hand in hand, and it's always better to imbibe this culture right from the very start.

In the end, it's really great to see such creative and technologically driven start-ups succeed and sail to the top. If you're on the journey to building that dream startup of yours and you need resources for your team, here are a few books to get you started: Hands-On Data Science and Python Machine Learning by Frank Kane, Data Science Algorithms in a Week by Dávid Natingga, Continuous Delivery and DevOps: A Quickstart Guide - Second Edition by Paul Swartout, and Practical DevOps by Joakim Verona.


5 data science tools that will matter in 2018

Richard Gall
12 Dec 2017
3 min read
We know your time is valuable, so let's cut to what matters. We've already written about the trends and issues that are going to matter in data science; here are 5 data science tools that you need to pay attention to in 2018. Read our 5 things that matter in data science in 2018 here.

1. TensorFlow

Google's TensorFlow has been one of the biggest library hits of 2017. It has arguably done a lot to make machine learning more accessible than ever before. That means more people actually building machine learning and deep learning algorithms, and the technology moving beyond the domain of data professionals and into other fields. So, if TensorFlow has passed you by, we recommend you spend some time exploring it. It might just give your skill set the boost you're looking for. Explore TensorFlow content here.

2. Jupyter

Jupyter isn't a new tool, sure. But it's so crucial to the way data science is done that its importance can't be overstated, and it will only become more central as pressure is placed on data scientists and analysts to communicate and share data in ways that empower stakeholders in a diverse range of roles and departments. It's also worth mentioning its relationship with Python: we've seen Python go from strength to strength throughout 2017, with no signs of letting up, and the close relationship between the two will only make Jupyter more popular across the data science world. Discover Jupyter eBooks and videos here.

3. Keras

In a year when deep learning has captured the imagination, it makes sense to include both of the libraries helping to power it. It's a close call between Keras and TensorFlow as to which deep learning framework is 'better' - ultimately, like everything, it's about what you're trying to do. This post explores the difference between Keras and TensorFlow very well; the conclusion is that while TensorFlow offers more 'control', Keras is the library you want if you simply need to get up and running. Both libraries have had a huge impact in 2017, and we're only going to be seeing more of them in 2018. Learn Keras. Read Deep Learning with Keras.

4. auto-sklearn

Automated machine learning is going to become incredibly important in 2018. As pressure mounts on engineers and analysts to do more with less, tools like auto-sklearn will be vital in reducing some of the 'manual labour' of algorithm selection and tuning.

5. Dask

This one might be a little unexpected. We know just how popular Apache Spark is when it comes to distributed and parallel computing, but Dask represents an interesting competitor that's worth watching throughout 2018. Its high-level API integrates exceptionally well with Python libraries like NumPy and pandas; it's also much more lightweight than Spark, so it could be a good option if you want to avoid building out a weighty big data tech stack. Explore Dask in the latest edition of Python High Performance.
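To illustrate why that pandas-style API matters, here is a minimal Dask sketch. The file pattern and column names are made up for the example; the point is simply that the familiar dataframe operations stay the same while Dask partitions and parallelizes the work behind the scenes.

    import dask.dataframe as dd

    # many CSV files are read lazily as one logical, partitioned dataframe
    df = dd.read_csv("logs/2018-*.csv")

    # the usual pandas-style operations only build a task graph...
    daily_latency = df.groupby("date")["latency_ms"].mean()

    # ...and nothing actually runs until compute() is called
    print(daily_latency.compute())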


Fab Lab: worldwide labs for digital fabrication

Michael Ang
30 Jan 2015
7 min read
Looking for somewhere to get started with 3D printing, laser cutting, and digital fabrication? The worldwide Fab Lab network provides access to digital fabrication tools based on the ideas of open access, DIY, and knowledge sharing. The Fab Lab initiative was started by MIT's Center for Bits and Atoms and now boasts affiliated labs in many countries. I first visited Fab Lab Berlin for their "Build your own 3D printer" workshop. Over the course of one weekend I assembled an i3 Berlin 3D printer from a kit, working alongside the kit's designers and other people interested in building their own printers. Assembling a 3D printer can be a daunting task, but working as a group made it fast and fun - the lab provides access not just to tools but to a community of people excited about digital fabrication. I spoke with Wolf Jeschonnek, founder of Fab Lab Berlin. Wolf spent three months visiting Fab Labs and maker spaces in the United States (he blogged his adventure) before returning to Berlin to start his own Fab Lab.

For someone who's not familiar, what is a Fab Lab?

The original idea by MIT was an outreach project to show the general public what the Center for Bits and Atoms was doing, which was digital fabrication. And instead of doing an exhibition or a website they put the most important and most commonly used digital fabrication tools that they were using for research into a space and made them accessible to anyone. That's the whole idea, basically.

[Fab Labs follow a simple charter of rules and generally use a shared core set of machines and software so that projects and experience can be shared between the labs. They also provide free or in-kind access to the lab at least part of the time each week.]

[Image: Machines offered at Fab Lab Berlin]

You could say the core idea of a Fab Lab is open access to tools for digital fabrication and the sharing of knowledge and experience.

Yes, it's an open research and development lab, basically. Every university, every art or architecture university, has a fab lab. There's not much difference in the equipment, but it's not accessible - in most of the universities I experienced, it's not even accessible to the students. They're very strict about rules and safety and so on. So the innovation is to make those kinds of labs accessible to anyone who's interested.

It seems like another big difference is that you get hands-on access to the machines. You get to operate the lasercutter - you get to operate the CNC router.

Yes, and also to pass that hands-on knowledge on to the next person who wants to use the lasercutter. In our lab we do a lot of introduction workshops, but you can just as well find someone who is willing to show you how the lasercutter works. We have a couple of formal training opportunities for beginners, but after that we don't do much advanced training; it all happens by people passing their knowledge on to someone else or figuring it out themselves. In the beginning I was the expert for every machine, because I chose them and knew which machines to buy. By now I'm not the expert at all. If you have a very specific question about a technology, I'm not the person to ask anymore, because the community is much better.

What are some of your favorite projects that have come out of Fab Lab Berlin?

I really like the i3 Berlin 3D printer, because we started together with them, and I could see the development of the machine over the last year and a half. I also built one myself. It was the first machine that we built in the Fab Lab.

[Image: Bram de Vries with several iterations of his i3 Berlin 3D printer]

There's a steadicam rig for a camera that two people built. [The CAMBAL by SeeYa Film.] It's a very sophisticated motor-driven steadicam rig. It's very steady because the motors actively control the movements. Then there is a project that wasn't really developed in Fab Lab Berlin, but it also started with an i3 Berlin workshop. One guy who came and built an i3 Berlin used that printer to design and make a DIY lasercutter. It's called Mr. Beam. If you look at the Mr. Beam, you see lots of details that he took from the printer and transferred to the lasercutter. He used the printer he made himself to build the lasercutter, and it was very successful on Kickstarter.

[Image: CAMBAL camera stabilizer developed at Fab Lab Berlin. Dig that carbon fibre and milled aluminum!]

Do you use open source tools in the Fab Lab? Is there a preference for that?

We try to use as much open source as possible, because we try to enable people to use their own computers to work. It makes a lot of sense to use free software, because obviously you don't have to pay for it and everybody can install it on their own computers. We try to use open source as much as possible, but sometimes if you do more advanced stuff it makes sense to use something else.

What kind of people do you find using the lab?

That's very hard to summarize because it's very broad. There are doctors and lawyers and computer engineers, but also teachers or just people who are interested in how a 3D printer works. The age range is between 6 and 70, I would guess. There are some professionals who use the lab and the infrastructure to work on professional projects, and there are also people who are very professional in other fields and do very high-level projects, but as a hobby. The reasons why people come to the lab are very different. We also get a lot of interest from large companies and corporations who are interested in how innovation works in an environment like this. It's been a very good mix of people and companies so far.

[Image: Rapid Robots workshop by Niklas Roy / School of Machines, Making & Make-Believe]

Do you have any advice for people who are coming to a Fab Lab for the first time?

One piece of advice is to bring something to make, because otherwise it's very boring. It can still be interesting to talk to people, but if you have a project and you've already prepared something, in most cases you can find someone who can help you. Then you can make something, and it's a very different experience compared to just watching. When I toured the United States [visiting Fab Labs] I brought a project - a vertical-axis wind turbine. Before I went, I really thought hard about a project that I could do. Most people who hang around in a Fab Lab are interested in making stuff - also in talking and sharing knowledge, but the core activity is really making things. If you bring something interesting, it's also a very good conversation starter.

Thanks, Wolf!

The Fab Foundation maintains a worldwide list of Fab Labs. Fab Lab Berlin has open lab hours each week as well as a number of introductory workshops to get you started with digital fabrication. Most other labs have similar programs. If there isn't a lab near you, you could be like Wolf and start your own!

About the author: Michael Ang is a Berlin-based artist and engineer working at the intersection of art, engineering, and the natural world. His latest project is the Polygon Construction Kit, a toolkit for bridging the virtual and physical worlds by translating simple 3D models into physical structures.


Application Flow With Generators

Wesley Cho
07 Oct 2015
5 min read
Oftentimes, developers fall back on events to enforce the concept of a workflow: a rigid diagram of business logic that branches according to application state and/or user choices (or, in pseudo-formal terms, a tree-like uni-directional flow graph). This graph may contain circular flows until the user meets the criteria to continue. One example is user authentication for access to an application, where a natural circular logic arises of returning to the login form until the user submits a correct user/password combination - upon login, the application may decide to display a bunch of welcome messages pointing to various pieces of functionality and giving quick explanations.

However, eventing has a problem: it doesn't centralize the high-level business logic with obvious branching, so in an application of moderate complexity, developers may scramble to figure out which callbacks are supposed to trigger when, and in what order. Even worse, the logic can be split across multiple files, creating a multi-file spaghetti that makes it hard to find the meatball (or other preferred food of reader interest). It can be a taxing operation for developers, which in turn hampers productivity.

Enter Generators

Generators are an exciting new feature in ES6 that is mysterious to more inexperienced developers. They allow one to create a natural loop that blocks up until the yielded expression, which has some natural applications such as pagination for handling infinite scrolling lists like a Facebook newsfeed or Twitter feed. They are currently available in Chrome (and thus io.js, as well as Node.js via --harmony flags) and Firefox, but can be used in other browsers and io.js/Node.js through transpilation of JavaScript from ES6 to ES5 with excellent transpilers such as Traceur or Babel. If you want generator functionality, you can use regenerator.

One nice feature of generators is that they provide a central source of logic, and thus naturally fit the control structure we would like for encapsulating high-level application logic. Here is one example of how one can use generators along with promises:

    var User = function () { … };

    User.authenticate = function* authenticate() {
      var self = this;
      // keep yielding a login attempt until authentication succeeds
      while (!this.authenticated) {
        yield function login(user, password) {
          return self.api.login(user, password)
            .then(onSuccess, onFailure);
        };
      }
    };

    function* InitializeApp() {
      yield* User.authenticate();
      yield Router.go('home');
    }

    var App = InitializeApp();

    var Initialize = {
      next: function (step, data) {
        switch (step) {
          case 'login':
            return App.next();
          ...
        }
        ...
      }
    };

Here we have application logic that first tries to authenticate the user; we can then implement a login method elsewhere in the application:

    function login(user, password) {
      return App.next().value(user, password)
        .then(function success(data) {
          return Initialize.next('login', data);
        }, function failure(data) {
          return handleLoginError(data);
        });
    }

Note the role of the Initialize object - it has a next key whose value is a function that determines what to do next in tandem with App, the "instantiation" of a new generator. It also makes use of the fact that we yield from a second generator, which lets us yield a function that can be used to pass data in an attempt to log in a user. On success, it will set the authenticated flag to true, which in turn will break the user out of the User.authenticate part of the InitializeApp generator and into the next step, which in this case is to route the user to the homepage. In this case, we are blocking the user from navigating to the homepage on application boot until they complete the authentication step.

Explanation of differences

The important piece in this code is the InitializeApp generator. Here we have a centralized control structure that clearly displays the high-level flow of the application logic. If one knows that a particular piece needs to be modified due to a bug, such as the authentication piece, it becomes obvious that one must start looking at User.authenticate and any code directly concerned with executing it. This allows the methods to be split off into the appropriate sectors of the codebase, similarly to event listeners and the callbacks that get fired on reception of an event, except we have the additional benefit of seeing what the high-level application state is. If you aren't interested in using generators, this can be replicated with promises as well.

There is a caveat to this approach, though - using generators or promises hardcodes the high-level application flow. It can be modified, but these control structures are not as flexible as events. This does not negate the benefits of using events, but it gives another powerful tool to help make long-term maintenance easier when designing an application that may be maintained by multiple developers who have no prior knowledge of it.

Conclusion

Generators have many use cases that most developers working with JavaScript are not currently familiar with, and it is worth taking the time to understand how they work. Combining them with existing concepts allows some unique patterns to arise that can be of immense use when architecting a codebase, especially with possibilities such as encapsulating branching logic in a more readable format. This should help reduce mental burden and allow developers to focus on building out rich web applications without the overhead of worrying about potentially missing business logic.

About the Author: Wesley Cho is a senior frontend engineer at Jiff (http://www.jiff.com/). He has contributed features and bug fixes and reported numerous issues to numerous libraries in the Angular ecosystem, including AngularJS, Ionic, UI Bootstrap, and UI Router, as well as authored several libraries.


Paper in Two minutes: i-RevNet, a deep invertible convolutional network

Sugandha Lahoti
02 Apr 2018
4 min read
The ICLR 2018 accepted paper i-RevNet: Deep Invertible Networks introduces i-RevNet, an invertible convolutional network that does not discard any information about the input while classifying images. The paper is authored by Jörn-Henrik Jacobsen, Arnold W.M. Smeulders, and Edouard Oyallon. The 6th annual ICLR conference is scheduled to take place between April 30 and May 3, 2018.

What problem is the paper attempting to solve?

A CNN is generally composed of a cascade of linear and nonlinear operators. These operators are very effective at classifying images of all sorts, but they reveal little about the contribution of the internal representation to the classification. The learning process of a CNN works by progressively reducing large amounts of uninformative variability in the images to reveal the essence of the visual class. However, the extent to which information is discarded is lost somewhere in the intermediate nonlinear processing steps. There is also a widespread belief that discarding information is essential for learning representations that generalize well to unseen data. The authors show that discarding information is not necessary and back this claim with empirical evidence. The paper also aids understanding of the variability-reduction process by proposing an invertible convolutional network: the i-RevNet does not discard any information about the input while classifying images, and it has a built-in pseudo-inverse, allowing for easy inversion. It uses linear and invertible operators for down-sampling instead of non-invertible variants like spatial pooling.

Paper summary

i-RevNet is an invertible deep network that builds upon the recently introduced RevNet, replacing the non-invertible components of the original RevNet with invertible ones. i-RevNets retain all information about the input signal in any of their intermediate representations up until the last layer, and they achieve the same performance on ImageNet as comparable non-invertible RevNet and ResNet architectures. An i-RevNet block alternates between additions and nonlinear operators while progressively down-sampling the signal; the final pair of outputs is concatenated through a merging operator. With this architecture, the authors avoid the non-invertible modules of a RevNet (e.g. max-pooling or strides), which are normally needed to train such networks in a reasonable time and to build invariance w.r.t. translation variability. Their method replaces these non-invertible modules with linear and invertible modules Sj that reduce the spatial resolution while maintaining the layer's size by increasing the number of channels.

Key takeaways

This work provides solid empirical evidence that learning invertible representations does not discard any information about the input on large-scale supervised problems. i-RevNet, the proposed invertible network, is a class of CNN that is fully invertible and permits exact recovery of the input from its last convolutional layer. i-RevNets achieve the same classification accuracy on complex datasets, as illustrated on ILSVRC-2012, as RevNet and ResNet architectures with a similar number of layers. The inverse network is obtained for free when training an i-RevNet, requiring only minimal adaptation to recover inputs from the hidden representations.

Reviewer feedback summary

Overall score: 25/30. Average score: 8.3. Reviewers agreed the paper is a strong contribution, despite some comments about the significance of the result - i.e., why is invertibility a "surprising" property for learnability, in the sense that F(x) = {x, phi(x)}, where phi is a standard CNN, satisfies both properties: it is invertible, and linear measurements of F produce good classification. Having said that, the reviewers agreed that the paper is well written and easy to follow, and they considered it a great contribution to the ICLR conference.
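To give a feel for how a network can be exactly invertible, here is a tiny numpy sketch of the additive coupling used by RevNet-style reversible blocks, which i-RevNet builds on. F and G below are arbitrary stand-in functions rather than the paper's learned residual transforms, and the invertible down-sampling that i-RevNet adds is not shown; this only illustrates why such blocks can be inverted exactly.

    import numpy as np

    def F(x):
        return np.tanh(x)           # stand-in for a learned residual function

    def G(x):
        return np.maximum(x, 0.0)   # stand-in for another residual function (ReLU)

    def forward(x1, x2):
        # split the input into two halves and couple them additively
        y1 = x1 + F(x2)
        y2 = x2 + G(y1)
        return y1, y2

    def inverse(y1, y2):
        # the block is inverted by simple subtraction, so no information is lost
        x2 = y2 - G(y1)
        x1 = y1 - F(x2)
        return x1, x2

    x1, x2 = np.random.randn(4), np.random.randn(4)
    y1, y2 = forward(x1, x2)
    r1, r2 = inverse(y1, y2)
    assert np.allclose(x1, r1) and np.allclose(x2, r2)   # exact reconstruction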

What Did Big Data Deliver In 2014?

Akram Hussein
30 Dec 2014
5 min read
Big Data has always been a hot topic, and in 2014 it came into its own. 'Big data' has developed, evolved, and matured to give significant value to business intelligence. However, there is so much more to big data than meets the eye. Understanding enormous amounts of unstructured data is not easy by any means; yet once that data is analysed and understood, organisations have started to value its importance. 'Big data' has helped create a number of opportunities, ranging from new platforms, tools, and technologies, to improved economic performance in different industries, to the development of specialist skills, job creation, and business growth. Let's do a quick recap of 2014 and what big data has offered the tech world from the perspective of a tech publisher.

Data Science

The term 'data science' has admittedly been around for some time, yet in 2014 it received a lot more attention thanks to the demands created by big data. Looking at data science from a tech publisher's point of view, it's a concept that has been rapidly adopted, with potential for greater levels of investment and growth. To address the needs of big data, data science has been split into four key categories: data mining, data analysis, data visualization, and machine learning. Equally, there are important topics that fit in between those, such as data cleaning (munging), which I believe takes up the majority of a data scientist's time. The rise in jobs for data scientists has exploded in recent times and will continue to do so; according to the global management firm McKinsey & Company, there will be a shortage of 140,000 to 190,000 data scientists due to the continued rise of big data, and the role has been described as the 'sexiest job of the 21st century'.

Real-time Analytics

The competitive battle in big data throughout 2014 was focused on how fast data could be streamed to achieve real-time performance. The most important feature of real-time analytics is gaining instant access to and querying data as soon as it comes through. The concept is applicable to different industries and supports the growth of new technologies and ideas. Live analytics are most valuable to social media sites and marketers, who need actionable intelligence. Likewise, real-time data is becoming increasingly important with the phenomenon known as the Internet of Things. The ability to make decisions and plan outcomes instantly is more achievable now than ever before, thanks to the development of technologies like Spark and Storm, and NoSQL databases like Apache Cassandra, which enable organisations to rapidly retrieve data and deliver fault-tolerant performance.

Deep Learning

Machine learning (ML) became the new black and is in constant demand by many organisations, especially new startups. However, even though machine learning is gaining adoption and an improved appreciation of its value, the concept of deep learning seems to be the one that really pushed on in 2014. Granted, both ML and deep learning have been around for some time; we are looking at the topics in terms of current popularity levels and adoption in tech publishing. Deep learning is a subset of machine learning that refers to the use of artificial neural networks composed of many layers. The idea is based around a complex set of techniques for finding information to generate greater accuracy of data and results. The value gained from deep learning is that the information (from hierarchical data models) helps AI machines move towards greater efficiency and accuracy, learning to recognize and extract information by themselves, unsupervised. The popularity around deep learning has seen large organisations invest heavily - Google's acquisition of DeepMind for $400 million and Twitter's purchase of Madbits are just a few of the high-profile investments among many. Watch this space in 2015!

New Hadoop and Data Platforms

Hadoop, best associated with big data, adapted and changed its batch processing approach from MapReduce to what's better known as YARN towards the end of 2013 with Hadoop v2. MapReduce demonstrated the value and benefits of large-scale, distributed processing. However, as big data demands increased and more flexibility, multiple data models, and visual tools became a requirement, Hadoop introduced YARN to address these problems. YARN stands for 'Yet Another Resource Negotiator'. In 2014, the emergence and adoption of YARN allowed users to carry out multiple workloads - streaming, real-time, and generic distributed applications of any kind (YARN handles and supervises their execution) - alongside the MapReduce models. The biggest trend I saw with the change in Hadoop in 2014 was the transition from MapReduce to YARN. The real value in big data and data platforms is the analytics, and in my opinion that will be the primary point of focus and improvement in 2015.

Rise of NoSQL

NoSQL, also interpreted as 'Not Only SQL', has exploded, with a wide variety of databases coming to maturity in 2014. NoSQL databases have grown in popularity thanks to big data. There are many ways to look at stored data, but it is very difficult to process, manage, store, and query huge sets of messy, complex, and unstructured data. Traditional SQL systems just wouldn't allow that, so NoSQL was created to offer a way to look at data with no restrictive schemas. The emergence of 'graph', 'document', 'wide column', and 'key-value store' databases has shown no slowdown, and the growth continues to attract a higher level of adoption. However, NoSQL seems to be taking shape and settling on a few major players such as Neo4j, MongoDB, and Cassandra. Whatever 2015 brings, I am sure it will be faster, bigger, and better!


Is OpenStack a ‘Jack of All Trades, Master of None’ Cloud Solution?

Richard Gall
06 Oct 2015
4 min read
OpenStack looks like all things to all people. Casually perusing their website, you can see that it emphasises the cloud solution's high-level features - 'compute', 'storage', and 'networking'. Each one of these is aimed at different types of users, from development teams to sysadmins. Its multifaceted nature makes it difficult to define OpenStack - is it, we might ask, Infrastructure or Software as a Service? And while OpenStack's scope might look impressive on paper, ticking the box marked 'innovative', when it comes to actually choosing a cloud platform and developing a relevant strategy, it begins to look like more of a risk. Surely we'd do well to remember the axiom 'jack of all trades, master of none'?

Maybe if you're living in 2005 - even 2010. But in 2015, if you're frightened of the innovation that OpenStack offers (and, for those of you really lagging behind, of cloud in general), you're missing the bigger picture. You're ignoring the fact that true opportunities for innovation and growth don't simply lie in faster and more powerful tools, but in more efficient and integrated ways of working. Yes, this might require us to rethink the ways we understand job roles and how they fit together - but why shouldn't technology challenge us to be better, rather than just make us incrementally lazier?

OpenStack's multifaceted offering isn't simply an experiment that answers the somewhat masturbatory question 'how much can we fit into a single cloud solution?'; it is in fact informed (unconsciously, perhaps) by an agile philosophy. What's interesting about OpenStack - and why it might be said to lead the way when it comes to cloud - is what it makes possible. Small businesses and startups have already realized this (although that is likely as much through simple necessity as strategic decision making, considering how much cloud solutions can cost), but it's likely that larger enterprises will soon come to the same conclusion. And why should we be surprised? Is the tiny startup really that different from the Fortune 500 corporation? Yes, larger organizations have legacy issues - with both technology and culture - but these are gradually being grown out of as we enter a new era in which we expect something more dynamic from the tools and platforms we use.

This is where OpenStack fits in. No longer a curiosity or a 'science project', it is now the standard against which other cloud platforms are held. That it offers such impressive functionality for free (at a basic level) means that the enterprise solutions that once had a hold over the marketplace will now have to play catch-up. It's likely they'll be using the innovations of OpenStack as a starting point for their own developments, which they will hope keep them 'cutting-edge' in the eyes of customers.

If OpenStack is going to be the way forward, and the wider technical and business communities (whoever they are exactly) are to embrace it with open arms, there will need to be a cultural change in how we use it. OpenStack might well be the jack of all trades and master of all when it comes to cloud, but it nevertheless places the onus on users to use it in the 'correct' way. That's not to say there is one correct way - it's more about using it strategically and thinking about what you want from OpenStack. CloudPro articulates this well, arguing that OpenStack needs a 'benevolent dictator'. 'Are too many cooks spoiling the open-source broth?' it asks, getting to the central problem with all open-source technologies - the existentially troubling fact that the possibilities are seemingly endless. This doesn't mean we need to step away from the collaborative potential of OpenStack; it emphasises that effective collaboration requires an effective and clear strategy. Orchestration is the aim of the game, whether you're talking software or people.

At this year's OpenStack conference in Vancouver, COO Mark Collier described OpenStack as 'an agnostic integration engine... one that puts users in the best position for success'. This is a great way to define OpenStack, and it positions it as more than just a typical cloud platform. Its agnosticism is particularly crucial: it doesn't take a position on what you do; rather, it makes it possible for you to do what you think you need to do. Maybe, then, OpenStack is a jack of all trades that lets you become the master. For more OpenStack tutorials and extra content, visit our dedicated page. Find it here.


What’s next after Angular? Take the Meteor challenge!

Sarah C
28 Nov 2014
4 min read
This month the Meteor framework hit version 1.0. We've been waiting to see this for a while here at Packt, and we have definitely not been disappointed. Meteor celebrated the launch with a bang - Meteor Day saw old hands and n00bs from around the globe gather to try out the software and build new things. You might have felt the reverberations across the Web. Was it a carefully crafted and clever bit of marketing? Obviously. But in Meteor's case, we can forgive a little fanfare.

Maybe you're jaded and worn out by the barrage of new tools for web development. You should make an exception for Meteor. Maybe JavaScript isn't your thing, and you don't have any interest in working with Node on the backend. You should make an exception for Meteor. I'm not trying to shill anything here - every resource I'll mention in the course of this post is entirely free. I just think the Meteor web application stack is something special.

Why does Meteor matter for a modern Web?

If you haven't come across it before, Meteor is a full-stack JavaScript framework for the modern Web. It's agnostic about how you want to structure your app - MVC, MVVM, MVW, stick everything in one folder with filenames such as TestTemplate(2).js - hey, man, you do you! As long as you keep your client and server concerns separate (there are special built-in rules for the client, server, and public folders to help it do its synchronous magic), Meteor won't judge. The framework's clarion cry is that creating application software should be radically simple.

We all know that the Web looks different now than it did even a couple of years ago. The app is queen. Single-page web apps have made the Internet programmatic and reactive. The proliferation of mobile apps redefining the online path between customers and businesses is moving us even further away from treating the Internet as a static point of reference. 'Pages' are a less and less accurate metaphor for the visualization of our shared digital realm. Today's Internet is deep, receptive, active, and aware. Given that, it's hard to argue against making JavaScript app development simpler. Simple doesn't mean shoddy or hacky; it comes from thinking about the Web as it exists now and making the right demands of a framework. Meteor.js lives its philosophy - a multi-user, real-time web app can be put together in a couple of hours, with time to spare for pretty UI design and to window-shop for packages. Don't believe me? Try it out for yourself!

Throwing down the gauntlet

Originally, we wanted to run a Meteor challenge for the staff here in our Birmingham offices. The winner would have gotten something sweet - perhaps an extra turn on the water slide, or an exemption from her turn feeding the Packt scorpions. Alas, in the end the obligation to get on with our actual jobs (helping you learn software) got in the way of making this happen. So I'm outsourcing the challenge to you, dear reader. Your mission: download Meteor 1.0, prototype an app, and use the time left over to feel pleased with yourself. You get extra credit if the app has a particular appeal for book lovers (like us!) or if it contains a good pun.

If you're a Linux or Mac user, you can get started right away. If you're on Windows, you'll need to use a virtual environment, either in your browser or using something like Vagrant. Don't worry, the Meteor site has tutorials to get you started in a trice. After that, you can check out all kinds of great learning resources made available by the devs and the community. Get started with the official docs and tutorial, then move on to more hardcore tips and tricks at BulletProof Meteor. The more aurally inclined, and those of you who like to code while you drive, might prefer the Meteor Podcast. (Please do not code while you drive! - The Legal Team.) When you get stuck, hit up the community on the G+ group, or browse MeteorHelp for a collation of other sources of information. Most importantly, let me know how you get on with it! We're excited to see what you come up with. Do you see yourself making Meteor part of your workflow in future? Check out our JavaScript Tech Page for more insight into Meteor and full-stack JS development.


Why Enterprises love the Elastic Stack

Pravin Dhandre
31 May 2018
2 min read
Business insight has always been a hot pursuit for companies, and with data that keeps flowing, growing, and getting fatter by the day, analytics needs to be quicker, real-time, and reliable. Analytics that can't keep up with today's data produces insights that are almost lifeless against market dynamics. The question, then, is: is there an analytics solution that can tackle the data hydra? The Elastic Stack is your answer. It is packed with tools like Elasticsearch, Kibana, Logstash, X-Pack, and Beats that take data from any source, in any format, and provide instant search, analysis, and visualization in real time. With over 225 million downloads, it is a clear crowd favorite. Enterprises get the added benefit of using it as a single analytics suite or integrating it with other products, delivering real-time actionable insights and decisions every time.

So why do enterprises love the Elastic Stack? One of the most common reasons is that it is an open source platform. The next thing IT companies enjoy is its super-fast distributed search mechanism, which makes queries run faster and much more efficiently. Apart from this, its bundling with Kibana and Logstash makes it a great fit for IT infrastructure and DevOps teams, who can aggregate and analyze billions of logs with ease. Its simple and robust analysis platform provides a distinct advantage over Splunk, Solr, Sphinx, Ambar, and many other alternative product suites. Also, its SaaS option allows customers to perform log analytics, full-text search, and application monitoring in the cloud with ease and at reasonable pricing.

Companies like Amazon, Bloomberg, eBay, SAP, Citibank, Sony, Mozilla, WordPress, and Salesforce are already using the Elastic Stack, powering their search and analytics to combat their daily business challenges. Whether it is an educational institution, a travel agency, an e-commerce firm, or a financial institution, the Elastic Stack is empowering millions of companies with real-time metrics, strong analytics, better search experiences, and high customer satisfaction.

Related reading: How to install Elasticsearch in Ubuntu and Windows; How to perform Numeric Metric Aggregations with Elasticsearch; CRUD (Create, Read, Update and Delete) Operations with Elasticsearch
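As a rough illustration of the instant-search workflow described above, here is a minimal Python sketch using the official elasticsearch-py client (8.x-style API). The index name, document fields, and localhost URL are assumptions made for the example, not details from the article.

    from elasticsearch import Elasticsearch

    # connect to a locally running Elasticsearch node (assumed URL)
    es = Elasticsearch("http://localhost:9200")

    # index a log document into a hypothetical "app-logs" index
    es.index(index="app-logs", document={
        "service": "checkout",
        "level": "ERROR",
        "message": "payment gateway timeout",
    })

    # run a full-text match query over the indexed logs
    resp = es.search(index="app-logs", query={"match": {"message": "timeout"}})
    for hit in resp["hits"]["hits"]:
        print(hit["_source"]["service"], hit["_source"]["message"])

In practice, Beats or Logstash would ship the logs in and Kibana would sit on top for dashboards, but the search primitive underneath looks like this.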

5 Go Libraries, Frameworks, and Tools You Need to Know

Julian Ursell
24 Jul 2014
4 min read
Golang is an exciting new language seeing rapid adoption in an increasing number of high profile domains. Its flexibility, simplicity, and performance makes it an attractive option for fields as diverse as web development, networking, cloud computing, and DevOps. Here are five great tools in the thriving ecosystem of Go libraries and frameworks. Martini Martini is a web framework that touts itself as “classy web development”, offering neat, simplified web application development. It serves static files out of the box, injects existing services in the Go ecosystem smoothly, and is tightly compatible with the HTTP package in the native Go library. Its modular structure and support for dependency injection allows developers to add and remove functionality with ease, and makes for extremely lightweight development. Out of all the web frameworks to appear in the community, Martini has made the biggest splash, and has already amassed a huge following of enthusiastic developers. Gorilla Gorilla is a toolkit for web development with Golang and offers several packages to implement all kinds of web functionality, including URL routing, optionality for cookie and filesystem sessions, and even an implementation with the WebSockets protocol, integrating it tightly with important web development standards. groupcache groupcache is a caching library developed as an alternative (or replacement) to memcached, unique to the Go language, which offers lightning fast data access. It allows developers managing data access requests to vastly improve retrieval time by designating a group of its own peers to distribute cached data. Whereas memcached is prone to producing an overload of database loads from clients, groupcache enables a successful load out of a huge queue of replicated processes to be multiplexed out to all waiting clients. Libraries such as Groupcache have a great value in the Big Data space as they contribute greatly to the capacity to deliver data in real time anywhere in the world, while minimizing potential access pitfalls associated with managing huge volumes of stored data. Doozer Doozer is another excellent tool in the sphere of system and network administration which provides a highly available data store used for the coordination of distributed servers. It performs a similar function to coordination technologies such as ZooKeeper, and allows critical data and configurations to be shared seamlessly and in real time across multiple machines in distributed systems. Doozer allows the maintenance of consistent updates about the status of a system across clusters of physical machines, creating visibility about the role each machine plays and coordinating strategies for failover situations. Technologies like Doozer emphasize how effective the Go language is for developing valuable tools which alleviate complex problems within the realm of distributed system programming and Big Data, where enterprise infrastructures are modeled around the ability to store, harness and protect mission critical information.  GoLearn GoLearn is a new library that enables basic machine learning methods. It currently features several fundamental methods and algorithms, including neural networks, K-Means clustering, naïve Bayesian classification, and linear, multivariate, and logistic regressions. 
The library is still in development, as are a number of other packages being written to give Go programmers the ability to build machine learning applications in the language, such as mlgo, bayesian, probab, and neural-go. Go's continual expansion into new technological spaces such as machine learning demonstrates how versatile the language is across a variety of use cases, and shows that the community of Go programmers is starting to generate the kind of development drive seen in other popular general-purpose languages like Python. While libraries and packages are predominantly appearing for web development, support is clearly growing for data-intensive tasks and the Big Data space. Adoption is already skyrocketing, and the next three years will be fascinating to observe as Golang looks poised to conquer more and more key territories in the world of technology.
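And to close the loop on where the list started: for anyone curious what Martini's "classy web development" looks like in practice, here is a minimal hello-world server. It is a sketch that assumes the github.com/go-martini/martini import path rather than a recommendation of any particular project layout.

package main

import "github.com/go-martini/martini"

func main() {
    // Classic() wires up sensible defaults: logging, panic recovery,
    // and static file serving from a public/ directory.
    m := martini.Classic()

    // A handler can simply return a string, which Martini writes to the response.
    m.Get("/", func() string {
        return "Hello, world!"
    })

    // Listens on port 3000 by default (configurable via the PORT environment variable).
    m.Run()
}

Save it as server.go, fetch the package with go get, run go run server.go, and point a browser at http://localhost:3000. Gorilla's mux router, by contrast, sticks to the standard http.HandlerFunc signature, which makes it easy to mix into net/http code you already have.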

How can a data scientist get into game development?

Graham Annett
07 Aug 2017
5 min read
One of the most interesting uses for data science is within the process of game development. While it is not immediately obvious that data science is applicable to game development, it is becoming an increasingly enticing area, both from a user-engagement perspective and as a source of data collection for deep learning and other data science tasks.

Games and data collection

With the rise of reinforcement-learning-oriented deep learning work in the past few years, the appeal of using games as a method of data collection (somewhat in parallel to collecting data on Mechanical Turk or various other crowdsourcing platforms) has never been greater. The main idea behind data collection for these tasks is to capture the graphical display at a given moment and record the user input for that image frame. From this data, it is possible to connect those inputs to some end result (such as the final score) that can later be used as an objective cost function to be minimized or maximized. This makes it possible to collect a large corpus of user data for deep learning algorithms to train on initially, which the computer can then build on by playing against itself (something akin to this was done for AlphaGo and various other game-related reinforcement learning bots). With the incredible influx of processing power now available, it is possible for computers to play themselves thousands and millions of times to learn from their own shortcomings.

Deep learning uses

Practical uses of this type of deep learning that a data scientist may find interesting range from creating smarter AI systems that are more engaging to a player, to finding transferable algorithms and data sources that can be used elsewhere. For example, many of the OpenAI algorithms are intended to be trained in one game in the hope that they will transfer to another game and still do well (albeit with new parameters and a newly learned cost function). This is interesting from a data science perspective because it means not having to heavily optimize each game or task individually, and instead finding commonalities and generalizable methodologies that translate across systems and games.

Technical skills

Many of the technical skills needed to build data-collection pipelines from games are much more development-oriented than a traditional data scientist may be used to, and may require learning new skills. They are broader than traditional data science roles, ranging from data collection and data pipelining out of the games to scaling deep learning training and implementing new algorithms during training. These skills are becoming ever more important as data scientists are increasingly expected both to provide insight and to build integrations into a product.

Exploring projects and tools

A data scientist might get into this area by exploring projects and tools such as OpenAI's Gym and Facebook's MazeBase. These projects are very deep learning oriented, though, and may not be what a traditional data scientist has in mind when they think about game development.
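Before turning to the second approach, here is a minimal sketch of the kind of logging such a data-collection pipeline produces. It is purely illustrative: the Go types, field names, and file paths are hypothetical stand-ins for whatever a real engine's capture and input hooks would expose. Each recorded step pairs a captured frame with the player's input and the score observed afterwards, written out as one JSON object per line.

package main

import (
    "encoding/json"
    "os"
)

// Step is one recorded moment of play: the frame the player saw, the input
// they gave, and the score observed afterwards. Field names are illustrative.
type Step struct {
    FramePNG string  `json:"frame_png"` // path to the captured screen image
    Action   string  `json:"action"`    // e.g. "LEFT", "JUMP", "FIRE"
    Score    float64 `json:"score"`     // running score after the action
}

func main() {
    f, err := os.Create("session.jsonl")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    enc := json.NewEncoder(f)

    // In a real game these records would come from the engine's capture and
    // input hooks; hard-coded placeholders stand in for them here.
    steps := []Step{
        {FramePNG: "frames/000001.png", Action: "LEFT", Score: 10},
        {FramePNG: "frames/000002.png", Action: "JUMP", Score: 25},
    }
    for _, s := range steps {
        if err := enc.Encode(s); err != nil { // one JSON object per line
            panic(err)
        }
    }
}

A corpus of files like this, one per play session, is exactly the sort of material the supervised and reinforcement learning stages described above would train on.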
Data oriented/driven game design

Another approach is data-oriented (or data-driven) game design. While this is not a new concept by any means, it has become increasingly ubiquitous as in-app purchases and subscription-based gaming plans have become a common theme on mobile and other gaming platforms. These data science tasks are not unlike typical data science projects, in that they seek to understand, from a statistical perspective, what is happening to users at specific points in a game. There is a big overlap between projects like this and projects that aim to understand, for instance, when a user abandons a cart during an online order. The game data might be oriented around when a gamer gave up on a quest, or at what point users are willing to make an in-app purchase to reach a goal more quickly. Since these are quantifiable, objective outcomes, they are an excellent fit for traditional supervised learning and can be approached with standard supervised learning baselines and algorithms. The end result of such a task might be making a quest or goal easier, or discounting an in-app purchase during the specific interval when the user is most inclined to buy (much as offering a coupon after an abandoned cart often entices a shopper to come back and finish the purchase).

While both of these paths are game-development oriented, they differ quite a lot: one is much more traditionally data-analytical, the other much more deep-learning and engineering oriented. Both are highly interesting areas to explore professionally, but data-driven game development may be somewhat limited from a hobbyist standpoint outside of Kaggle competitions (a quick search did not turn up any previous competitions with this sort of data), since many companies would be hesitant to share such data when their entire business model is built around in-app purchases and recurring revenue from players. Overall, these are both incredibly enticing areas to pursue, and they provide plenty of interesting problems that you may not encounter outside of game development.

About the Author

Graham Annett is an NLP Engineer at Kip (Kipthis.com). He has been interested in deep learning for a bit over a year and has worked with and contributed to Keras (https://github.com/fchollet/keras). He can be found on GitHub at http://github.com/grahamannett or via http://grahamannett.me.

Why is triple-A game development unsustainable?

Raka Mahesa
12 Jun 2017
5 min read
The video game industry is huge, bringing in over $91 billion in revenue during 2016 alone. And it's not just big, it's also growing, with a projected yearly growth rate of 3.6%. So it was quite surprising when Cliff Bleszinski, a prominent figure in the game industry, remarked that the business of modern triple-A games is unsustainable.

While the statement may sound click-bait-y, he's not the only person from the industry to voice concern about the business model. Back in 2012, a game director at Ubisoft, one of the biggest game publishers in the world, made a similar remark about how the development of triple-A games could be harmful. With more than one insider voicing the same concern, maybe there's some truth to what they're saying. And if it is true, what makes triple-A game development unsustainable? Let's take a look.

Before we go further, let's clear up one thing: what are triple-A games? Triple-A games (or AAA games) are the tier of video games with the highest development budgets. It's not a formal classification, so there isn't an exact budget threshold a game must pass to be categorized as triple-A. And even though the classification makes it seem like triple-A games are super-premium games of the highest quality, in reality most games you find in a video game store are triple-A games, sold at $60.

So that's the triple-A tier, but what other tiers of video games are there, and where are they sold? There are indie games and double-A (AA) games. Indie games are made by small teams with small budgets and are sold at $20 or lower. Double-A games are made with bigger budgets than indie games and sold at a higher price of $40. Both tiers are sold digitally through storefronts like Steam and usually are not sold on physical media like DVDs. Do keep in mind that this classification applies to PC and console games and isn't really applicable to mobile games.

It's also important to note that this classification doesn't determine which game has the better quality or the better sales. After all, Minecraft is an indie game with a really small initial development team that has sold over 100 million copies. In comparison, Grand Theft Auto V, a triple-A game with a $250 million development budget, has "only" sold 75 million copies.

And yes, you read that right: Grand Theft Auto V had a development cost of $250 million, with half of that being marketing. Most triple-A games don't have that kind of budget, but they're still expensive. Call of Duty: Modern Warfare 2 had a development cost of $200 million, The Witcher 3 cost $80 million, and the production cost (meaning marketing is excluded) of Final Fantasy XIII was $65 million.

So, with that kind of budget, how did those games fare? Fortunately for Grand Theft Auto V, it made $1 billion in sales just three days after release, making it the fastest-selling entertainment product of all time. Final Fantasy XIII has a different story. Unlike Grand Theft Auto V with its 75 million copies, the lifetime sales of Final Fantasy XIII are only 6.6 million copies, which works out to roughly $350 million in sales — a far thinner result than it sounds against a $65 million production cost, given that the figure excludes marketing entirely.

And this is why triple-A game development is unsustainable.
The development costs of these games have gotten so high that the only way for a developer to turn a profit is to sell millions and millions of copies. Meanwhile, more video games are released every day, making it harder for each game to find sales. Grand Theft Auto V is the exception, not the rule; there aren't many video games that even reach 10 million copies sold.

With that kind of budget, every triple-A project has become very risky. After all, if a game doesn't sell well, the developer could lose tens of millions of dollars, enough to bankrupt a small company without much funding. And even for a big company with plenty of funding, how many projects can fail before it's forced to shut down?

And with risky projects comes risk mitigation. With so much money at stake, developers are forced to play it safe and only work on games with mainstream appeal. Oh, the science fiction theme doesn't have as big an audience as a military theme? Let's only make games with a military theme, then. But if all game developers think the same way, the video game market could end up with only a handful of genres, with all those developers competing for the same audience.

It's a vicious cycle, really. High-budget games need a high volume of sales to recoup their production costs. But for a game to achieve that volume of sales, it needs a high development budget to compete with the other games on the market.

So, if triple-A game development is truly unsustainable, does that mean high-budget games will disappear from the market in the future? Well, it's possible. But as we've seen with Minecraft, you don't need hundreds of millions of dollars in development budget to create a good game that sells well. So even though the number of video games with high budgets may diminish, high-quality video games will still exist.

About the Author

Raka Mahesa is a game developer at Chocoarts (http://chocoarts.com/) who is interested in digital technology in general. Outside of work hours, he likes to work on his own projects, with Corridoom VR being his latest released game. Raka also regularly tweets as @legacy99.

How to Handle AWS Through the Command Line

Felix Rabe
04 Jan 2016
8 min read
Are you a developer running OS X or Linux who would like to give Amazon AWS a shot? And do you prefer the command line over fancy GUIs? Read on and you might have your first AWS-provided, Ubuntu-powered Nginx web server running in 30 to 45 minutes.

This article will guide you from just having OS X or Linux installed on your computer to a running virtual server on Amazon Web Services (AWS), completely controlled via its Command Line Interface (CLI). If you are just starting out with Amazon AWS, you'll be able to benefit from the AWS Free Tier for 12 months; this tutorial only uses resources available within the free tier. (Disclaimer: I am not affiliated with Amazon in any way other than as a user.)

Required skills: a basic knowledge of the command line (shell) and web technologies (SSH, HTTP) is all that's needed. Open your operating system's terminal to follow along.

What are Amazon Web Services (AWS)?

Amazon AWS is a collection of services, based on Amazon's own infrastructure, that Amazon provides to the general public. These services include computing resources, file storage, databases, and even crowd-sourced manual labor. See http://aws.amazon.com/products/ for an overview of the provided services.

What is the Amazon AWS Command Line Interface (CLI)?

The AWS CLI tool lets you control all operational aspects of AWS from the command line. This is a great advantage for automating processes, and for people (like me) with a preference for textual user interfaces.

How to create an AWS account

Head over to https://aws.amazon.com/ and sign up for an account. This process will require you to have a credit card and a mobile phone at hand. Then come back and read on.

How to generate access keys

Before the AWS CLI can be configured for use, you need to create a user with the required permissions and download that user's access keys (AWS Access Key ID and AWS Secret Access Key) for use in the AWS CLI.

In the AWS Console (https://console.aws.amazon.com/), open your account menu in the upper right corner and click on "Security Credentials". If a dialog pops up, just dismiss it for now by clicking "Continue to Security Credentials". Then, in the sidebar on the left, click on "Groups". Create a group (e.g. "Developers") and attach the policy "AmazonEC2FullAccess" to it. Next, in the sidebar on the left, click on "Users". Create a new user, and then copy or download the security credentials to a safe place; you will need them soon. Click on the new user, then "Add User to Groups", to add the user to the group you've just created. This gives the user (and the keys) the required capabilities to manipulate EC2.

Install AWS CLI via Homebrew (OS X)

(Linux users can skip to the next section.) On OS X, Homebrew provides a simple way to install other software from the command line and is widely used. Even though the AWS CLI documentation recommends installation via pip (the Python package manager), I chose to install AWS CLI via Homebrew as it is more common; note that AWS CLI on Homebrew might lag a version behind pip. Open the Terminal application and install Homebrew by running:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

The installation script will guide you through the necessary steps to get Homebrew set up. Once finished, install AWS CLI using:

brew install awscli
aws --version

If the last command successfully shows you the version of the AWS CLI, you can continue on to the section about configuring the AWS CLI.
Install AWS CLI via pip (Linux)

On Debian or Ubuntu Linux, run:

sudo apt-get install python-pip
sudo pip install awscli
aws --version

On Fedora, run:

sudo yum install python-pip
sudo pip install awscli
aws --version

If the last command successfully shows you the version of the AWS CLI, you can continue on with the next section.

Configure AWS CLI and Run Your First Virtual Server

Run aws configure and paste in the credentials you received earlier:

$ aws configure
AWS Access Key ID [None]: AKIAJXIXMECPZBXKMK7A
AWS Secret Access Key [None]: XUEZaXQ32K+awu3W+I/qPyf6+PIbFFORNM4/3Wdd
Default region name [None]: us-west-1
Default output format [None]: json

Here you paste the credentials you copied or downloaded above for "AWS Access Key ID" and "AWS Secret Access Key". (Don't bother trying the values given in the example, as I've already changed the keys.)

We'll use the region "us-west-1" here. If you want to use another one, you will have to find an equivalent AMI (HD image, "Ubuntu Server 14.04 LTS (HVM), SSD Volume Type", ID ami-df6a8b9b in region "us-west-1") with a different ID for your region.

The output formats available are "json", "table" and "text", and the format can be changed for each individual AWS CLI command by appending the --output <format> option. "json" is the default and produces pretty-printed (though not key-sorted) JSON output. "table" produces a human-readable presentation. "text" is a tab-delimited format that is easy to parse in shell scripts.

Help on AWS CLI

The AWS CLI is well documented at http://aws.amazon.com/documentation/cli/, and man pages for all commands are available by appending help to the end of the command line:

aws help
aws ec2 help
aws ec2 run-instances help

Amazon EC2

Amazon EC2 is the central piece of AWS. The EC2 website says: "Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud." In other words, EC2 provides the capability to run virtual machines connected to the Internet. That's what will make our Nginx run, so let's make use of it. To do that, we need to enable SSH networking and generate an SSH key for logging in.

Setting Up the Security Group (Firewall)

Security groups are virtual firewalls. To make a virtual machine accessible, it is associated with one (or more) security groups. A security group defines which ports are open and to which IP ranges. Without further ado:

aws ec2 create-security-group --group-name tutorial-sg --description "Tutorial security group"
aws ec2 authorize-security-group-ingress --group-name tutorial-sg --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name tutorial-sg --protocol tcp --port 80 --cidr 0.0.0.0/0

This creates a security group called "tutorial-sg" with ports 22 and 80 open to the world. To confirm that you have set it up correctly, you can then run:

aws ec2 describe-security-groups --group-name tutorial-sg --query 'SecurityGroups[0].{name:GroupName,description:Description,ports:IpPermissions[*].{from:FromPort,to:ToPort,cidr:IpRanges[0].CidrIp,protocol:IpProtocol}}'

The --query option is a great way to filter through AWS CLI JSON output. You can safely remove the --query option from the aws ec2 describe-security-groups command to see the JSON output in full. The AWS CLI documentation has more information about AWS CLI output manipulation and the --query option.
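If you would rather post-process the CLI's JSON output in a program than wrestle with long --query expressions, a small Go sketch like the one below does the job. It is only an illustration: it assumes you have saved the command's JSON output to a file named sg.json, and it declares only the fields read here (the same GroupName, Description, and IpPermissions fields the query above touches).

package main

import (
    "encoding/json"
    "fmt"
    "os"
)

// Only the parts of `aws ec2 describe-security-groups --output json` we read.
type describeSecurityGroupsOutput struct {
    SecurityGroups []struct {
        GroupName     string `json:"GroupName"`
        Description   string `json:"Description"`
        IpPermissions []struct {
            IpProtocol string `json:"IpProtocol"`
            FromPort   int    `json:"FromPort"`
            ToPort     int    `json:"ToPort"`
        } `json:"IpPermissions"`
    } `json:"SecurityGroups"`
}

func main() {
    // e.g. aws ec2 describe-security-groups --group-name tutorial-sg --output json > sg.json
    data, err := os.ReadFile("sg.json")
    if err != nil {
        panic(err)
    }
    var out describeSecurityGroupsOutput
    if err := json.Unmarshal(data, &out); err != nil {
        panic(err)
    }
    for _, sg := range out.SecurityGroups {
        fmt.Printf("%s (%s)\n", sg.GroupName, sg.Description)
        for _, p := range sg.IpPermissions {
            fmt.Printf("  allow %s ports %d-%d\n", p.IpProtocol, p.FromPort, p.ToPort)
        }
    }
}

The same pattern works for any aws ... --output json command: declare just the slice of the response you care about and let encoding/json ignore the rest.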
Generate an SSH Key

To actually log in via SSH, we need an SSH key:

aws ec2 create-key-pair --key-name tutorial-key --query 'KeyMaterial' --output text > tutorial-key.pem
chmod 0400 tutorial-key.pem

Run Your First Instance (Virtual Machine) on EC2

Finally, we can run our first instance on AWS! Remember that the image ID "ami-df6a8b9b" is specific to the region "us-west-1". In case you wonder about the size of the disk, this command will also create a new 8 GB disk volume based on the size of the specified disk image:

instance=$(aws ec2 run-instances --image-id ami-df6a8b9b --count 1 --instance-type t2.micro --security-groups tutorial-sg --key-name tutorial-key --query 'Instances[0].InstanceId' --output text) ; echo Instance: $instance

This shows you the IP address and the state of the new instance in a nice table:

aws ec2 describe-instances --instance-ids $instance --query 'Reservations[*].Instances[*].[InstanceId,PublicIpAddress,State.Name]' --output table

Install Nginx and Open Your Shiny New Website

And now we can log in to the new instance to install Nginx:

ipaddr=$(aws ec2 describe-instances --instance-ids $instance --query 'Reservations[0].Instances[0].PublicIpAddress' --output text) ; echo IP: $ipaddr
ssh -i tutorial-key.pem ubuntu@$ipaddr sudo apt-get update && ssh -i tutorial-key.pem ubuntu@$ipaddr sudo apt-get install -y nginx

If you now open the website at $ipaddr in your browser (OS X: open http://$ipaddr, Ubuntu: xdg-open http://$ipaddr), you should be greeted with the "Welcome to nginx!" message. Success! :)

Cleaning Up

In case you want to stop your instance again:

aws ec2 stop-instances --instance-ids $instance

To start the instance again, substitute start for stop (caution: the IP address will probably change), and to completely remove the instance (including the volume), substitute terminate for stop.

Resources

Amazon Web Services: http://aws.amazon.com/
AWS CLI documentation: http://aws.amazon.com/documentation/cli/
AWS CLI EC2 reference: http://docs.aws.amazon.com/cli/latest/reference/ec2/index.html
Official AWS tutorials: http://docs.aws.amazon.com/gettingstarted/latest/awsgsg-intro/gsg-aws-tutorials.html
AWS Command Line Interface: http://aws.amazon.com/cli/
Homebrew: http://brew.sh/

About the author

Felix Rabe is a developer who develops in Go and deploys in Docker. He can be found on GitHub and Twitter @felixrabe.

Level Up Your Company's Big Data With Resource Management

Timothy Chen
24 Dec 2015
4 min read
Big data was once one of the biggest technology hypes: tons of presentations and posts talked about how new systems and tools could process data so large and complex that traditional tools couldn't handle it. While Big data was at the peak of its hype, most companies were still getting familiar with new data processing frameworks such as Hadoop and new databases such as HBase and Cassandra. Fast forward to now: Big data is still a popular topic, lots of companies have already jumped on the Big data bandwagon, and many are moving past first-generation Hadoop to evaluate newer tools such as Spark and newer databases such as Firebase, NuoDB, or MemSQL.

But most companies have also learned from running all of these tools that deploying, operating, and planning capacity for them is hard and complicated. Although many of these tools have matured over time, they still usually run in their own independent clusters. It's also not rare to find multiple Hadoop clusters in the same company, since multi-tenancy isn't built into many of these tools and you run the risk of a few non-critical Big data jobs overloading a shared cluster.

Problems running independent Big data clusters

Running many independent clusters brings a lot of problems. One is monitoring and visibility: each cluster ships its own management tools, and integrating them with the company's shared monitoring and management tooling is a huge challenge, especially when onboarding yet another framework with yet another cluster.

Another problem is multi-tenancy. Although running independent clusters partly sidesteps this, another org's job can still take over a whole cluster, and it doesn't help at all when a bug in a Hadoop application simply consumes every available resource; debugging that is horrific.

Another problem is utilization: a cluster is usually not 100% utilized, and all of those instances running in Amazon or in your datacenter are just racking up bills while doing no work. There are more major pain points that I don't have time to get into.

Hadoop v2

The Hadoop developers and operators saw these problems, and in the second generation of Hadoop they built a separate resource management layer called YARN: a single framework that manages all of the resources in the cluster, enforces the resource limits of each job, integrates security into the workload, and even optimizes workloads by automatically placing jobs closer to the data. This solves a huge operational problem, and it also lets companies consolidate multiple Hadoop clusters into one, since it allows finer-grained control over the workload and improves cluster efficiency.

Beyond Hadoop

With the vast number of Big data technologies now growing in the ecosystem, there is a need for a common resource management layer across all of these tools; without a single resource management system spanning the frameworks, we run straight back into the same problems described above. And when all of these frameworks run under one resource management platform, many new options for optimization and resource scheduling become possible. Here are some examples of what a single platform makes possible: because the platform understands the entire cluster's workload and available resources, it can automatically resize and scale frameworks up and down based on the workloads across all of those tools.
It can also resize jobs according to priority. The cluster can detect underutilization by other jobs and offer the slack resources to Spark batch jobs without impacting your most important workloads from other frameworks, maintaining the same business deadlines while saving a lot of cost. In the next post I'll cover Mesos, one such resource management system, and how its upcoming features make the optimizations I mentioned possible. For more Big Data tutorials and analysis, visit our dedicated Hadoop and Spark pages.

About the author

Timothy Chen is a distributed systems engineer and entrepreneur. He works at Mesosphere and can be found on GitHub @tnachen.