Data | Tech News, Tutorials & Expert Insights

06 Dec 2017

13 min read

PyTorch 0.3.0 releases, ending stochastic functions

PyTorch 0.3.0 has removed stochastic functions, i.e. Variable.reinforce(), citing “limited functionality and broad performance implications.” The Python package has added a number of performance improvements, new layers, support for ONNX, CUDA 9, and cuDNN 7, and “lots of bug fixes” in the new version.

“The motivation for stochastic functions was to avoid book-keeping of sampled values. In practice, users were still book-keeping in their code for various reasons. We constructed an alternative, equally effective API, but did not have a reasonable deprecation path to the new API. Hence this removal is a breaking change,” the PyTorch team said.

To replace stochastic functions, they have introduced the torch.distributions package. So if your previous code looked like this:

    probs = policy_network(state)
    action = probs.multinomial()
    next_state, reward = env.step(action)
    action.reinforce(reward)
    action.backward()

This could be the new equivalent code:

    probs = policy_network(state)
    # NOTE: categorical is equivalent to what used to be called multinomial
    m = torch.distributions.Categorical(probs)
    action = m.sample()
    next_state, reward = env.step(action)
    loss = -m.log_prob(action) * reward
    loss.backward()

What is new in PyTorch 0.3.0?

Unreduced losses

Some loss functions can now compute per-sample losses in a mini-batch.

- By default, PyTorch reduces the losses over the mini-batch to a single scalar loss. This was limiting to users.
- Now, a subset of loss functions allow specifying reduce=False to return individual losses for each sample in the mini-batch.
- Example: loss = nn.CrossEntropyLoss(..., reduce=False)
- Currently supported losses: MSELoss, NLLLoss, NLLLoss2d, KLDivLoss, CrossEntropyLoss, SmoothL1Loss, L1Loss
- More loss functions will be covered in the next release.

An in-built Profiler in the autograd engine

PyTorch has built a low-level profiler to help you identify bottlenecks in your models.
Let us start with an example:

    >>> x = Variable(torch.randn(1, 1), requires_grad=True)
    >>> with torch.autograd.profiler.profile() as prof:
    ...     y = x ** 2
    ...     y.backward()
    >>> # NOTE: some columns were removed for brevity
    ... print(prof)
    ---------------------------------  ----------  ---------
    Name                                 CPU time  CUDA time
    ---------------------------------  ----------  ---------
    PowConstant                         142.036us    0.000us
    N5torch8autograd9GraphRootE          63.524us    0.000us
    PowConstantBackward                 184.228us    0.000us
    MulConstant                          50.288us    0.000us
    PowConstant                          28.439us    0.000us
    Mul                                  20.154us    0.000us
    N5torch8autograd14AccumulateGradE    13.790us    0.000us
    N5torch8autograd5CloneE               4.088us    0.000us

The profiler works for both CPU and CUDA models. For CUDA models, you have to run your python program with a special nvprof prefix. For example:

    nvprof --profile-from-start off -o trace_name.prof -- python <your arguments>

    # in python
    >>> with torch.cuda.profiler.profile():
    ...     model(x)  # Warmup CUDA memory allocator and profiler
    ...     with torch.autograd.profiler.emit_nvtx():
    ...         model(x)

Then, you can load trace_name.prof in PyTorch and print a summary profile report.

    >>> prof = torch.autograd.profiler.load_nvprof('trace_name.prof')
    >>> print(prof)

For additional documentation, you can visit this link.

Higher order gradients

v0.3.0 has added higher-order gradients support for the following layers:

- ConvTranspose, AvgPool1d, AvgPool2d, LPPool2d, AvgPool3d, MaxPool1d, MaxPool2d, AdaptiveMaxPool, AdaptiveAvgPool, FractionalMaxPool2d, MaxUnpool1d, MaxUnpool2d, nn.Upsample, ReplicationPad2d, ReplicationPad3d, ReflectionPad2d
- PReLU, HardTanh, L1Loss, SoftSign, ELU, RReLU, Hardshrink, Softplus, SoftShrink, LogSigmoid, Softmin, GLU
- MSELoss, SmoothL1Loss, KLDivLoss, HingeEmbeddingLoss, SoftMarginLoss, MarginRankingLoss, CrossEntropyLoss
- DataParallel

Optimizers

- optim.SparseAdam: Implements a lazy version of Adam algorithm suitable for sparse tensors.
(In this variant, only moments that show up in the gradient get updated, and only those portions of the gradient get applied to the parameters.)
- Optimizers now have an add_param_group function that lets you add new parameter groups to an already constructed optimizer.

New layers and nn functionality

- Added AdaptiveMaxPool3d and AdaptiveAvgPool3d
- Added LPPool1d
- F.pad now has support for:
  - 'reflection' and 'replication' padding on 1d, 2d, 3d signals (so 3D, 4D and 5D Tensors)
  - constant padding on n-d signals
- nn.Upsample now works for 1D signals (i.e. B x C x L Tensors) in nearest and linear modes.
- Allow user to not specify certain input dimensions for AdaptivePool*d and infer them at runtime. For example:

      # target output size of 10x7
      m = nn.AdaptiveMaxPool2d((None, 7))

- DataParallel container on CPU is now a no-op (instead of erroring out)

New Tensor functions and features

- Introduced torch.erf and torch.erfinv that compute the error function and the inverse error function of each element in the Tensor.
- Adds broadcasting support to bitwise operators
- Added Tensor.put_ and torch.take similar to numpy.take and numpy.put.
  - The take function allows you to linearly index into a tensor without viewing it as a 1D tensor first. The output has the same shape as the indices.
  - The put function copies value into a tensor also using linear indices.
- Adds zeros and zeros_like for sparse Tensors.
- 1-element Tensors can now be cast to Python scalars. For example: int(torch.Tensor([5])) works now.

Other additions

- Added torch.cuda.get_device_name and torch.cuda.get_device_capability that do what the names say.
Example:

    >>> torch.cuda.get_device_name(0)
    'Quadro GP100'
    >>> torch.cuda.get_device_capability(0)
    (6, 0)

- If one sets torch.backends.cudnn.deterministic = True, then the CuDNN convolutions use deterministic algorithms
- torch.cuda.get_rng_state_all and torch.cuda.set_rng_state_all are introduced to let you save / load the state of the random number generator over all GPUs at once
- torch.cuda.empty_cache() frees the cached memory blocks in PyTorch's caching allocator. This is useful when having long-running ipython notebooks while sharing the GPU with other processes.

API changes

- softmax and log_softmax now take a dim argument that specifies the dimension in which slices are taken for the softmax operation. dim allows negative dimensions as well (dim = -1 will be the last dimension)
- torch.potrf (Cholesky decomposition) is now differentiable and defined on Variable
- Remove all instances of device_id and replace it with device, to make things consistent
- torch.autograd.grad now allows you to specify inputs that are unused in the autograd graph if you use allow_unused=True. This is useful when using torch.autograd.grad in large graphs with lists of inputs / outputs. For example:

      x, y = Variable(...), Variable(...)
      torch.autograd.grad(x * 2, [x, y])  # errors
      torch.autograd.grad(x * 2, [x, y], allow_unused=True)  # works

- pad_packed_sequence now allows a padding_value argument that can be used instead of zero-padding
- Dataset now has a + operator (which uses ConcatDataset). You can do something like MNIST(...) + FashionMNIST(...) for example, and you will get a concatenated dataset containing samples from both.
- torch.distributed.recv allows Tensors to be received from any sender (hence, src is optional). recv returns the rank of the sender.
- adds zero_() to Variable
- Variable.shape returns the size of the Tensor (now made consistent with Tensor)
- torch.version.cuda specifies the CUDA version that PyTorch was compiled with
- Added a missing function random_ for CUDA.
- torch.load and torch.save can now take a pathlib.Path object, which is a standard Python3 typed filepath object
- If you want to load a model's state_dict into another model (for example to fine-tune a pre-trained network), load_state_dict was strict on matching the key names of the parameters. Now PyTorch provides a strict=False option to load_state_dict where it only loads in parameters where the keys match, and ignores the other parameter keys.
- added nn.functional.embedding_bag that is equivalent to nn.EmbeddingBag

Performance Improvements

- The overhead of torch functions on Variables was around 10 microseconds. This has been brought down to ~1.5 microseconds by moving most of the core autograd formulas into C++ using the ATen library.
- softmax and log_softmax are now 4x to 256x faster on the GPU after rewriting the GPU kernels
- 2.5x to 3x performance improvement of the distributed AllReduce (gloo backend) by enabling GPUDirect
- nn.Embedding's renorm option is much faster on the GPU. For embedding dimensions of 100k x 128 and a batch size of 1024, it is 33x faster.
- All pointwise ops now use OpenMP and get multi-core CPU benefits
- Added a single-argument version of torch.arange. For example torch.arange(10)

Framework Interoperability

DLPack Interoperability

DLPack Tensors are cross-framework Tensor formats. We now have torch.utils.dlpack.to_dlpack(x) and torch.utils.dlpack.from_dlpack(x) to convert between DLPack and torch Tensor formats. The conversion has zero memory copy and hence is very efficient.

Model exporter to ONNX

ONNX is a common model interchange format that can be executed in Caffe2, CoreML, CNTK, MXNet, and Tensorflow at the moment. PyTorch models that are ConvNet-like and RNN-like (static graphs) can now be shipped to the ONNX format. There is a new module torch.onnx (http://pytorch.org/docs/0.3.0/onnx.html) which provides the API for exporting ONNX models.
The operations supported in this release are:

- add, sub (nonzero alpha not supported), mul, div, cat, mm, addmm, neg, tanh, sigmoid, mean, t, transpose, view, split, squeeze
- expand (only when used before a broadcasting ONNX operator; e.g., add)
- prelu (single weight shared among input channels not supported)
- threshold (non-zero threshold/non-zero value not supported)
- Conv, ConvTranspose, BatchNorm, MaxPool, RNN, Dropout, ConstantPadNd, Negate
- elu, leaky_relu, glu, softmax, log_softmax, avg_pool2d
- unfold (experimental support with ATen-Caffe2 integration)
- Embedding (no optional arguments supported)
- RNN
- FeatureDropout (training mode not supported)
- Index (constant integer and tuple indices supported)

Usability Improvements

- More cogent error messages during indexing of Tensors / Variables (breaking change)
- Add proper error message for specifying dimension on a tensor with no dimensions
- better error messages for Conv*d input shape checking
- More user-friendly error messages for LongTensor indexing
- Better error messages and argument checking for Conv*d routines
- Trying to construct a Tensor from a Variable fails more appropriately
- If you are using a PyTorch binary with insufficient CUDA version, then a warning is printed to the user.
- Fixed incoherent error messages in load_state_dict
- Fix error message for type mismatches with sparse tensors

Bug fixes

torch

- Fix CUDA lazy initialization to not trigger on calls to torch.manual_seed (instead, the calls are queued and run when CUDA is initialized)

Tensor

- if x is 2D, x[[0, 3],] was needed to trigger advanced indexing. The trailing comma is no longer needed, and you can do x[[0, 3]]
- x.sort(descending=True) used to incorrectly fail for Tensors. Fixed a bug in the argument checking logic to allow this.
- Tensor constructors with numpy input: torch.DoubleTensor(np.array([0,1,2], dtype=np.float32))
  - torch will now copy the contents of the array in a storage of appropriate type.
  - If types match, it will share the underlying array (no-copy), with equivalent semantics to initializing a tensor with another tensor.
  - On CUDA, torch.cuda.FloatTensor(np.random.rand(10,2).astype(np.float32)) will now work by making a copy.
- ones_like and zeros_like now create Tensors on the same device as the original Tensor
- expand and expand_as allow expanding an empty Tensor to another empty Tensor
- torch.HalfTensor supports numpy() and torch.from_numpy
- Added additional size checking for torch.scatter
- Fixed random_ on CPU (which previously had a max value of 2^32) for DoubleTensor and LongTensor
- Fix ZeroDivisionError: float division by zero when printing certain Tensors
- torch.gels when m > n had a truncation bug on the CPU and returned incorrect results. Fixed.
- Added a check in tensor.numpy() that checks if no positional arguments are passed
- Before a Tensor is moved to CUDA pinned memory, added a check to ensure that it is contiguous
- Fix symeig on CUDA for large matrices. The bug is that not enough space was being allocated for the workspace, causing some undefined behavior.
- Improved the numerical stability of torch.var and torch.std by using Welford's algorithm
- The Random Number Generator returned uniform samples with inconsistent bounds (inconsistency in cpu implementation and running into a cublas bug). Now, all uniform sampled numbers will return within the bounds [0, 1), across all types and devices
- Fixed torch.svd to not segfault on large CUDA Tensors (fixed an overflow error in the magma bindings)
- Allows empty index Tensor for index_select (instead of erroring out)
- Previously when eigenvector=False, symeig returned some unknown value for the eigenvectors. Now this is corrected.

sparse

- Fix bug with 'coalesced' calculation in sparse 'cadd'
- Fixes .type() not converting indices tensor.
- Fixes sparse tensor coalesce on the GPU in corner cases

autograd

- Fixed crashes when calling backwards on leaf variable with requires_grad=False
- fix bug on Variable type() around non-default GPU input.
- when torch.norm returned 0.0, the gradient was NaN. We now use the subgradient at 0.0, so the gradient is 0.0.
- Fix a correctness issue with advanced indexing and higher-order gradients
- torch.prod's backward was failing on the GPU due to a type error, fixed.
- Advanced Indexing on Variables now allows the index to be a LongTensor backed Variable
- Variable.cuda() and Tensor.cuda() are consistent in kwargs options

optim

- torch.optim.lr_scheduler is now imported by default.

nn

- Returning a dictionary from a nn.Module's forward function is now supported (used to throw an error)
- When register_buffer("foo", ...) is called, and self.foo already exists, then instead of silently failing, now raises a KeyError
- Fixed loading of older checkpoints of RNN/LSTM which were missing _data_ptrs attributes.
- nn.Embedding had a hard error when using the max_norm option. This is fixed now.
- when using the max_norm option, the passed-in indices are written upon (by the underlying implementation). To fix this, pass a clone of the indices to the renorm kernel.
- F.affine_grid now can take non-contiguous inputs
- EmbeddingBag can accept both 1D and 2D inputs now.
- Workaround a CuDNN bug where batch sizes greater than 131070 fail in CuDNN BatchNorm
- fix nn.init.orthogonal to correctly return orthonormal vectors when rows < cols
- if BatchNorm has only 1 value per channel in total, raise an error in training mode.
- Make cuDNN bindings respect the current cuda stream (previously raised incoherent error)
- fix grid_sample backward when gradOutput is a zero-strided Tensor
- Fix a segmentation fault when reflection padding is out of Tensor bounds.
- If LogSoftmax has only 1 element, -inf was returned.
Now this correctly returns 0.0
- Fix pack_padded_sequence to accept inputs of arbitrary sizes (not just 3D inputs)
- Fixed ELU higher order gradients when applied in-place
- Prevent numerical issues with poisson_nll_loss when log_input=False by adding a small epsilon

distributed and multi-gpu

- Allow kwargs-only inputs to DataParallel. This used to fail: n = nn.DataParallel(Net()); out = n(input=i)
- DistributedDataParallel calculates num_samples correctly in python2
- Fix the case of DistributedDataParallel when 1-GPU per process is used.
- Allow some params to be requires_grad=False in DistributedDataParallel
- Fixed DataParallel to specify GPUs that don't include GPU-0
- DistributedDataParallel's exit doesn't error out anymore, the daemon flag is set.
- Fix a bug in DistributedDataParallel in the case when model has no buffers (previously raised incoherent error)
- Fix __getstate__ to be functional in DistributedDataParallel (was returning nothing)
- Fix a deadlock in the NCCL bindings when GIL and CudaFreeMutex were starving each other

Among other fixes, model_zoo.load_url now first attempts to use the requests library if available, and then falls back to urllib.

To download the source code, click here.
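To make the move from Variable.reinforce() to torch.distributions a little more concrete, here is a minimal sketch, in plain Python rather than PyTorch, of the quantity that a line such as loss = -m.log_prob(action) * reward evaluates for a single sampled action. The helper names (sample_categorical, log_prob, reinforce_loss), the probabilities, and the reward are illustrative assumptions, not part of the PyTorch API, and the sketch computes only the loss value, not its gradient.

```python
import math
import random


def sample_categorical(probs, rng):
    """Draw one index from the categorical distribution given by `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def log_prob(probs, action):
    """Log-probability of `action` under the categorical distribution."""
    return math.log(probs[action])


def reinforce_loss(probs, action, reward):
    """Surrogate loss -log p(action) * reward for one sampled action."""
    return -log_prob(probs, action) * reward


rng = random.Random(0)       # fixed seed (illustrative) so runs are repeatable
probs = [0.1, 0.6, 0.3]      # hypothetical policy output for one state
action = sample_categorical(probs, rng)
reward = 1.0                 # hypothetical reward returned by the environment
loss = reinforce_loss(probs, action, reward)
print(action, loss)
```

The point of the new API is that the sampled action and its log-probability are ordinary values the user keeps track of, which is the book-keeping the PyTorch team says users were doing anyway.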

Packt Editorial Staff

05 Dec 2017

4 min read

5th Dec.' 17 - Headlines

Google's DeepVariant, IBM's DuHL machine learning algorithm, desktop compatibility of Nvidia GPU Cloud, Google AutoML's "child" AI NASNet, and a new tool for FPGA programming in today's trending stories in data science news.

Google's DeepVariant to make sense out of your genome

Google releases DeepVariant, a deep learning tool to decipher Genome Sequencing data

Google has announced DeepVariant, a new deep neural network to call genetic variants from next-generation DNA sequencing data. Released as open source software, DeepVariant uses the latest deep learning techniques to build a more accurate picture of a person’s genome from sequencing data. It is available on GitHub here.

Nvidia democratizes AI development

Nvidia GPU Cloud to now support everyday desktops

In what could make developing artificial intelligence models easier for hundreds of thousands of researchers worldwide, Nvidia has updated its GPU Cloud to support everyday desktops. In addition to the new desktop compatibility, the chip maker has added support for two new deep learning frameworks. The first is the PaddlePaddle engine that Chinese search giant Baidu released last year, which allows developers to implement certain models with a lot less code than some alternatives. The other is the 1.0 release of MXNet, the AI framework backed by Amazon’s cloud division.

IBM's new algorithm for machine learning

IBM claims 10x faster machine learning with its new DuHL algorithm

In coordination with EPFL researchers, IBM has created a new method for working with large data sets to train machine learning algorithms. The new algorithm, called Duality-gap based Heterogeneous Learning (DuHL), is capable of pushing through 30GB of data every 60 seconds, resulting in a 10x improvement over previous methods. During preliminary testing, IBM used an Nvidia Quadro M4000 with 8GB of GDDR5 memory. With a modestly priced professional graphics card, IBM demonstrated that it could train Support Vector Machines over 10 times faster using its DuHL system compared to a standard sequential operating approach.

New tools for FPGA programming

New product from Falcon Computing lets software programmers design FPGA accelerators without any knowledge of FPGA

New startup Falcon Computing Solutions Inc. has developed automated compilation tools that focus on streamlining FPGA-based acceleration. Its principal product is Merlin, a compiler that provides push-button C/C++ language programming to optimize FPGA implementation and work in a fully integrated fashion with Intel’s own development tools. “It’s a pure C/C++ flow that enables software programmers to design FPGA accelerators without any knowledge of FPGA,” said Jim Wu, director of consumer experience at Falcon Computing. “We want to put the tool in the hands of all software programmers.” The company is making the product available in a 14-day trial for use in the enterprise data center or in the cloud, and general availability is planned for the first quarter of 2018. Falcon Computing already has agreements with Amazon Web Services Inc. and Alibaba Cloud, and is working to bring the tool to other public cloud providers as well.

AutoML gives birth to NASNet

Google AutoML’s “child” NASNet delivers advanced machine vision results

Google's AutoML project, designed to make AI build other AIs, has now developed a computer vision system that vastly outperforms state-of-the-art models. NASNet, the new project, could improve how autonomous vehicles and next-generation AI robots ‘see.’ Dubbed AutoML’s “child” AI, NASNet recognizes objects (people, cars, traffic lights, handbags, backpacks, etc.) in a video in real time. Google researchers acknowledge that NASNet could prove useful for a wide range of applications and have open-sourced the AI for inference on image classification and object detection. “We hope that the larger machine learning community will be able to build on these models to address multitudes of computer vision problems we have not yet imagined,” Google said in a blog post.

Packt Editorial Staff

04 Dec 2017

3 min read

4th Dec.' 17 - Headlines

Amazon’s new Deep Learning AMI, Mapbox’s acquisition of Fitness AR, and a new Home.me tool in today’s data science news.

Amazon announces new AWS Deep Learning AMI for Microsoft Windows

Amazon Web Services is now offering AWS Deep Learning AMI for Microsoft Windows Server 2012 R2 and 2016. The new AMIs (Amazon Machine Images) contain all the necessary pre-built packages, libraries, and frameworks needed to start building AI systems using deep learning on Microsoft Windows. They also include popular deep learning frameworks such as Apache MXNet, Caffe and Tensorflow, as well as packages that enable easy integration with AWS, including launch configuration tools and many popular AWS libraries and tools. The AMIs come prepackaged with Nvidia CUDA 9, cuDNN 7, and Nvidia 385.54 drivers, and contain the Anaconda platform (supports Python versions 2.7 and 3.5). Amazon Web Services said the AWS Deep Learning AMIs for Microsoft Windows are provided at no additional cost beyond the Amazon EC2 instance hours used, and are available in all public regions. The AMIs can be used directly through AWS EC2 Console or AWS Marketplace. Users can visit the EC2 Windows user guide for step-by-step instructions on launching an EC2 instance, and get more resources for Windows in the documentation.

Mapbox acquires augmented reality activity tracking app Fitness AR

Open-source mapping platform Mapbox has acquired activity tracking app Fitness AR, which allows users to view runs, hikes or cycling routes from Strava superimposed on a 3D map of the terrain. The announcement was made by Mapbox VP Paul Veugen in a Medium post. Mapbox will continue to deliver updates to the app, which will be dropping its $2.99 price and going free in the App Store starting today. Fitness AR was among the first ARKit-enabled apps and was featured by Apple early on. The app utilized Mapbox’s Unity Maps SDK to visualize the terrain of the paths. Fitness AR’s co-founders Adam Debreczeni and Eric Florenzano will be joining Mapbox as part of the acquisition to work on AR tech in verticals, including “travel, weather, fitness, sports and gaming,” according to the company.

Home.me turns your 2D floorplan drawings into 3D renderings

Charles Wong and Aravind Kandiah, both students at the Singapore University of Technology and Design, have built a tool that can turn 2D drawings into 3D renderings. The team uses computer vision to capture a hand-drawn or printed floorplan and converts it into a 3D rendering. Home.me asks you about your location and a few more details about the building you are trying to render, and is able to estimate the square footage and price of what you’re drawing, too. Once the team’s tools have figured out the floorplan and rendered it, the next step is to visualize it in augmented reality (using the floorplan as its anchor). The Home.me team said they considered using deep learning, but they ran out of time. So like most other machine learning-based tools, they built Home.me in Python, using the popular OpenCV library to power its computer vision features. The tool exports its 3D models into Unity models, Wong and Kandiah said, adding that they would continue working on the project despite their current focus on academics.

Savia Lobo

04 Dec 2017

4 min read

Introducing Amazon Neptune: A graph database service for your applications

Last week was packed with product releases from Amazon at AWS re:Invent. Releases spanning machine learning, IoT, cloud services, databases, and more were unveiled. Amid all these, Amazon Web Services announced a fast and reliable graph database built exclusively for the cloud: Amazon Neptune.

No, Amazon isn’t entering our solar system. Amazon Neptune is a fully managed graph database that makes it easier to build and deploy applications. It also allows organizations to identify hidden relationships within highly connected data.

Let’s explore some of the benefits:

- It is built as a high-performance service for storing billions of relationships and for running graph queries with millisecond latency.
- Neptune supports popular graph models such as Property Graph and W3C’s Resource Description Framework (RDF), along with their corresponding query languages, Apache TinkerPop Gremlin and SPARQL.
- It allows customers to build queries with ease, and these queries can efficiently navigate highly connected datasets.
- It has availability of more than 99.99%. Neptune continuously monitors data and backs it up to Amazon S3. It enables point-in-time recovery from physical storage failures.
- Neptune is fault-tolerant and includes self-healing storage within the cloud; it replicates six copies of data across three Availability Zones.
- It offers scalable database deployment, with instance types ranging from small to large as per your needs.
- Neptune is highly secure, with different levels of security for each database. It makes use of Amazon VPC for network isolation, AWS Key Management Service (KMS) for encryption at rest, and TLS for encryption in transit.
- Lastly, as a fully managed service, Neptune handles database management tasks such as software patching, hardware provisioning, configuration, backups, and more. One can also monitor the performance of their database using Amazon CloudWatch.

Neptune in action: Possible use cases

In social networks: With the help of Amazon Neptune, one can set up large-scale processing of user profiles and interactions in order to build applications for social networks. Neptune offers highly interactive graph queries with high throughput for bringing social features into any application. For instance, notifying users of the latest updates from their family or close friends.

In recommendation engines: Because Neptune is a highly available graph database, it allows one to store relationships between pieces of information such as customer interests, purchase history, and so on, and to query them to produce personalized and relevant recommendations. For instance, a friend recommendation based on your mutual friends.

In fraud detection: A graph query can be built to detect relationship patterns such as multiple people using the same e-mail ID or the same IP address. In this way, Neptune, as a fully managed service, helps detect possible fraud cases by analyzing buyers who make use of fraudulent e-mail and IP addresses.

In knowledge graphs: Neptune allows you to store information within a graph model and makes use of graph queries to let customers easily navigate through information. For instance, a person interested in The Great Wall of China can also learn about the other wonders of the world and where each of them is located. Additionally, it can recommend other places to visit in China, and so on. Thus, with a knowledge graph one can offer additional information across varied topics.

In network/IT operations: By building a graph pattern, Neptune can track the origin of a malicious file, i.e. the host that spread the malicious file and the host that downloaded it.

Though in its infancy, Amazon Neptune could reach great heights as more organizations adopt it. It has many competitors, but it will be exciting to see how it makes its way among them and shines as the brightest ‘graph database’ planet.
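The fraud detection use case above comes down to finding attribute nodes (an e-mail address, an IP address) that are connected to more than one account. The following is a minimal sketch of that pattern in plain Python. It is not Gremlin or SPARQL and does not use any Neptune API; the function name shared_attribute_groups and the sample data are hypothetical and only illustrate the shape of the query.

```python
from collections import defaultdict


def shared_attribute_groups(edges):
    """Group accounts by the attribute node (e-mail or IP) they link to.

    `edges` is an iterable of (account, attribute) pairs, i.e. the edges of
    a bipartite graph. Only attributes used by more than one account are
    returned, since those are the suspicious relationship patterns.
    """
    accounts_by_attribute = defaultdict(set)
    for account, attribute in edges:
        accounts_by_attribute[attribute].add(account)
    return {
        attribute: sorted(accounts)
        for attribute, accounts in accounts_by_attribute.items()
        if len(accounts) > 1
    }


# Hypothetical edges: each account and an e-mail or IP address it has used.
edges = [
    ("alice", "a@example.com"),
    ("bob", "a@example.com"),
    ("carol", "c@example.com"),
    ("bob", "10.0.0.7"),
    ("dave", "10.0.0.7"),
]
suspicious = shared_attribute_groups(edges)
print(suspicious)
```

In a graph database the same question is asked as a traversal from attribute nodes back to the accounts that reference them, rather than by scanning every edge as this sketch does.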

Abhishek Jha

04 Dec 2017

4 min read

AWS IoT Analytics: The easiest way to run analytics on IoT data, Amazon says

Until recently, the first thing that came to our mind with Amazon Web Services was that of an infrastructure provider. But things are changing, rightly so in tune with times. The AWS is now into an all out mode to scale up the artificial intelligence ladder, gradually shifting focus towards machine learning, deep learning and data science. Last week it went serverless, and now the cloud leader has added yet another function to its repertoire: AWS IoT Analytics. AWS IoT Analytics provides advanced data analysis of data collected from your IoT devices. It is a fully managed service of AWS IoT, which can be used to cleanse, process, enrich, store, and analyze IoT data at scale. Amazon calls it “the easiest way to run analytics on IoT data.” Announced closely on the heels of re:Invent 2017, the AWS IoT Analytics has been designed specifically for common IoT use cases like predictive maintenance, asset usage patterns, and failure profiling. The platform captures data from devices connected to AWS IoT Core, and filters, transforms, and enriches it before storing it in a time-series database for analysis. “You can set up the service to collect only the data you need from your devices, apply mathematical transforms to process the data, and enrich the data with device-specific metadata such as device type and location before storing the processed data. Then, you can use IoT Analytics to run ad hoc queries using the built-in SQL query engine, or perform more complex processing and analytics like statistical inference and time series analysis,” Amazon said in its release. The new service feature integrates with Amazon Quicksight for visualization of your data and brings the power of machine learning through integration with Jupyter Notebooks. 
Benefits of AWS IoT Analytics Helps with predictive analysis of data by providing access to pre-built analytical functions Provides ability to visualize analytical output from service Provides tools to clean up data Can help identify patterns in the gathered data Getting Started: Common IoT Analytics Concepts Channel: archives the raw, unprocessed messages and collects data from MQTT topics. Pipeline: consumes messages from channels and allows message processing. Activities: perform transformations on your messages including filtering attributes and invoking lambda functions advanced processing. Data Store: Used as a queryable repository for processed messages. Provide ability to have multiple datastores for messages coming from different devices or locations or filtered by message attributes. Data Set: Data retrieval view from a data store, can be generated by a recurring schedule. This is how it looks like Source: aws.amazon.com First, you create a channel to receive incoming messages. For this, select the Channels menu option and click the Create a channel button (as shown above). It creates a new form where you have to name your channel and give the channel a MQTT topic filter, from which this channel will ingest messages. Your channel is then created once you click the Create Channel button. Once your Channel is created, set up a Data Store to receive and store the messages received on the Channel from your IoT device. Multiple Data Stores can be created for complex solutions. Now that you have your Channel and Data Store stored, connect the two using a Pipeline (in manner something similar to how we created a Channel and Data Store) for the processing and transformation of messages. Additional attributes can be added to create a more robust pipeline, if need be. To use AWS IoT Analytics, all we need now is an IoT rule that sends data to a channel. Choosing the Analyze menu option will bring up the screens to Create a data set. 
And this is how you set up advanced data analytics for AWS IoT:
Source: aws.amazon.com

In addition to collecting, visualizing, processing, querying, and storing large amounts of data generated by AWS IoT connected devices, Amazon said the AWS IoT Analytics service can be used through other interfaces such as the AWS Command Line Interface (AWS CLI), the AWS IoT API, language-specific AWS SDKs, and AWS IoT Device SDKs. To learn more about AWS IoT Analytics and to register for the preview, visit the product page.
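The channel, pipeline, data store, and data set concepts described above can be sketched as a small in-memory model. This is a toy illustration only, not the AWS IoT Analytics API: the class and function names, the prefix-based topic matching (a stand-in for real MQTT topic filters), and the sample messages are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Message = Dict[str, object]
# An activity transforms a message, or returns None to drop it.
Activity = Callable[[Message], Optional[Message]]


@dataclass
class Channel:
    """Archives raw messages whose topic starts with a prefix (simplified filter)."""
    topic_prefix: str
    raw: List[Message] = field(default_factory=list)

    def ingest(self, topic: str, message: Message) -> bool:
        if topic.startswith(self.topic_prefix):
            self.raw.append(message)
            return True
        return False


@dataclass
class DataStore:
    """Queryable repository for processed messages."""
    rows: List[Message] = field(default_factory=list)


def run_pipeline(channel: Channel, activities: List[Activity], store: DataStore) -> None:
    """Pass each archived message through every activity, then store the result."""
    for message in channel.raw:
        current: Optional[Message] = dict(message)
        for activity in activities:
            current = activity(current)
            if current is None:  # a filter activity dropped the message
                break
        if current is not None:
            store.rows.append(current)


# Hypothetical activities: one filter and one enrichment step.
def drop_missing_temperature(msg: Message) -> Optional[Message]:
    return msg if "temperature_c" in msg else None


def add_location(msg: Message) -> Optional[Message]:
    return {**msg, "location": "plant-1"}


def data_set(store: DataStore, min_temp: float) -> List[Message]:
    """A 'data set' here is just a filtered view over the data store."""
    return [row for row in store.rows if row["temperature_c"] >= min_temp]


channel = Channel(topic_prefix="sensors/")
channel.ingest("sensors/a", {"device": "a", "temperature_c": 71.5})
channel.ingest("sensors/b", {"device": "b"})                          # dropped by the filter
channel.ingest("other/c", {"device": "c", "temperature_c": 20.0})    # topic does not match

store = DataStore()
run_pipeline(channel, [drop_missing_temperature, add_location], store)
hot = data_set(store, min_temp=70.0)
print(hot)
```

In the real service the data set would be produced by a SQL query and the channel would be fed by an IoT rule; the sketch only shows how the pieces relate to one another.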


Aarthi Kumaraswamy

02 Dec 2017

3 min read

Week at a Glance (25th Nov - 1st Dec): Top News from Data Science


This week, Amazon re:Invent, AI-powered bots, and cryptocurrencies rule the news. Here is a quick rundown of news that happened this week that’s worth your time!

News Highlights
Data science announcements at Amazon re:invent 2017
IOTA, the cryptocurrency that uses Tangle instead of blockchain, announces Data Marketplace for Internet of Things
Cloudera Altus Analytic DB: Modernizing the cloud-based data warehouses
What if robots get you a job! Enter Helena, the first artificial intelligence recruiter
ABBYY’s AI-powered Real-Time recognition SDK helps developers add ‘instant’ text capture functionality to mobile apps
Meet SAM: World’s first artificial intelligence politician!

In other News

1st Dec.’ 17 – Headlines
Amazon is putting “Alexa for business”
Introducing the AIY Vision Kit: Add computer vision to your maker projects
Health Wizz unveils blockchain platform to give patients control of health data
H2O.ai secures $40 million to democratize artificial intelligence for the enterprise
Impetus Technologies to host meetup on anomaly detection techniques using Apache Spark
Uptake raises $117M at $2.3B valuation for industrial predictive analytics
CrowdRiff releases ‘smart’ visual content marketing platform

30th Nov.’ 17 – Headlines
AWS IoT Analytics: Amazon announces dedicated analytics service for IoT data
SageMaker: AWS makes it easier to build and deploy machine learning models
AWS DeepLens: Amazon unveils deep learning powered wireless video camera for developers
Amazon Translate: A neural machine translation service more accurate and scalable
Aurora Serverless: AWS announces a serverless database service where users only pay for the processing when the database is actually doing work
Mozilla releases open source Speech Recognition Model and voice dataset

29th Nov.’ 17 – Headlines
Bitcoin cryptocurrency smashes through $10,000 record, as experts warn of bubbles
Bokeh 0.12.11 released
Cloudera announces beta version of Cloudera Altus Analytic DB
China’s Baidu, Xiaomi enter pact to create smart connected devices under AI and IoT ecosystem
Apache Impala upgraded to Top-Level Project
Adobe demonstrates future Photoshop tool that uses machine learning to select image subjects
DataScience.com now available on AWS Marketplace as a single-click offering

28th Nov.’ 17 – Headlines
Researchers aim for unbreakable encryption in quantum computers as new breakthrough drastically increases the speed of current QKD transmission
Nuance AI Marketplace for Diagnostic Imaging: Nuance leverages Nvidia’s deep learning for radiology
Change Healthcare, Dicom Systems ink strategic partnerships with Google Cloud to apply AI into medical imaging analytics
Announcing AWS Machine Learning Research Awards to fund machine learning research
Intuit to use AWS as its standard artificial intelligence platform

27th Nov.’ 17 – Headlines
Nvidia’s AI processor to be used in GE Healthcare’s medical devices globally
EnvoyAI launches with 35 algorithms contributed by 14 newly-contracted artificial intelligence development partners
Google may add native dictation support to Chromebooks
Lunit Unveils “Lunit INSIGHT,” A New Real-time Imaging AI Platform on the Web at RSNA 2017
Kia Motors America launches AI-powered virtual assistant “Kian” to help customers “Know It All Now” about any Kia Vehicle

To get the latest news updates, expert interviews, and tutorials in data science, subscribe to the Datahub and receive a free eBook. To receive real-time updates, follow our Twitter handle @PacktDataHub.


Abhishek Jha

02 Dec 2017

4 min read

Aurora Serverless: No servers, no instances to set up! You pay for only what you use


This could be yet another shot across Oracle’s bow from Amazon. The undisputed cloud king is well aware that its database business is a small fish in a pond dominated by Oracle. But as more enterprises move from on-premises to the cloud, Amazon's database market share could improve.

One of the standout announcements from this year’s re:Invent conference was a “serverless database” based on, and expanding upon, the company’s fully managed Aurora database architecture. Aurora Serverless will let customers create database instances that only run when needed and automatically scale up or down based on demand. If a database isn’t needed at all, it will shut down until it is needed. This way, users pay by the second for the Aurora Serverless computation they use, and they won’t end up footing the bill for a database sitting idle overnight.

Aurora was already a pretty good database model in environments where the workload was predictable. But Amazon Web Services (AWS) eventually realized that workloads can be intermittent in some cases and unpredictable in others, with requests arriving over just a few minutes or hours per day or per week. This is where the new variant of Aurora comes into the picture. Aurora Serverless has been designed with workloads in mind that are highly variable and subject to rapid change. Further, you pay on a second-by-second basis for the actual database resources you use.

“Because storage and processing are separate, you can scale all the way down to zero and pay only for storage. I think this is really cool,” AWS evangelist Jeff Barr said, describing the serverless model that builds on a clean separation of processing and storage (an intrinsic part of the Aurora architecture).
So in use cases such as a low-volume blog site that is only used for a few minutes several times per day or week, or applications that peak for around 30 minutes each day or several times per year, such as HR budgeting and operational reporting forms, Aurora Serverless auto-scales to the capacity required. There could also be cases where the peak of activity is hard to predict, such as a traffic site that may suddenly become ‘active’ when it starts raining. Here again, the serverless database meets the needs of peak load, and then scales back down when the surge is over.

This is a rather useful feature. Your developers may be using databases during work hours, but they certainly don’t need them on nights or weekends. Thanks to Aurora Serverless, your database automatically shuts down when not in use. Manually managing database capacity for each application, on the other hand, is not a sensible approach: it can take up valuable time and lead to inefficient use of database resources.

With Aurora Serverless, you simply create a database endpoint, optionally specify the desired database capacity range, and connect your applications. The endpoint is a simple proxy that routes your queries to a rapidly scaled fleet of database resources. This allows your connections to remain intact without disruption, even as scaling operations take place behind the scenes. You can also migrate between standard and serverless configurations with a few clicks in the AWS Management Console.

With an on-demand, auto-scaling configuration in which the database automatically starts up, shuts down, and scales capacity up or down based on your application's needs, Aurora Serverless is truly how you ‘reinvent’ an Aurora database. 2018 will make it clearer how the new database is actually implemented. Meanwhile, you can sign up for the preview of Aurora Serverless by filling out this form.
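To make the pay-per-second argument concrete, here is a rough back-of-the-envelope sketch. Every number in it is hypothetical: the hourly rate is made up, the same rate is assumed for both billing models, and storage charges (which continue even when compute scales to zero) are left out. It is not AWS pricing.

```python
def always_on_cost(hours: float, price_per_hour: float) -> float:
    """Compute cost when the database runs for every hour of the period."""
    return hours * price_per_hour


def per_second_cost(active_seconds: int, price_per_hour: float) -> float:
    """Compute cost when only the seconds of actual activity are billed."""
    return active_seconds * price_per_hour / 3600


# Hypothetical workload: an application that is busy for 30 minutes a day.
PRICE_PER_HOUR = 0.12              # made-up compute rate, in dollars
ACTIVE_SECONDS_PER_DAY = 30 * 60   # 30 minutes of activity per day
DAYS = 30

provisioned = always_on_cost(24 * DAYS, PRICE_PER_HOUR)
serverless = per_second_cost(ACTIVE_SECONDS_PER_DAY * DAYS, PRICE_PER_HOUR)

print(f"always-on compute:  ${provisioned:.2f}")
print(f"per-second compute: ${serverless:.2f}")
```

Under these assumptions the always-on instance is billed for 720 hours while the per-second model is billed for 15, which is the gap the article is pointing at for intermittent workloads.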


Savia Lobo

01 Dec 2017

8 min read

Data science announcements at Amazon re:invent 2017


Continuing from our previous post, Amazon’s re:Invent 2017 brought a lot of new announcements in three data science domains: Databases, IoT, and Machine Learning.

Databases
Databases were one of the hot topics for the cloud giant. AWS announced previews across two database services: Amazon Neptune and Amazon Aurora.

Amazon Neptune Preview
So what’s Amazon Neptune? A brand new database service from Amazon! It is a fast, reliable, fully managed graph database service that makes it easy to develop and deploy applications. It is purpose-built as a high-performance service for storing billions of relationships and querying them with millisecond latency. Neptune is highly secure, with built-in support for encryption. Since it is fully managed, customers need not worry about database management tasks. Neptune supports the popular graph models Property Graph and W3C's RDF, along with their corresponding query languages, Apache TinkerPop Gremlin and SPARQL. This lets customers easily build queries that efficiently navigate highly connected datasets. Some of its key benefits include:
high availability
point-in-time recovery
continuous backup to Amazon S3
replication across availability zones

Amazon Aurora
Amazon Aurora announced previews of two new features at re:Invent: Aurora Multi-Master and Aurora Serverless. Let’s take a brief look at what these two features have in store.

Aurora Serverless
It allows customers to create database instances that run only when required. Databases are automatically scaled up or down based on demand, which saves time. It is designed to handle workloads that are highly variable and subject to rapid change. Customers pay for the resources they use on a second-by-second basis, which saves money.
The preview of this serverless feature will be available for the MySQL-compatible edition of Amazon Aurora.

Aurora Multi-Master
It allows customers to distribute database writes over several datacenters
It promises zero application downtime when database nodes or availability zones fail
Customers can also benefit from faster write performance
At present, the Aurora Multi-Master preview covers distribution within a single region. However, Amazon expects to extend it across regions of the global AWS physical infrastructure by next year.

Internet of Things
The next technology Amazon rooted for this year was IoT. Here’s a list of the announcements made for IoT applications.

AWS IoT Device Management
AWS IoT Device Management allows customers to onboard, set up, monitor, and remotely manage IoT devices securely, throughout each device’s entire lifecycle. Customers can log in to the AWS IoT console to register devices, either individually or in bulk. They can also upload attributes, certificates, and access policies. The service also helps customers maintain an inventory holding all the information related to their IoT devices, such as serial numbers and firmware versions. Using this information, one can easily track where troubleshooting is required. Devices can be managed individually, in groups, or as an entire fleet.

AWS Greengrass ML inference
The AWS Greengrass ML inference preview lets customers deploy and run ML inference locally on connected devices, bringing smarter computing capabilities to IoT devices. Running inference on the connected devices reduces latency and the cost of sending device data to the cloud for prediction. AWS Greengrass ML inference allows app developers to incorporate machine learning in their devices, with no explicit ML skills required.
It allows devices to run ML models locally, get the output, and make smart decisions rapidly, even without being connected. ML inference is performed on the connected devices without the need to send the data to the cloud. Data is sent to the cloud only in cases that require more processing.

AWS IoT Analytics Preview
re:Invent gave us a preview of AWS IoT Analytics, a fully managed IoT analytics service that provides advanced analysis of data collected from millions of IoT devices, without any added management of hardware or infrastructure. Let’s look at some of its benefits:
Gives customers access to pre-built analytical functions, which help with the predictive analysis of data
Allows customers to visualize analytical output from the service
Provides the tools required to clean up data
Aids in identifying patterns within the gathered data
In addition, the new AWS IoT Analytics service offers visualization of your data through Amazon QuickSight. It also integrates with Jupyter Notebooks to bring in the power of machine learning. To know more about AWS IoT in detail, you can visit the link here.

Machine Learning
re:Invent introduced a variety of new platforms, tools, and frameworks for Machine Learning.

AWS DeepLens
Amazon brings data scientists and developers an innovative way to get hands-on deep learning experience. The new AWS DeepLens is an AI-enabled video camera that runs deep learning models locally on the camera to analyze, and take action on, what it sees. The technology enables developers to build apps while getting practical, hands-on examples for AI, IoT, and serverless computing. The hardware boasts a 4-megapixel camera that can capture 1080P video, and a 2D microphone array. DeepLens has an Intel Atom® Processor with over 100 GFLOPS of compute power for processing deep learning predictions in real time.
It also has 8 GB of built-in memory for storing pre-trained models and code. On the software side, AWS DeepLens runs Ubuntu 16.04 and is preloaded with AWS Greengrass Core. Other frameworks, such as TensorFlow and Caffe2, can also be used. DeepLens includes the Intel® clDNN library and lets developers use AWS Greengrass, AWS Lambda, and other AWS AI and infrastructure services in their apps.

Amazon Comprehend
Tagged as a continuously trained Natural Language Processing (NLP) service, Amazon Comprehend allows customers to analyze text and extract what it contains: the language used (from Afrikaans to Yoruba and 98 more), the entities (people, places, products, etc.), the sentiment (positive, negative, and so on), key phrases, and much more. Comprehend also has a topic modeling service that extracts topics from a large set of documents for analysis and topic-based grouping.

Amazon Rekognition Video
With Rekognition Video, Amazon strengthens its position against similar offerings in the market. Rekognition Video uses deep learning to derive detailed and complete insights from videos. It allows developers to get detailed information about the objects within a video, the scenes it is set in, the activities happening in it, and so on. It also supports person detection; for instance, it is pre-trained to recognize famous celebrities. It can track people through a video and filter out inappropriate content. In short, it can easily generate metadata from video files.

Amazon SageMaker
An end-to-end machine learning service that helps developers and data scientists build, train, and deploy machine learning models easily and quickly, with improved scalability. It consists of three modules:
Build - An environment to work with your data, experiment with algorithms, and visualize the output in detail.
Train - Allows one-click model training and tuning, at high scale and low cost.
Deploy - Provides a managed environment in which customers can easily host their models and test them securely for inference, with low latency.
Amazon SageMaker eliminates machine learning complexities for developers. With Amazon SageMaker, customers can easily build and train their ML models in the cloud. With a few additional clicks, customers can also use the AWS Greengrass console to transfer the models to devices they have selected. For a detailed view of how SageMaker works, visit the link here.

Amazon Translate Preview
Amazon also unveiled a preview of Amazon Translate, a high-quality neural machine translation service. Amazon Translate uses advanced machine learning to enable faster translation of text-based content. It uses neural networks to represent models trained to translate between language pairs, and allows development of applications that offer multilingual user experiences. Organizations and businesses can benefit greatly from Translate, as they can now market their products in different regions. Product consumers can access websites, information, and resources in their language of choice using automated translation. Additionally, customers can take part in multiplayer chats, gather information from consumer forums, dive into educational documents, and even read hotel reviews when those resources are provided in a language they can’t readily understand. Amazon Translate can be used with other Amazon services such as Amazon Polly, Amazon S3, Amazon Elasticsearch Service, Amazon Lex, AWS Lambda, and many others. The Amazon Translate service is currently in preview and can be used to translate text between English and the supported languages.

Amazon Transcribe Preview
Amazon launched the preview of Amazon Transcribe, an Automatic Speech Recognition (ASR) service.
ASR makes it easy for developers to add speech-to-text capability to their applications. A notable feature of Transcribe is its efficient and scalable API, which saves developers from the expensive process of manual transcription. One can also analyze audio files stored on Amazon Simple Storage Service (S3) in different formats such as WAV, MP3, FLAC, and so on. The resulting transcriptions are detailed, with a timestamp for each word and inferred punctuation.
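As an illustration of what per-word timestamps and inferred punctuation make possible, the sketch below pulls (word, start, end) tuples out of a transcript. The JSON layout and field names are a hand-written stand-in chosen for the example, not confirmed output of the service; check the Transcribe documentation for the actual format.

```python
import json
from typing import List, Tuple

# Hypothetical transcript: words carry timestamps, punctuation items do not.
SAMPLE = """
{
  "results": {
    "items": [
      {"type": "pronunciation", "start_time": "0.0", "end_time": "0.42",
       "alternatives": [{"content": "hello"}]},
      {"type": "pronunciation", "start_time": "0.5", "end_time": "0.9",
       "alternatives": [{"content": "world"}]},
      {"type": "punctuation",
       "alternatives": [{"content": "."}]}
    ]
  }
}
"""


def words_with_times(doc: dict) -> List[Tuple[str, float, float]]:
    """Return (word, start_seconds, end_seconds) for every spoken word."""
    words = []
    for item in doc["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation entries have no timestamps in this layout
        text = item["alternatives"][0]["content"]
        words.append((text, float(item["start_time"]), float(item["end_time"])))
    return words


transcript = json.loads(SAMPLE)
timed_words = words_with_times(transcript)
for word, start, end in timed_words:
    print(f"{start:6.2f}s - {end:6.2f}s  {word}")
```

Timestamps like these are what allow an application to jump to the moment a word was spoken or to align subtitles with audio.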


Aarthi Kumaraswamy

01 Dec 2017

1 min read

Handpicked for your weekend Reading - 1st Dec 2017


Expert in Focus: Sebastian Raschka
On how Machine Learning has become more accessible

3 Things that happened this week in Data Science News
Data science announcements at Amazon re:invent 2017
IOTA, the cryptocurrency that uses Tangle instead of blockchain, announces Data Marketplace for Internet of Things
Cloudera Altus Analytic DB: Modernizing the cloud-based data warehouses

Get hands-on with these Tutorials
Building a classification system with logistic regression in OpenCV
How to build a Scatterplot in IBM SPSS

Do you agree with these Insights & Opinions?
Highest Paying Data Science Jobs in 2017
5 Ways Artificial Intelligence is Transforming the Gaming Industry
10 Algorithms every Machine Learning Engineer should know


Packt Editorial Staff

01 Dec 2017

6 min read

1st Dec.' 17 - Headlines


Google's AIY Vision Kit, Amazon's Alexa for Business, and more in today's top stories in data science news.

Amazon Web Services in news

Amazon is putting "Alexa for business"
Amazon Web Services has announced a new initiative to get companies to use Alexa in the office. Under the plan, the virtual assistant will help employees launch conference calls, organize room bookings, and even discuss their expenses. With the new scheme, Alexa for Business, companies will be given the tools to manage a fleet of Alexa-enabled devices. Admins will be able to enroll users, enable and disable skills, and connect Alexa to their conferencing equipment. They’ll also be able to build their own apps for the assistant, with Amazon suggesting functions like helping with directions around the office, reporting problems with equipment, and ordering new supplies. Users will also be able to access their company’s apps from home devices, checking what’s on their office calendar and remotely joining meetings. Alexa for Business is being seen as a direct competitor to other virtual assistants like Apple Siri, Google Assistant, and Microsoft Cortana.

DigitalGlobe to leverage AWS suite of machine learning capabilities
DigitalGlobe has migrated its entire 100-petabyte imagery library to Amazon Web Services, thus giving its customers instant access to Amazon's vast library of geospatial images. DigitalGlobe's sister division Radiant Solutions and its partner ecosystem are also leveraging AWS’s frameworks and tools to build machine learning applications that allow their customers to incorporate valuable geospatial information extracted from commercial satellite imagery into their workflows. “Few companies work with the sheer volume of data that DigitalGlobe does.
When working at this volume, it’s nearly impossible to scale and rapidly innovate without the cloud,” AWS VP Teresa Carlson said, adding that DigitalGlobe was the first customer to use AWS Snowmobile (AWS’s exabyte-scale data transfer service that uses a 45-foot long ruggedized shipping container pulled by a semi-trailer truck) to move its massive image library to AWS.

Google in news

Introducing the AIY Vision Kit: Add computer vision to your maker projects
Google's AIY Team has announced its next project: the AIY Vision Kit, an affordable, hackable, intelligent camera. The AIY Vision Kit is easy to assemble and connects to a Raspberry Pi computer. Based on user feedback, this new kit is designed to work with the smaller Raspberry Pi Zero W computer and runs its vision algorithms on-device, so no cloud connection is required. The kit materials list includes a VisionBonnet, a cardboard outer shell, an RGB arcade-style button, a piezo speaker, a macro/wide lens kit, flex cables, standoffs, a tripod mounting nut, and connecting components. "For those of you who have your own models in mind, we've included the original TensorFlow code and a compiler. Take a new model you have (or train) and run it on the Intel® Movidius™ MA2450," Google said, adding that users can extend the kit to solve their real-world problems.

A blockchain for health data

Health Wizz unveils blockchain platform to give patients control of health data
Health Wizz announced the upcoming launch of its blockchain-based solution designed to address the mounting problem of electronic health records and give patients more power over their own health information. Using the Health Wizz platform, every patient would become the arbiter of his or her own medical records.
"Each time medical records are produced – by a doctor’s appointment, ER visit, hospital intake or self-reporting app – the platform would standardize them using a specification known as Fast Healthcare Interoperability Resources," the company said. "Once done, the records are secured on the user’s own mobile devices in an encrypted space accessible only by that user’s own private cryptographic keys." To power its system, Health Wizz today announced a pre-sale of its digital token, which will run from Nov. 30 until February 2018. Proceeds will be used to develop the platform further and augment already existing venture capital investments. The formal launch of the platform will happen in March 2018. Other data science news H2O.ai secures $40 million to democratize artificial intelligence for the enterprise H2O.ai announced it has completed a $40 million Series C round of funding led by Wells Fargo and NVIDIA with participation from New York Life, Crane Venture Partners, Nexus Venture Partners and Transamerica Ventures, the corporate venture capital fund of Transamerica and Aegon Group. The Series C round brings H2O.ai's total amount of funding raised to $75 million. The new investment will be used to further democratize advanced machine learning and for global expansion and innovation of Driverless AI, an automated machine learning and pipelining platform that uses “AI to do AI.” H2O’s signature community conference, H2O World will take place on December 4-5, 2017 at the Computer History Museum in Mountain View, Calif. Impetus Technologies to host meetup on anomaly detection techniques using Apache Spark Big data company Impetus Technologies announced it will host a complimentary meetup "Anomaly Detection Techniques and Implementation Using Apache Spark" on Tuesday, December 5, 2017 from 6-8 pm Pacific time at the Larkspur Landing Hotel in Milpitas, Calif. 
The company said that space is limited for the event, and interested data scientists, developers, and information technology (IT) professionals are asked to reserve a seat at the complimentary event by emailing events@impetus.com. In the meetup, the StreamAnalytix team from Impetus will share insights on choosing the right anomaly detection techniques and demonstrate real-world use cases for finding variances in network traffic and financial transactions.

Uptake raises $117M at $2.3B valuation for industrial predictive analytics
Uptake, a SaaS startup that uses machine learning to read and understand how machines are working, and to anticipate when they may break down or need other attention, has closed a Series D round of $117 million at a post-money valuation of $2.3 billion. The round was led by Baillie Gifford, with participation from existing investors Revolution Growth and GreatPoint Ventures. It brings the company's total funding to over $250 million. “We’re on a growth trajectory now where there is virtually nothing standing in our way from being the predictive analytics market leader across every heavy industry, from oil & gas to mining and beyond,” said Uptake Co-founder and CEO Brad Keywell in a statement.

CrowdRiff releases 'smart' visual content marketing platform
Visual marketing software provider CrowdRiff said it has now processed over 500 million images for over 300 travel brands, and is releasing new visual marketing capabilities powered by artificial intelligence and machine learning. CEO Dan Holowack announced the new release at the DTTT Global conference in Brussels, Belgium, where he is co-presenting a session, "Making the Shift to Visual Marketing," together with Amber King, Director, U.S. Marketing at Colorado Tourism Office.
"The volume of available visual content is larger than ever before, and finding the perfect visuals that meet both brand and performance goals is a time-consuming and largely manual process," Holowack said, adding that CrowdRiff's latest release addresses "the most common problems marketing teams face when producing visual content, at every stage of the visual content lifecycle."


Sugandha Lahoti

01 Dec 2017

3 min read

Amazon unveils SageMaker: An end-to-end machine learning service


Machine Learning was one of the most talked about topics at Amazon’s re:Invent this year. To make machine learning models accessible to everyday users, regardless of their expertise level, Amazon Web Services launched an end-to-end machine learning service: SageMaker. Amazon SageMaker allows data scientists, developers, and machine learning experts to quickly build, train, and deploy machine learning models at scale. The image below shows the process SageMaker adopts to help developers build ML models.
Source: aws.amazon.com

Model Building
Amazon SageMaker makes it easy to build ML models by simplifying training and the selection of the best algorithms and frameworks for a particular model. It offers zero-setup hosted Jupyter notebooks, which make it easy to explore, connect, and visualize the training data stored on Amazon S3. These notebook IDEs can run on either general instance types or GPU-powered instances.

Model Training
ML models can be trained with a single click in the Amazon SageMaker console. For training, SageMaker also has a provision for moving training data from Amazon RDS, Amazon DynamoDB, and Amazon Redshift into S3. Amazon SageMaker is preconfigured to run TensorFlow and Apache MXNet. However, developers can use their own frameworks and also create their own training with Docker containers.

Model Tuning and Hosting
Amazon SageMaker has a model hosting service with HTTPS endpoints. These endpoints can serve real-time inferences, support traffic, and simultaneously allow A/B testing. Amazon SageMaker can automatically tune models to achieve high accuracy, which makes the training process faster and easier. SageMaker automates the underlying infrastructure and allows developers to easily scale to train models at petabyte scale.

Model Deployment
After training and tuning comes the deployment phase. SageMaker deploys the models on an auto-scaling cluster of Amazon EC2 instances for running predictions on new data.
These high-performance instances are spread across multiple availability zones.

According to the official product page, Amazon SageMaker has multiple use cases. One of them is ad targeting, where Amazon SageMaker can be used with other AWS services to help build, train, and deploy ML models for targeting online ads, optimizing return on ad spend, segmenting customers, and so on. Another interesting use case is training recommender systems within SageMaker's serverless, distributed environment, which can then be hosted easily on low-latency, auto-scaling endpoints. SageMaker can also be used to build highly efficient Industrial IoT and ML models that predict machine failure or schedule maintenance.

As of now, Amazon SageMaker is free for developers for the first two months. Each month, developers are provided with 250 hours of t2.medium notebook usage, 50 hours of m4.xlarge usage for training, and 125 hours of m4.xlarge usage for hosting. After the free period, pricing will vary by region, and customers will be billed per second for instance usage, per GB of storage, and per GB of data transferred into and out of the service.

Amazon SageMaker provides an end-to-end solution for the development of machine learning applications. Developers can harness the ease and flexibility it offers to solve a range of business problems.
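The monthly free allowances quoted above lend themselves to a quick arithmetic check. The sketch below computes how many hours in a month would fall outside the free tier; the allowance figures come from the article, while the dictionary keys and the usage numbers are hypothetical and chosen only for illustration. Per-hour prices are deliberately left out, since the article says they vary by region.

```python
from typing import Dict

# Free hours per month during the first two months, as listed in the article.
FREE_HOURS: Dict[str, float] = {
    "notebook_t2_medium": 250,
    "training_m4_xlarge": 50,
    "hosting_m4_xlarge": 125,
}


def billable_hours(usage: Dict[str, float]) -> Dict[str, float]:
    """Hours beyond the free allowance for each usage type (never negative)."""
    return {
        kind: max(0.0, usage.get(kind, 0.0) - free)
        for kind, free in FREE_HOURS.items()
    }


# Hypothetical month of usage.
usage = {
    "notebook_t2_medium": 160,   # under the 250-hour allowance
    "training_m4_xlarge": 72.5,  # 22.5 hours over the 50-hour allowance
    "hosting_m4_xlarge": 125,    # exactly at the allowance
}

overage = billable_hours(usage)
for kind, hours in overage.items():
    print(f"{kind}: {hours} billable hours")
```

In this made-up month only the training hours exceed the allowance, so only those 22.5 hours would be billed at the regional rate.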


Packt Editorial Staff

30 Nov 2017

5 min read

30th Nov.' 17 - Headlines


Amazon's latest announcements from AWS re:Invent 2017, Mozilla's Speech Recognition Model, and IOTA's Data Marketplace are among today's trending stories in artificial intelligence (AI) and data science news.

re:Invent 2017 in data science news

AWS IoT Analytics: Amazon announces dedicated analytics service for IoT data
To draw insights from the huge volume of data that IoT generates, Amazon Web Services launched a dedicated IoT analytics service called AWS IoT Analytics. The tool lets users gather, store, and then query the messages coming from IoT sensors, while extracting specific sets of data on a regular basis. “With the AWS IoT Analytics service, you can process messages, gather and store large amounts of device data, as well as, query your data. Also, the new AWS IoT Analytics service feature integrates with Amazon Quicksight for visualization of your data and brings the power of machine learning through integration with Jupyter Notebooks,” the company said in its blog post.

SageMaker: AWS makes it easier to build and deploy machine learning models
Amazon SageMaker is an end-to-end machine learning service that makes it easier for developers to build, train, and deploy machine learning models. With the goal of simplifying machine learning and democratizing deep learning models, SageMaker starts with an authoring component for data cleaning and processing, and offers scalable model hosting and training. Beyond the built-in algorithms and one-click training, SageMaker has a new feature called Hyper Parameter Optimization: developers check a box when they begin tuning their model, and it finds the best parameters for their machine learning model.

AWS DeepLens: Amazon unveils deep learning powered wireless video camera for developers
Amazon has announced AWS DeepLens, a wireless AI camera that uses deep learning to run real-time computer vision models.
DeepLens consists of an HD (1080p) video camera and an Intel Atom X5 processor that can process over 100 billion deep learning operations per second. The camera runs on Ubuntu 16.04 and is preloaded with the Greengrass Core. Amazon said it will pair DeepLens with SageMaker, another new platform being announced at re:Invent for developing and distributing machine learning algorithms. "Developers can extend these tutorials to create their own custom, deep learning-powered projects with AWS Lambda functions," writes Amazon. "AWS DeepLens could be programmed to recognize the numbers on a license plate and trigger a home automation system to open a garage door, or AWS DeepLens could recognize when the dog is on the couch and send a text to its owner." DeepLens is priced at $250 and includes wireless connectivity; it has a 2D microphone array and 8GB of onboard storage. It also features USB and microHDMI ports for connecting to a PC to export data. Amazon Translate: A neural machine translation service more accurate and scalable Amazon has announced a neural machine translation service named Amazon Translate, to deliver fast, high-quality, and affordable language translation. Amazon Translate will use machine learning and deep learning models to deliver more accurate and more natural sounding translation than traditional statistical and rule-based translation algorithms. Apart from accuracy, it is highly scalable, meaning it easily translates large volumes of text efficiently, localizing websites and applications for international users. To preview Amazon Translate, click here. Aurora Serverless: AWS announces a serverless database service where users only pay for the processing when the database is actually doing work Amazon’s AWS cloud computing division has announced a new database service making it easier and cheaper to quickly launch relational databases that don’t need to process data continuously. 
Aurora Serverless users only pay for the processing when the database is actually doing work. Customers no longer have to provision or manage database capacity, and the database automatically starts, scales, and shuts down based on application workload. Customers simply create an endpoint through the AWS Management Console, specify the minimum and maximum capacity needs of their application, and Amazon Aurora handles the rest. Mozilla in data science news Mozilla releases open source Speech Recognition Model and voice dataset Mozilla has announced the initial release of its open source speech recognition model. The system is based on research from Baidu’s Deep Speech project, and was trained using a data set of almost 400,000 voice recordings from over 20,000 people. Mozilla said its system offers a word error rate of about 6.5 percent, not quite as good as human beings at recognizing speech, but still pretty close. The company is also releasing the world’s second largest publicly available voice dataset, which was contributed to by nearly 20,000 people globally. No blockchain, but Tangle.. Cryptocurrency IOTA announces Data Marketplace for Internet of Things, uses new technique Tangle to distribute and decentralize its ledger Upcoming cryptocurrency startup IOTA has launched a new micropayment-based Data Marketplace that's powered by distributed ledger technology. The publicly data marketplace is specially designed for the Internet of Things. The initiative has gathered participation from more than 20 global organizations, including Deutsche Telekom, Bosch, Microsoft, PricewaterhouseCoopers, Accenture and Fujitsu. Research groups from leading universities in the world are also working on it. The marketplace aims to give connected devices the ability to securely transfer, buy and sell datasets using a “tangle” based distributed ledger technology for storing transactions. In contrast, most other cryptocurrencies including Bitcoin and Ether use blockchain technology. 
The IOTA Foundation claims that it is this decentralized permissionless ledger, where the data will be hosted, that will eventually ensure that the data being sold on IOTA's marketplace is tamper-proof. Co-founders Dominik Schiener and David Sønstebø believe Tangle offers free transactions and much better scaling opportunities.

Abhishek Jha

30 Nov 2017

3 min read

IOTA, the cryptocurrency that uses Tangle instead of blockchain, announces Data Marketplace for Internet of Things

Up-and-coming cryptocurrency startup IOTA has partnered with more than 20 corporate behemoths such as Deutsche Telekom, Bosch, Microsoft, PricewaterhouseCoopers, Accenture and Fujitsu to build a reliable Data Marketplace for sharing and monetizing data. Several research groups from universities around the world are also involved in the project.

According to IOTA, over 2.5 quintillion bytes of information are generated on a daily basis, but almost 99% is lost because there is no safe place to exchange and share this data securely. “Any kind of data can be monetized,” said David Sønstebø, who co-founded the Internet-of-Things based cryptocurrency. “If you have a weather station collecting wind, temperature, humidity, and barometric data, for instance, you can sell that to an entity that is doing climatic research.”

The marketplace aims to give connected devices the ability to securely transfer, buy and sell datasets using a “tangle” based distributed ledger technology. In contrast, other cryptocurrencies such as Bitcoin and Ether use blockchain technology. “IOTA is kind of the first distributed ledger that goes beyond the blockchain,” Sønstebø said. “We got rid of the blocks and we got rid of the chains, which has resulted in getting rid of the major pain points or limitations of the blockchain such as fees, scalability, and centralization.” As soon as data is put onto IOTA’s decentralized ledger, it is distributed to countless nodes, the computers that connect to the network, making it impossible to tamper with the data, Sønstebø claimed.

Tangle is a form of directed acyclic graph (DAG): a data structure in which the devices on the network build consensus through the web of connections between transactions, as they randomly verify each other’s transactions. This method of verification means there is no central ledger and no need for miners to power the network. It is this decentralized permissionless ledger, where the data will be hosted, that is expected to keep the data sold on IOTA’s marketplace tamper-proof. Since computing power in the Tangle grows as the network grows, IOTA is promising free, fast transactions.

The marketplace demo will run until January, IOTA stated, adding that it will release a series of blog posts and case studies to highlight how companies can use the technology and benefit from it. While the technology is still new, it may act as a catalyst for a whole new paradigm of research, artificial intelligence, and democratization of data. With Tangle, IOTA has used an interesting technique to distribute and decentralize its ledger while addressing the core drawbacks of blockchain systems. In its white paper released in October, IOTA even said the tangle “naturally succeeds the blockchain as its next evolutionary step.”
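The DAG idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not IOTA's implementation: the class name `Tangle`, its methods, and the rule that each new transaction approves up to two earlier transactions picked uniformly at random are all illustrative choices, and the sketch leaves out signatures, proof of work, and networking entirely.

```python
import random


class Tangle:
    """Toy DAG ledger (hypothetical): each new transaction approves earlier ones."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        # Map each transaction id to the ids it directly approves.
        self.approves = {"genesis": []}

    def attach(self, tx_id):
        """Add a transaction approving up to two distinct earlier transactions."""
        if tx_id in self.approves:
            raise ValueError(f"duplicate transaction id: {tx_id}")
        earlier = list(self.approves)
        # Assumption: parents are picked uniformly at random among all
        # earlier transactions; a real network would be more selective.
        parents = self._rng.sample(earlier, k=min(2, len(earlier)))
        self.approves[tx_id] = parents
        return parents

    def tips(self):
        """Return transactions that no other transaction has approved yet."""
        approved = {p for parents in self.approves.values() for p in parents}
        return [tx for tx in self.approves if tx not in approved]

    def _reaches(self, start, target):
        """Follow approval edges from start and report whether target is hit."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.approves[node])
        return False

    def approvers_count(self, tx_id):
        """Count transactions that directly or indirectly approve tx_id."""
        return sum(
            1
            for other in self.approves
            if other != tx_id and self._reaches(other, tx_id)
        )
```

Because a transaction can only point at transactions that already exist, the graph stays acyclic by construction. In this toy model, the number of later transactions that approve an entry directly or indirectly stands in for how well confirmed it is, and no miner or central ledger is involved.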

Abhishek Jha

30 Nov 2017

4 min read

Cloudera Altus Analytic DB: Modernizing the cloud-based data warehouses

Cloudera has announced the beta release of Cloudera Altus Analytic DB, a data warehouse cloud service that brings the warehouse to the data through a cloud-scale architecture that eliminates costly data movement. Built on the Cloudera Altus Platform-as-a-Service (PaaS) foundation, Altus Analytic DB delivers instant self-service BI and SQL analytics to users in a reliable and secure environment. In addition, by leveraging the Shared Data Experience (SDX), the same data and catalog are accessible to analysts, data scientists, data engineers, and others using the tools they prefer – SQL, Python, R – without any data movement.

For enterprises, challenges with existing analytic environments have resulted in a number of limitations for both business analysts and IT. Because of resource constraints, critical reporting and SLAs are given priority, which limits self-service access for other queries and workloads. To support additional workloads and access beyond SQL, data silos have proliferated in organizations, resulting in inefficiencies in managing the multiple data copies, difficulties in applying consistent security policies, and governance issues. In turn, business users struggle to analyze data across these silos, and there is limited ability to collaborate with groups including data scientists and data engineers.

Cloudera Altus Analytic DB removes those limitations through the speed and scale of the cloud. Central to Altus Analytic DB is its architecture that brings the warehouse to the data, enabling direct and iterative access to all data in cloud object storage. This simple yet powerful design could deliver dramatic benefits for IT, business analysts, and non-SQL users. IT benefits from simple PaaS operations to easily and elastically provision limitless isolated resources on demand, with simple multi-tenant management and consistent security policies and governance. Business analysts get immediate self-service access to all data without risking critical SLAs, and with predictable performance no matter how many other reports or queries are running. Additionally, they can continue to leverage existing tools and skills, including integrations with leading BI and data integration tools such as Arcadia Data, Informatica, Qlik, Tableau, Zoomdata, and others. With no need to move data into the database, shared data and the associated schemas and catalog are always available for iterative access beyond just SQL, so data scientists, data engineers, and others can seamlessly collaborate.

Senior vice president Charles Zedlewski said Cloudera is helping its customers "modernize their data warehouse both on-premises and in cloud environments" with the unique architecture. "With no need to move data into the Cloudera Altus platform, users can quickly spin up clusters for business reporting, exploration, and even use Altus Data Engineering to deploy data pipelines, all over the same data and Shared Data Experience without impacting performance or SLAs," he said, stressing how Cloudera Altus Analytic DB makes it easier for analysts to get dedicated, self-service access for BI and SQL analytics, all with an "enterprise focus."

Key Capabilities of Cloudera Altus Analytic DB

Cloudera Altus Analytic DB, built with the leading high-performance SQL query engine, Apache Impala (now graduated to a Top-Level Project), puts the full power and flexibility of a modern, cloud-powered analytic database in the hands of businesses quickly, easily, reliably, and securely:

Brings the data warehouse to the data: No need to move data into the database – saving time and simplifying IT management and security.
Delivers instant analytics: With no pre-processing or moving data, users can operate on data immediately and iterate – again and again – for faster time-to-insights.
Ensures data consistency: Everyone works with the same data, schemas, and structures – business analysts, financial analysts, data scientists, data engineers, anyone.
Goes beyond SQL: Flexible self-service access lets users collaborate over shared data, using the languages and tools they prefer to work with – SQL, Python, R, and more.
Built with cloud scale: Easy elasticity and performance for fast, adaptable, cost-effective analytics.

The initial beta of Cloudera Altus Analytic DB will be available on Amazon Web Services (AWS).

Abhishek Jha

29 Nov 2017

4 min read

What if robots get you a job! Enter Helena, the first artificial intelligence recruiter

She is Helena. She is virtual – a robot, yes – a matchmaker that uses AI and machine learning to connect the right candidate to the right job opportunity. The bot goes further. After scouting the best candidates and matching them to available roles, she (that’s the gender her inventors decided upon) approaches them on behalf of the organizations. In other words, she is a full-fledged corporate headhunter driven by artificial intelligence. Or you may call it a simplified job-hunting tool from the other end.

In essence, the AI-powered virtual assistant plays a dual role, serving not only as a company headhunter but also as the job seeker’s agent, meaning the two sides do not have to search for each other. As an AI agent, Helena allows professionals to discreetly and ‘passively’ receive job opportunities from companies. Once a candidate shows interest, she refers them to the company and ensures it responds as quickly as possible.

It has taken the AI startup Woo over two years to build Helena, putting together what it calls a 'dream team' of the best recruiters and data scientists from industry-leading companies such as Google and Facebook, as well as top algorithm engineers from the market. And while it takes real effort to train the ‘unreal’ headhunter robot to think and make decisions like human recruiters, Helena has got smarter over time through employer feedback and machine learning. According to the company, she is out-performing human recruiters in the quality of her match-making and the speed of her performance. She is “constantly calibrating and fine-tuning her decision-making based on the client’s dynamic needs and feedback.”

“If you think about an interview, it’s an outcome of a lack of information on both sides,” Woo CEO and founder Liran Kotzer says. “But if there’s a machine that knows everything – like a god – knows about your past experiences, about your projects, your culture – the machine is going to tell you that there’s a perfect fit and both parties won’t question it.”

So are the Helenas going to totally disrupt the future of recruitment? There are arguments on both sides. But you can’t take away the fact that the AI assistant is infinitely scalable. Unlike her human counterparts, Helena has the capacity to handle an unlimited number of candidates, free from individual views and biases. That qualifies as fair – up from fair enough – in terms of selection criteria. Here, your candidature is assessed by algorithms that weigh your past success, trends, CTQs, and relevant metric-based data sets. It has the potential to bring a new era of transparency.

“Helena turns the tables on today’s labor intensive and largely unscientific recruitment process. Unlike using an expensive headhunter to manually source and screen a limited number of candidates for specific jobs, Helena uses data science to hire,” Kotzer adds.

Connecting possible employees with their would-be employers, without the intervention of either, may sound like automation taken too far. But early results show remarkable accuracy in the matchmaking. Woo claims that 52 percent of the interested candidates its headhunting software puts forward accept job interviews. That is nearly twice the rate achieved by human recruiters.

For people on both sides of the interview table, the hiring process is tedious. There are stacks of resumes, cover letters, supplementary documents, LinkedIn profiles, and countless job interviews. It makes definite sense to automate the repetitive tasks if we have the analytical insights from the data needed to optimize job matches. That prepares the perfect ground for artificial intelligence to take over. You would not call Helena ‘just another bot’, if only because she at least attempts to solve the age-old problem of recruitment bias.


