
Tech News - Data

1209 Articles
Aarthi Kumaraswamy
25 Oct 2017
5 min read

Dr.Brandon explains NLP (Natural Language Processing) to Jon

Dr.Brandon: Welcome everyone to the first episode of 'Date with data science'. I am Dr. Brandon Hopper, B.S., M.S., Ph.D., Senior Data Scientist at BeingHumanoid and visiting faculty at Fictional AI University.

Jon: And I am just Jon - actor, foodie and Brandon's fun friend. I don't have any letters after my name but I can say the alphabet in reverse order. Pretty cool, huh!

Dr.Brandon: Yes, I am sure our readers will find it very amusing, Jon. Talking of alphabets, today we discuss NLP.

Jon: Wait, what is NLP? Is it that thing Ashley's working on?

Dr.Brandon: No. The NLP we are talking about today is Natural Language Processing, not to be confused with Neuro-Linguistic Programming.

Jon: Oh alright. I thought we just processed cheese. How do you process language? Don't you start with 'to understand NLP, we must first understand how humans started communicating'! And keep it short and simple, will you?

Dr.Brandon: OK, I will try my best to do all of the above if you promise not to doze off.

The following is an excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla and Michal Malohlava.

NLP helps analyze raw textual data and extract useful information such as sentence structure, the sentiment of a text, or even a translation of the text into another language. Since many sources of data contain raw text (for example, reviews, news articles, and medical records), NLP is becoming more and more popular: it provides insight into the text and makes automated decisions easier. Under the hood, NLP often uses machine learning algorithms to extract and model the structure of text. The power of NLP is much more visible when it is applied in the context of another machine learning method, where, for example, text can represent one of the input features.
NLP - a brief primer

Just like artificial neural networks, NLP is a relatively "old" subject, but one that has garnered a massive amount of attention recently due to the rise of computing power and various applications of machine learning algorithms for tasks that include, but are not limited to, the following:

Machine translation (MT): In its simplest form, this is the ability of machines to translate text from one language into another. Interestingly, proposals for machine translation systems pre-date the creation of the digital computer. One of the first NLP applications was created during World War II by an American scientist named Warren Weaver, whose job was to try and crack German code. Nowadays, we have highly sophisticated applications that can translate a piece of text into any number of different languages we desire!

Speech recognition (SR): These methodologies and technologies attempt to recognize spoken words and translate them into text using machines. We see these technologies in today's smartphones, which use SR systems in tasks ranging from helping us find directions to the nearest gas station to querying Google for the weekend's weather forecast. As we speak into our phones, a machine recognizes the words we are speaking and translates them into text that the computer can interpret and, if need be, act on.

Information retrieval (IR): Have you ever read a piece of text, such as an article on a news website, and wanted to see other news articles similar to the one you just read? This is but one example of an information retrieval system, which takes a piece of text as an "input" and seeks to obtain other relevant pieces of text similar to the input text. Perhaps the easiest and most recognizable example of an IR system is a search on a web-based search engine. We give some words that we want to "know" more about (this is the "input"), and the output is the list of search results, which are hopefully relevant to our input search query.

Information extraction (IE): This is the task of extracting structured bits of information from unstructured data such as text, video and pictures. For example, when you read a blog post on some website, the post is often tagged with a few keywords that describe its general topics, and these tags can be assigned using information extraction systems. One extremely popular avenue of IE is called Visual Information Extraction, which attempts to identify complex entities from the visual layout of a web page, for example, which would not be captured by typical NLP approaches.

Text summarization (darn, no acronym here!): This is a hugely popular area of interest. It is the task of taking pieces of text of various lengths and summarizing them by identifying topics, for example. In the next chapter, we will explore two popular approaches to text summarization via topic models such as Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA).

If you enjoyed the above excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla, and Michal Malohlava, check out the book to learn how to:

  • Use Spark streams to cluster tweets online
  • Utilize generated models for off-line/on-line prediction
  • Transfer learning from an ensemble to a simpler Neural Network
  • Use GraphFrames, an extension of DataFrames to graphs, to study graphs using an elegant query language
  • Use K-means algorithm to cluster movie reviews dataset

and more.
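To make the information retrieval idea from the primer a little more concrete, here is a minimal sketch of ranking documents against a query. It is not taken from the book: it assumes a simple TF-IDF weighting with cosine similarity, uses only the Python standard library, and the function names (`tokenize`, `tf_idf_vectors`, `cosine`, `search`) and the sample documents are made up for illustration.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase the text and split it on whitespace, stripping punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    counts = Counter(tokenize(query))
    total = sum(counts.values())
    # Query terms that never occur in the corpus get weight 0.
    query_vec = {t: (c / total) * idf.get(t, 0.0) for t, c in counts.items()}
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical mini-corpus, for illustration only.
DOCS = [
    "Neo4j releases a new graph database platform",
    "Baidu launches Deep Voice 3 for text to speech",
    "Microsoft adds blockchain services to Azure",
]

ranking = search("graph database", DOCS)
print(ranking)
```

The query plays the role of the "input" and the sorted list plays the role of the search results. A real search engine would add stemming, stop-word handling and an inverted index rather than scoring every document.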

Packt Editorial Staff
25 Oct 2017
5 min read

25th Oct.' 17 - Headlines

Announcements from Neo4j's GraphConnect conference, Microsoft's updates on Windows Dev Center, MapR's launch of MapR Data Science Refinery, and more in today's top data science news.

Graph database Neo4j in News

New Neo4j platform gives developers a set of tools for building enterprise graph applications

Graph database leader Neo4j has launched a new platform for developers to build graph-based applications using a common set of services. Making the announcement at its GraphConnect conference in New York, Neo4j said the new platform will help graph databases connect to various enterprise systems, allowing developers to build applications more quickly. Until now, customers were forced to create their own architecture to manually connect to these systems.

Neo4j releases Cypher for Apache Spark

At its ongoing GraphConnect conference, Neo4j announced a new initiative to support the design and execution of graph queries in the Apache Spark environment. Neo4j released an early version of the Cypher for Apache™ Spark® (CAPS) language toolkit to the openCypher project. This contribution will allow big data analysts to incorporate graph querying in their workflows, making it easier to bring graph algorithms to bear and dramatically broadening how they reveal connections in their data. Developers of Spark applications now join the users of Neo4j, SAP HANA, Redis Graph and AgensGraph, among others, in gaining access to Cypher, the leading declarative property graph query language. This also expands the tooling available to any developer, under Apache 2.0 licenses from the openCypher project.

Neo4j 3.3 released with improved performance and security

Neo4j has announced its latest release, Neo4j 3.3. With Neo4j 3.3, write performance has improved by 50% on average compared to Neo4j 3.2, making it possible to ingest more data in less time. Bulk writes at initial graph creation now use up to 40% less memory. The new Cypher Slotted Runtime results in faster queries while using one third of the memory compared to the Neo4j 3.2 Cypher Runtime. On the security front, Neo4j 3.3 introduces new support for intra-cluster encryption, including multi-DC cluster communication encryption. The new version also brings kernel improvements: key configuration parameters can now be changed on the fly, without needing to recycle a database instance.

Microsoft in News

Microsoft brings real time health reporting to Windows Dev Center

Microsoft will now offer near real time health reporting in its Windows Dev Center, helping developers to quickly fix stability issues in their apps. If you have joined the Dev Center Insider Program, the Health report’s 72H view in Dev Center now shows data for crashes, hangs, memory failures and JavaScript exceptions within minutes of those events. Previously, this data was available only after several hours. Microsoft said it will soon bring this feature to all Dev Center users.

Microsoft adds new feature called Review Insights in Windows Dev Center dashboard

In Windows Dev Center, under the Review Reports, Microsoft is introducing a new feature called Review insights. Review insights uses machine learning to classify new app reviews, even non-English reviews, into one of 12 pre-defined categories. This will help developers to quickly understand customer sentiment by filtering their app’s reviews by category. Developers can also apply additional filters, such as OS version or rating, to further isolate issues and find actionable feedback.

MapR in News

MapR launches MapR Data Science Refinery to leverage artificial intelligence

MapR has unveiled MapR Data Science Refinery, a new solution that gives data scientists an easy way to access and analyze all data in place, and to collaborate on, build and deploy machine learning models on the MapR Converged Data Platform. Using a developer-friendly notebook and a wide range of open source data science tools that integrate directly with the MapR Platform, the MapR Data Science Refinery is easy to deploy using a secure, persistent, and extensible container that can be distributed to many data science teams across multi-tenant environments.

Other data science News

SQLite 3.21.0 released

The SQL database engine SQLite has released version 3.21.0, which adds several new features and enhances existing functionality. The new version also contains a number of bug fixes.

Apache Software Foundation upgrades Apache PredictionIO to Top-Level Project

Apache PredictionIO, an open source platform donated last year by Salesforce, has been promoted by the Apache Software Foundation (ASF) from the Apache Incubator to Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles. Apache PredictionIO focuses on enabling developers to quickly develop and deploy production-ready Machine Learning pipelines. The project features an engine template gallery, where developers can pick a template and quickly ramp up a complete setup for their Machine Learning use cases. Apache PredictionIO is in use at ActionML, BizReach, LiftIQ, Pluralsight, and Salesforce, among others.

Baidu announces Deep Voice 3 project which can learn to imitate almost every accent

Baidu has launched the third version of Deep Voice, which can dramatically shorten the learning time and support a larger number of language accents. Deep Voice 3 can learn as many as 2500 voices by processing the data in just 30 minutes. The Deep Voice projects use deep learning techniques to convert text to speech. Google has a similar project called WaveNet through its DeepMind unit. Baidu said future versions may use even bigger data sets and master up to 10,000 voices.

Amazon announces general availability of Amazon Aurora with PostgreSQL Compatibility

Amazon Aurora with PostgreSQL Compatibility is now generally available, Amazon announced in an official blog post. It is compatible with PostgreSQL 9.6.3 and scales automatically to support up to 64 TB of storage, with 6-way replication behind the scenes to improve performance and availability. Amazon Aurora with PostgreSQL Compatibility is fully managed and can deliver up to 3x the throughput users otherwise get running PostgreSQL on their own.

Packt Editorial Staff
24 Oct 2017
4 min read

24th Oct.' 17 - Headlines

Cray Supercomputers on Microsoft Azure, Blockchain services on Azure Government, and more in today's top data science news.

Microsoft Azure in News

Cray bringing its supercomputers to Microsoft Azure

Microsoft has entered into an exclusive partnership with Cray to provide its customers access to supercomputing capabilities in Azure. Under the partnership, customers can get a dedicated Cray XC or CS series supercomputer in Azure to run HPC and AI applications alongside their other cloud workloads directly on the Azure network. As Cray systems easily integrate with Azure Virtual Machines, Azure Data Lake storage, the Microsoft AI platform, and Azure Machine Learning services for rich workflows and collaboration, customers can solve their toughest challenges in climate modeling, precision medicine, energy, manufacturing, and other scientific research.

Microsoft adds blockchain services to Azure Government, pushes forward Coco framework

Microsoft has embedded blockchain capabilities into its Azure Government Cloud. The company announced at the recently held Microsoft Cloud Forum that Azure Government will now support a wide array of blockchain and distributed ledger solutions, including Ethereum, Hyperledger, R3 Corda and Chain. In addition, Microsoft said it has a proof of concept for the Coco framework, which could be made public early next year. The Coco Framework, which Microsoft calls its “trusted execution environment designed to remove the latency” found in public blockchains, will serve as an open-source framework. Blockchain for Azure Government will help government agencies deal with issues such as the distribution of funds after natural disasters and the registration of property ownership.

Analytics in News

Teradata unveils Teradata Analytics Platform offering users preferred analytic environment

Teradata has launched a new analytics system called Teradata Analytics Platform that embeds analytics close to data and enables users throughout an organization to leverage their preferred analytic tools and languages, at scale, across multiple data types. “It’s making advanced analytics really accessible to a broad set of users, not just those with specialized skills,” said Imad Birouty, director of product marketing at Teradata, adding that the new platform will help “bring the data and analytics functions together so that they can be part of a company’s daily operation; repeatable, reusable, and extended out to a broad set of users.” Teradata Analytics Platform is part of the company’s Teradata Everywhere strategy, which was launched last year with four key components: Deploy anywhere, Buy any way, Move any time, and Analyze anything.

Cloudera speeds analytics deployment for next generation Cybersecurity Hub

Cloudera has teamed up with Arcadia Data, Centrify, and StreamSets to simplify the first use case on the Cybersecurity hub. Leveraging Cloudera Manager’s parcel deployment capabilities, chief information security officers (CISOs) can now access Cloudera’s cybersecurity solution, based on Apache Spot (incubating), through an app store-like experience. This makes machine learning simple and accessible by removing the barrier of entry to data-driven insights for security operation centers. The new service also provides easy access to associated ISV capabilities such as ingestion, visualisation, and analytics. “Together with our partners, Cloudera is providing CISOs with a point and click path to deploy and benefit from a next generation cybersecurity data platform,” Cloudera CEO Tom Reilly said.

Other Data Science News

ErosCoin kicks off ICO to fund R&D and business expenses

Blockchain-based payment gateway solution EROSCOIN is launching an ICO for ERO tokens and is accepting BTC, ETH, and LTC as means of contribution. Out of the total supply of 2.4 billion ERO tokens, 1.2 billion coins are available for the ICO. Half of the funds raised will go towards research and core development, and the other half will be divided between other business expenses such as marketing, legal and operational work, as well as bounty programs. The EROS foundation, which will ultimately be responsible for the development of the EROSCOIN platform, is currently slated to receive 20% of the total coin supply. The remainder of all issued tokens will then be distributed to advisory and escrow (9%), a reserve fund (10%), charity (10%) and bounties (1%). The first phase of the EROS ICO will offer a 25% bonus to investors who contribute on the 1st and 2nd day of the sale, with bonuses then decreasing by roughly 5% on a weekly basis. The sale will run over a month-long period.
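As a quick sanity check on the ERO token figures quoted above, the allocation can be tallied in a few lines. This is only a sketch: it assumes that every percentage in the article refers to the total supply of 2.4 billion tokens, which the article does not state explicitly, and the names `TOTAL_SUPPLY`, `ICO_TOKENS` and `allocation_table` are made up for illustration.

```python
TOTAL_SUPPLY = 2_400_000_000  # total ERO tokens, as quoted in the article
ICO_TOKENS = 1_200_000_000    # tokens offered in the ICO, as quoted in the article


def allocation_table(total_supply, ico_tokens):
    """Map each bucket to (percent of total supply, token count).

    Assumption: all quoted percentages are shares of the total supply.
    """
    shares = {
        "ICO sale": ico_tokens * 100 // total_supply,
        "EROS foundation": 20,
        "Advisory and escrow": 9,
        "Reserve fund": 10,
        "Charity": 10,
        "Bounties": 1,
    }
    return {name: (pct, total_supply * pct // 100) for name, pct in shares.items()}


table = allocation_table(TOTAL_SUPPLY, ICO_TOKENS)
total_pct = sum(pct for pct, _ in table.values())
total_tokens = sum(tokens for _, tokens in table.values())

for name, (pct, tokens) in table.items():
    print(f"{name:<20} {pct:>3}% {tokens:>15,}")
print(f"{'Total':<20} {total_pct:>3}% {total_tokens:>15,}")
```

Under that assumption the six buckets add up to exactly 100% of the supply, so the quoted figures are at least internally consistent.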

Packt Editorial Staff
23 Oct 2017
3 min read

Google’s first smartphone chip, Shutterstock’s Composition Aware Search feature, and more - 23rd Oct.' 17 Headlines

Google's chip Pixel Visual Core in News

Pixel Visual Core: Google’s first custom smartphone chip

There is a special component inside Google’s flagship smartphones Pixel 2 and Pixel 2 XL that was not really announced during the Oct. 4 launch event. Google has now said it put a custom, self-designed image processing chip called Pixel Visual Core inside Pixel 2. A couple of months back there were rumors that Google could dabble in chip design and that a key Apple veteran had been hired to guide the architecture. Pixel Visual Core reportedly has eight Image Processing Units (IPUs), where each IPU core is packed with 512 arithmetic logic units capable of running 3 trillion operations per second. The Pixel Visual Core is expected to be activated in a future software update for Pixel 2 users, once developers have been able to write apps for it.

AI in News

Shutterstock’s Composition Aware Search feature uses deep learning to refine image search

Stock photo service Shutterstock has launched a new feature that will allow users to search images based on their compositions and layouts. The photo search tool, called Composition Aware Search, uses advanced deep learning technology to find the right images while excluding the thousands of irrelevant images returned by traditional image search. The feature is still in beta and its patent is pending.

ThinkSCM announces AI-enabled predictive analytics tool to boost supply chain

At the recently held APICS 2017 supply chain conference, ThinkSCM unveiled a prescriptive analytics tool that uses artificial intelligence for data analysis, future predictions and recommendations. The company said it had developed the algorithm to bridge a gap in SAP software for a client, but after the successful outcome, ThinkSCM decided to launch it commercially.

With McAfee Investigator and McAfee Cloud Workload Security, McAfee uses AI to boost enterprise security

McAfee has introduced several machine learning, deep learning, and artificial intelligence features into its enterprise security offerings, which make use of the automation, reasoning and data curation provided by analytics technologies. Apart from new capabilities such as ransomware decryption and steganography detection, the company has launched two new solutions: McAfee Investigator and McAfee Cloud Workload Security. While McAfee Investigator uses advanced analytics for accurate threat prioritization, McAfee Cloud Workload Security addresses challenges such as visibility across hybrid cloud workloads and enterprise service architecture.

Other Data Science News

Spring Data Neo4j 5.0 release brings smarter querying for better performance

After Spring Data Neo4j 4, which was a total rewrite of earlier versions, Spring Data Neo4j 5 has been released as another major version that brings several new functionalities. Built upon Neo4j OGM 3.0, SDN 5 adds dynamic properties and schema-based loading, which corrects the problem in SDN 4 where more data was often loaded than required. SDN 5 now uses a new load strategy based on a schema derived automatically from class metadata. It uses nested pattern comprehensions generated from the schema, so only the data that will be mapped is fetched. Another change that makes the mapping easier is that entity fields are now written directly by the object mapper, not through annotated or derived setters. In addition to added support for query and projection methods, Spring Data Neo4j 5.0 carries the latest enhancements in the Spring world, as it is built on the foundations of Java 8, the new Spring Framework 5.0 and Spring Data 2.0. Detailed documentation with guidelines for migrating from SDN 4.2 to the new version has been released.

Packt Editorial Staff
18 Oct 2017
2 min read

Intel takes Facebook’s help on AI chip; Cisco uses AI to predict IT services; and more - 18th Oct.' 17 Headlines

Intel’s AI chip in news

Intel collaborates with Facebook on its upcoming artificial intelligence chip NNP

Intel said it is now working with Facebook on its much anticipated artificial intelligence chip, which will be shipped by the end of this year. Intel, the chip giant, is making an ambitious debut in the field of artificial intelligence with its upcoming Nervana Neural Network Processor (NNP). “We are thrilled to have Facebook in close collaboration sharing its technical insights as we bring this new generation of AI hardware to market,” CEO Brian Krzanich wrote. The Intel Nervana Neural Network Processor takes its name from Nervana Systems, the chip startup Intel acquired in 2016.

New AI services in news

Cisco using new AI services to predict IT failures

Cisco has launched two new AI-enabled services, Business Critical Services and High-Value Services, which apply machine learning and artificial intelligence to help businesses negotiate IT risks and predict failures. Cisco Business Critical Services will help predict opportunities by applying actionable analytics, automation and technology expertise, whereas Cisco High-Value Services will enhance the overall utilization of advanced software, solutions and the network.

SAP uses machine learning to optimize online shopping

SAP has released several machine learning, facial recognition, and Internet of Things features to improve the e-commerce experience with targeted marketing campaigns. With the updated SAP Hybris Marketing Cloud solution, companies can use the right messages to target the right customers. SAP said the personalized offers will ensure data protection and privacy.

Other Data Science News

Tableau announces its Hyper engine is now in beta release 10.5

Tableau’s much anticipated in-memory data processing engine “Hyper” is now in beta release 10.5 and will be generally available early next year, the company announced at its recent conference in Las Vegas. The Hyper engine was described at Tableau’s conference last year, and the company claims it solves performance problems when handling large-scale structured data extracts. In future, Hyper will be enhanced to address NoSQL and graph workloads.

IBM SPSS Modeler to teach data science, machine learning for free

IBM SPSS Modeler is now available for free as part of IBM’s Academic Initiative program, under which it provides many software and cloud services for free or at reduced cost. Students and professors can go to ibm.onthehub.com and search for IBM SPSS Modeler. They will need to obtain a license key, which is valid for one year.

Packt Editorial Staff
17 Oct 2017
4 min read

Google’s AutoML beats human AI capacities and more - 17th Oct.' 17 Data science news headlines

Google AutoML in News

AutoML has started creating better AIs than researchers, Google says

Google’s AutoML project has started replicating itself, and the AI software is producing machine learning code more efficiently than researchers. AutoML was launched this year at Google’s annual developer conference in May with the aim of making machines ‘intelligent’ enough to create other intelligent machines. Now the project is yielding great success, as AutoML has started building ML software that is more powerful than human-designed AI systems, even in complicated tasks related to augmented reality and automation. CEO Sundar Pichai said Google plans to ‘democratize’ AutoML in future, making it available outside Google.

AI platforms in News

Mitchell's WorkCenter™ Assisted Review is P&C industry's first AI-driven Claim Review Solution

Mitchell, a leading provider of technology, connectivity and information solutions to the Property & Casualty (P&C) claims and Collision Repair industries, has launched an integrated workflow solution that leverages artificial intelligence for the estimate review process. Named Mitchell WorkCenter™ Assisted Review, the solution uses machine-learning technology to help identify incorrect replace or repair decisions, helping insurance companies review more estimates in less time while refining estimating guidelines and consistency. Early pilot tests demonstrated that A.I.-identified claims consistently reduced the amount of time for the audit and review function per claim by a substantial margin.

IZEA uses artificial intelligence on its content with ContentMine

IZEA has introduced a new feature into its IZEAx platform named ContentMine that automatically mines content and tags photo and video assets using artificial intelligence. Apart from text and image processing, ContentMine includes smart groups, content ratings, and several search filters, and can programmatically grab screenshots of published social media content. “ContentMine serves as an intelligent repository for all the content generated through IZEA campaigns, and allows marketers to upload content produced outside our platform as well,” said Ted Murphy, Founder and CEO of IZEA. “If an Instagram picture taken by an influencer contains a dog and a car, ContentMine will programmatically identify those objects and make them searchable. Marketers can use the content analysis engine built into ContentMine to reduce the time historically spent tagging and manually organizing content.”

Other trending data science news

IBM’s new services make cloud migration easier, faster and more affordable

IBM Cloud Migration Services and IBM Cloud Deployment Services are two new services launched by IBM that provide less expensive and faster ways to move business data and applications to the cloud. With Cloud Migration Services, businesses can understand their existing IT infrastructure and work accordingly to migrate services to the cloud. Cloud Deployment Services, by contrast, is a next-gen automation platform for building private and hybrid clouds across multiple platforms and service providers. Overall, IBM claims the new services drastically reduce design, build, deployment and testing efforts.

AMA initiates integrated big data analytics platform IHMI to organize health data

In what could usher in a new era of patient care, the American Medical Association has announced that it is working on a big data analytics platform named Integrated Health Model Initiative (IHMI) to develop a common data model that could improve the way healthcare information is organized and shared. AMA is collaborating with Cerner Corporation, IBM, Intermountain Healthcare, PCORI, AMIA, and SNOMED on this project. “We spend more than three trillion dollars a year on health care in America and generate more health data than ever before. Yet some of the most meaningful data – data to unlock potential improvements in patient outcomes – is fragmented, inaccessible or incomplete,” CEO of AMA James Madara noted.

Razorthink Big Brain: An advanced deep learning platform that automates data science tasks

Razorthink has launched a data science platform that automates the data preparation, modeling, evaluation and deployment of deep learning solutions. The automation platform is named Razorthink Big Brain, and it generates Expert AIs for customized business cases with superior predictive analytics. Built on hybrid algorithms that learn without human intervention, the Big Brain platform can use deep learning neural networks to discover insights that are not possible with traditional machine learning algorithms.
Packt Editorial Staff
16 Oct 2017
3 min read

IBM’s new blockchain platform, Elasticsearch's availability on Alibaba Cloud, and more - 16th Oct.' 17 Headlines

Blockchain in News

IBM launches blockchain network for international banking across 12 currency corridors

The use of digital currency received yet another boost as tech giant IBM introduced its blockchain network for cross-border transactions. IBM has collaborated with KlickEx Group and Stellar.org on this platform, which the company described as part of a ‘paradigm shift’ towards transferring money digitally across the world in real time. Currently, the network is handling transactions in a regulated environment across 12 currency corridors encompassing the Pacific Islands, Australia, New Zealand and the United Kingdom, but scalability and volume could soon increase. Jesse Lund, IBM’s VP of Blockchain, predicted a "drastic shift in the construct of payments infrastructure" within the next five years.

Universa, Yellowrockets announce world’s first decentralized Blockchain accelerator

Universa and Yellowrockets have reportedly launched the world’s first decentralized Blockchain accelerator. Recruitment for the accelerator has started, Alexander Borodich, founder of the Universa blockchain platform, announced. The YellowRockets team will organize and manage the accelerator programs, for which applications must first be submitted by 10 November at www.urockets.com. After the projects are examined and shortlisted, a PreCamp will be held at the BAZAAR Tech Convention in Sochi from Nov. 16-19, and only after that will the best blockchain startups be selected for the acceleration program. It could be a good meeting point for blockchain start-ups, industry experts, and investors.

Ethereum implements Blockchain Hard Fork to Byzantium

Ethereum, the second largest cryptocurrency by market cap, has officially updated with the first half of the Metropolis hard fork, nicknamed Byzantium. The Byzantium upgrade is part of the Metropolis protocol, designed to improve the blockchain by boosting network privacy and making it easier for decentralized applications (dapps) to proliferate on the platform. In 2015, Ethereum introduced a large-scale upgrade in its roadmap under the name Metropolis, but the upgrade encountered substantial delays. As a result, Metropolis was broken into two phases: Byzantium and Constantinople.

Other Data Science News

Alibaba, Elastic collaborate to add Elasticsearch on Alibaba Cloud

The Alibaba Group and Elastic have joined hands to offer Elasticsearch on the Alibaba Cloud platform. The new service is called Alibaba Cloud Elasticsearch. Customers of Alibaba Cloud can now deploy Elastic’s real-time search, data ingestion, and analytic features as a hosted, turnkey solution, according to an announcement during a keynote at The Computing Conference 2017. “Alibaba Cloud Elasticsearch will be a highly differentiated service as it uses Elastic’s advanced search product and powerful X-Pack features across every tier of our service in a way that is easy to get started, consume, and manage,” said Yeming Wang, Deputy General Manager, Alibaba Cloud Global. The product is available immediately and includes Elastic’s Kibana and X-Pack features.

Packt Editorial Staff
13 Oct 2017
3 min read

Microsoft, AWS join forces for Gluon; Google, IBM unveil open API Grafeas; and more - 12th Oct.' 17 Headlines

Microsoft, Amazon announce deep learning interface Gluon that is accessible to all developers Amazon Web Services and Microsoft have together developed a new open source deep learning library called Gluon that can be accessible to all developers. Using the interface, developers of all skill levels can build neural networks using simple, concise code, without sacrificing performance. Gluon will help developers build machine learning models using a simple Python API and a range of pre-built, optimized neural network components. “We created the Gluon interface so building neural networks and training models can be as easy as building an app,” said Swami Sivasubramanian, VP of Amazon AI. Gluon currently works with Apache MXNet and will support Microsoft Cognitive Toolkit (CNTK) in an upcoming release. Google, IBM launch open source API Grafeas for governing software supply chains as “central source of truth” IBM and Google have announced the launch of an open source initiative called Grafeas which offers developers a uniform way of auditing and governing their software supply chains. The software supply chain has several stages such as code, build, test, deploy and operate. At each stage, different tools generate metadata about various software components. Grafeas provides an open API that captures and aggregates this metadata. So using the API developers can easily track when and where the code was changed and who changed it, whether the code successfully passed security scan, and what type of vulnerabilities were found if it failed the test. As part of Grafeas, Google is also introducing Kritis which helps developers create Kubernetes governance policies based on the metadata stored in Grafeas. IBM said it will offer Grafeas and Kritis as part of the IBM Container Service on IBM Cloud. Grafeas and Kritis are Greek words which mean “scribe” and “judge” respectively. 
Other Data Science News Box announces Box Skills to manage growing multimedia content with artificial intelligence To manage the growing amount of multimedia content, Box has launched a new artificial intelligence toolkit called Box Skills. The company announced that its Box Skills framework will integrate the best AI and machine learning tools directly into the content on Box in a secured environment. Box Skills gives customers built-in flexibility to use algorithms from various companies, and to mix and match intelligent machine learning tools from Google, IBM, and Microsoft. Breaking the announcement at its BoxWorks conference, Box previewed three specific skills it will initially offer in public beta: audio intelligence, using technology from IBM Watson; video intelligence, powered by Microsoft Cognitive Services; and image intelligence, using Google Cloud Platform. Box Skills will be available in beta in early 2018. TensorFlow receives hardware support from NVIDIA and Movidius TensorFlow now has new hardware support from NVIDIA and Movidius. TensorFlow will now run on NVIDIA’s Jetson TX2 and Intel’s Movidius chip. The Movidius Neural Compute Stick Software Development Kit (NC SDK) now supports TensorFlow. TensorRT 3 is part of the NVIDIA Deep Learning SDK; TensorRT includes the TensorRT Optimizer and runtime. DIGITS 6 also supports TensorFlow (DIGITS stands for NVIDIA Deep Learning GPU Training System). Vora 2.0 released, SAP partners can now deploy Vora on multiple cloud systems SAP has launched a new edition of Vora, its big data analytics software. The new release, Vora 2.0, offers SAP partners multi-cloud deployment options. It uses a container architecture and leverages the open source Kubernetes platform for deployment, thus simplifying overall deployment and cluster management on public clouds. The product is thus cloud-ready, and more hybrid-ready.

Aarthi Kumaraswamy
13 Oct 2017
2 min read

Top 15 Applications of Machine Learning on Twitter

We know machine learning is being used to do a whole lot of things from spam filtering to self-driving cars.

15 applications of machine learning

Following are ML applications that have become newsworthy on Twitter over a duration of one month. Our favorite is automatic multimedia tagging (duh! which editor wouldn’t like that!?).

1. Sepsis (Hospital's silent killer) detection
Using machine learning for real time understanding of patient safety risks via @maxwele2

2. Multimedia Content Scaling
Automatically tagging multimedia using AI would make searching for content so much easier. via @AnujRajbhandari

3. Making roads safe
Using machine learning to make zebra crossings safer (caveat: as long as onus not on peds & cyclists to change) via @AlixKroeger

4. Managing Retail supply chain
Manufacturing has been using #AI for some time, it is really now just starting to spread to retail supply chain. via @PaaSDev

5. Tracking and surveillance for law enforcement
Is part of the future of #journalism big-data analysis? EG using Machine Learning & data to uncover hidden stories? via @LeonLidigu

6. Identifying cancers through gene study
Using machine learning algorithms to identify genes essential for cell survival. via @dobebig

7. Designing animal product substitutes
The Not Company (NotCo) is using machine learning to create vegetarian substitutes for animal products. via @MarinaSpindler

8. Improving crop yield
Phenotyping: Using Machine Learning for Improved Pairwise Genotype Classification Based on Root Traits. via @lukelliw

9. Rapidly discover new drug treatments for diseases
Very cool approach to using machine learning to find new therapies. via @HighResBio

10. Engineering wood
Materialize.X is using machine learning to disrupt the $300B engineered wood industry. via @prafiles

11. Profiling voters
Using machine learning to profile German party voters. via @HalukaMB

12. Facial Recognition
Apple is using machine learning for Face ID. via @VentureBeat

13. Managing e-commerce backend
Amazon has big plans for using machine learning to improve their supply chain. via @boxtoninc

14. Insurance pricing
Using machine learning for insurance pricing optimization via @Ronald_vanLoon

15. Code free accounting
Rod Drury on using machine learning and AI for code free accounting. via @DamiaanvZ

What is your favorite use case for machine learning?

Packt Editorial Staff
12 Oct 2017
3 min read

GitHub's plan for coding automation, TensorFlow releases TensorFlow Lattice - 11th Oct.' 17 Headlines

GitHub's plan for coding automation, TensorFlow Lattice release, and more in today’s data science news. GitHub in News GitHub will leverage its 10 years of data to automate coding and offer project insights Introducing new features in what it called "just the start of a longterm roadmap", GitHub announced several automated coding features at its GitHub Universe conference this week. GitHub intends to leverage the data aggregated on its platform over the past 10 years, and to demonstrate how machine learning and data science can be applied to software development. The new tools will help developers track dependencies, keep code secure and discover new projects. Its new “dependency graph” feature gives developers insight into their projects and indicates whether the software is up to date or still supported by a community, apart from giving detailed information on its license and security vulnerabilities. TensorFlow in News TensorFlow team releases TensorFlow Lattice The TensorFlow team has announced the release of TensorFlow Lattice, which is intended to ensure that your machine learning models follow global trends even when the training data is noisy. The team said that TensorFlow Lattice is a library that implements Monotonic Calibrated Interpolated Look-Up Tables in TensorFlow. The library includes a collection of regularizations and monotonicity constraints configurable per feature. It has a set of TensorFlow estimators for regression and classification with the most common setups for lattice models, and includes lattices and piecewise linear calibration as layers that can be composed into custom models. TensorFlow Lattice is not an official Google product. TensorFlow 1.4.0-rc0 released, several custom functions and bug fixes added TensorFlow 1.4.0-rc0 has been released, as per the official announcement on the TensorFlow Twitter page. Among the new features, tf.data is now part of the core TensorFlow API and several other custom transformation functions have been added. 
The release also resolves and fixes bugs that required attention, such as the race condition in TensorForest TreePredictionsV4Op. In TensorFlow 1.4.0, Google Cloud Storage file system and Hadoop file system support are now default build options. Changes in the API include doing away with the seldom used and unnecessary functions. The API is now subject to backwards compatibility guarantees. In other Data Science News ViewLift adds AI on its platform to get insights on customer behavior, drive targeted retention strategies Leading content distribution platform ViewLift has integrated artificial intelligence engine technology into its platform services. ViewLift Intelligence, or VLI, will use the advanced machine learning and AI algorithms to leverage its data and offer enhanced customer behavioral insights and retention tools for operators. ViewLift Intelligence can analyze user viewing behavior, content preferences, subscription packages, acquisition method, and device preferences. It could accurately predict which paying subscribers are likely to cancel subscriptions in the near future. This could help in making targeted strategies to reduce the churn rate across multiple channels.
Packt Editorial Staff
11 Oct 2017
5 min read

NVIDIA unveils supercomputer Pegasus, IBM integrates Data Science Experience - 10th Oct' 17 Headlines

NVIDIA says its supercomputer Pegasus will drive fully autonomous robotaxis In what could truly make self-driving cars a reality, NVIDIA has designed the world's first AI computer, codenamed “Pegasus”, that is capable of handling Level 5 driving without requiring steering wheels, pedals, or mirrors. It will instead rely on sensors, cameras, radars and lidars to facilitate driving fully autonomous robotaxis. The advanced computing system, NVIDIA Drive PX Pegasus, extends the capabilities of its predecessor, NVIDIA Drive PX 2, by more than 10 times in terms of processing power and performance. "Driverless cars will enable new ride- and car-sharing services. New types of cars will be invented, resembling offices, living rooms or hotel rooms on wheels. Travelers will simply order up the type of vehicle they want based on their destination and activities planned along the way. The future of society will be reshaped," NVIDIA founder and CEO Jensen Huang said. Hundreds of tech companies are striving to bring autonomous self-driving cars to the road, and Pegasus will be marketed to them from the second half of 2018, the company said in its announcement. Shares of NVIDIA hit a record high following the news. IBM in News IBM advances analytics by integrating PowerAI and Data Science Experience IBM is bringing its two key data science tools, Data Science Experience and IBM PowerAI, together. The company said in an announcement that the integration is intended to provide machine learning and deep learning on a single machine. The Data Science Experience gives users collaboration tools for managing and monitoring data models, according to Dinesh Nirmal, IBM’s vice president of analytics development. PowerAI, meanwhile, brings in GPUs as well as deep learning libraries and algorithms that can be used on multiple frameworks, such as TensorFlow, he said. 
With this significant integration, users can create and train intelligence-led models using the deep learning frameworks to gain expanded data insights. Nirmal said that while 80% of enterprise problems can be solved with machine learning, there are specific use cases where deep learning is more effective. “If you’re running a huge neural network, that complexity requires deep learning. Or if you’re FedEx, to know what happened to a damaged box and how it got damaged, you would use deep learning. Anything that is data and process intensive,” he noted. Others in Data Science News Sage launches Sage Business Cloud to provide unified set of business solutions Sage, the leading provider of cloud business management solutions, has unveiled Sage Business Cloud. The platform offers a powerful set of core products and add-on applications as a complete solution that meets unique business needs. The company claimed that Sage Business Cloud could be the “only cloud platform that businesses will ever need” and that it could also use the latest advancements in AI and machine learning to further help businesses improve productivity and efficiency. "Sage Business Cloud is the next transformative wave of business software. As the fourth industrial revolution continues to take hold, we want to make our customers lives simple. Businesses of all shapes and sizes need products that aid productivity, enable them to respond at lightning speed and deliver insights as well as opportunity,” Sage CEO Stephen Kelly said. Puppet partners Google to offer customers cloud platform modules supporting migration and management Puppet has entered into a collaboration with Google Cloud which could offer its customers Google Cloud Platform (GCP) services, including its advanced machine learning and data analytics capabilities. The partnership may also help slash their IT costs. 
Puppet is known for its automated approach to delivery and operations of the software, and now its customers can avail the Google Cloud’s flexibility and agility as well. According to the joint announcement, Google Cloud will also release the technology they used to generate modules so that the Puppet module ecosystem could move faster, keeping up with rapidly changing APIs in the cloud. "Our customers want choice, flexibility and the ability to manage everything they have, from their physical infrastructure to cloud resources for maximum operational efficiency and scale," said Nigel Kersten, Chief Technical Strategist at Puppet, “With Google Cloud's expertise in providing world class infrastructure and Puppet's widely adopted enterprise management platform, we're helping customers accelerate their move to the cloud." NICE accelerates machine learning capabilities in next evolution of cognitive process automation NICE has announced the next evolution in its cognitive automation platform – an integration with technology partner Celaton to infuse NICE Robotic Automation with enhanced machine learning capabilities. This integration slashes manual effort by as much as 85% across some of the most complex business processes, and reduces process time by almost 95%. With cognitive machine learning capabilities, complex data is quickly consumed and interpreted, and sound judgments made by robots, who are instructed to respond to customer queries or complaints in an intelligent and highly personalized manner. “Robotic Process Automation has already made great strides globally by significantly impacting business efficiencies and ROI. 
We have now entered a new era of cognitive automation, and we are delighted to be at the forefront of innovation as we boldly expand our machine learning capabilities,” Miki Migdal, president of the NICE Enterprise Product Group said, “The integration with Celaton not only addresses many of the more complex and challenging business problems facing our customers today, but also marks a significant contribution to the cognitive automation arena.”

Packt Editorial Staff
10 Oct 2017
3 min read

Uber open sources AthenaX, Cortana says ‘hi’ on Skype, and more - 9th Oct' 17 - Headlines

Uber unveils open source streaming analytics platform AthenaX To serve users better with actionable insights, Uber has built an SQL-based streaming analytics platform named AthenaX. The in-house platform has been open sourced on GitHub. As its business grew, Uber required an infrastructure that could analyze real-time events and was easy to navigate. “AthenaX empowers our users, both technical and non-technical, to run comprehensive, production-quality streaming analytics using Structured Query Language (SQL),” the company said in its announcement. “Our real-world experience shows that AthenaX enables users to bring large-scale streaming analytic workloads in production within a matter of hours compared to weeks.” Qlik’s “Visualize Your World” Data Analytics 2017 Tour kicks off Qlik has commenced its annual “Visualize Your World” data analytics global tour, being held in 27 cities around the world. Over 15,000 registrants may attend the event this year across Asia Pacific, the Middle East, Europe, Africa, and the Americas. “Following a tremendously successful 2016 Tour, we are excited to once again host these events to connect with people in the region who are passionate in learning more about the biggest technology trends in the data analytics space and how blending machine learning with human intuition and creativity creates a multiplier effect for their businesses,” said Julian Quinn, Vice President at Qlik for APAC regions. Qlik will unveil some of its latest innovations at the event. Registration is free. MicroStrategy 10.9 introduces Dossiers that could “deliver analytics for everyone” MicroStrategy Inc. has announced general availability of MicroStrategy 10.9, the newest feature release, which introduces Dossier, a new storybook experience around analytics. Dossier is an interactive, streamlined interface that presents relevant data analytics in chapters and pages in a format everyone can understand. 
“MicroStrategy 10.9 represents the biggest leap forward since our MicroStrategy 10 platform launch and underscores our vision of delivering ‘Intelligence Everywhere’,” said Tim Lang, senior executive vice president and chief technology officer at MicroStrategy. “We believe collaborative analytics accelerates the velocity of decision making. That's why we're introducing Dossier, an easier and faster method of consuming analytics that we believe end users are going to love. MicroStrategy 10.9 empowers users to do more with their analytics regardless of their technical skill or role.” Other Data Science News Python package pomegranate releases latest version 0.8.0 A new version of pomegranate, a Python package for probabilistic modeling, has been released. Pomegranate v0.8.0 adds several new capabilities, such as built-in out-of-core learning, built-in parallelism, minibatch learning, and semi-supervised learning. Also, multivariate Gaussian distributions can now use a GPU through the CuPy package, pomegranate developer Jacob Schreiber said in an announcement, adding that this has sped up operations by around 4x on test runs. Pomegranate v0.8.0 is not yet compatible with networkx v2.0, and users may need to downgrade networkx to use pomegranate. Detailed documentation has been released for pomegranate v0.8.0, including an FAQ for each section. Microsoft introduces AI assistant Cortana into Skype Microsoft has added the AI assistant Cortana to Skype. Every Skype user will now see Cortana in their contact list, where it can be used either for one-on-one chats that answer queries with suggested replies, or for conversations involving scheduling events, searching for nearby restaurants, or sharing IMDB movie reviews. The gradual rollout of Cortana on Skype has begun for iOS and Android users in the U.S., Microsoft announced, adding that the feature currently does not work in voice or video calls.

Packt Editorial Staff
06 Oct 2017
10 min read

Stream me up, Scotty!

[box type="note" align="aligncenter" class="" width=""]The following is an excerpt from the book Scala and Spark for Big Data Analytics, Chapter 9, Stream me up, Scotty - Spark Streaming written by Md. Rezaul Karim and Sridhar Alla. It explores the big three stream processing paradigms that are in use today. [/box] In today's world of interconnected devices and services, it is hard to spend even a few hours a day without our smartphone to check Facebook, or hail an Uber ride, or tweet something about the burger we just bought, or check the latest news or sports updates on our favorite team. We depend on our phones and Internet, for a lot of things, whether it is to get work done, or just browse, or e-mail a friend. There is simply no way around this phenomenon, and the number and variety of applications and services will only grow over time. As a result, the smart devices are everywhere, and they generate a lot of data all the time. This phenomenon, also broadly referred to as the Internet of Things, has changed the dynamics of data processing forever. Whenever you use any of the services or apps on your iPhone, or Droid or Windows phone, in some shape or form, real-time data processing is at work. Since so much depends on the quality and value of the apps, there is a lot of emphasis on how the various startups and established companies are tackling the complex challenges of SLAs (Service Level Agreements), and usefulness and also the timeliness of the data. One of the paradigms being researched and adopted by organisations and service providers is the building of very scalable, near real-time or real-time processing frameworks on  cutting-edge platforms or infrastructure. Everything must be fast and also reactive to changes and failures. You won’t like it if your Facebook updated once every hour or if you received email only once a day; so, it is imperative that data flow, processing, and the usage are all as close to real time as possible. 
Many of the systems we are interested in monitoring or implementing, generate a lot of data as an indefinite continuous stream of events. As in any data processing system, we have the same fundamental challenges of data collection, storage, and data processing. However, the additional complexity is due to the real-time needs of the platform. In order to collect such indefinite streams of events and then subsequently process all such events to generate actionable insights, we need to use highly scalable specialized architectures to deal with tremendous rates of events. As such, many systems have been built over the decades starting from AMQ, RabbitMQ, Storm, Kafka, Spark, Flink, Gearpump, Apex, and so on. Modern systems built to deal with such large amounts of streaming data come with very flexible and scalable technologies that are not only very efficient but also help realize the business goals much better than before. Using such technologies, it is possible to consume data from a variety of data sources and then use it in a variety of use cases almost immediately or at a later time as needed. Let us talk about what happens when you book an Uber ride on your smartphone to go to the airport. With a few touches on the smartphone screen, you're able to select a point, choose the credit card, make the payment, and book the ride. Once you're done with your transaction, you then get to monitor the progress of your car real-time on a map on your phone. As the car is making its way toward you, you're able to monitor exactly where the car is and you can also make a decision to pick up coffee at the local Starbucks while you're waiting for the car to pick you up. You could also make informed decisions regarding the car and the subsequent trip to the airport by looking at the expected time of arrival of the car. 
If it looks like the car is going to take quite a bit of time picking you up, and if this poses a risk to the flight you are about to catch, you could cancel the ride and hop in a taxi that just happens to be nearby. Alternatively, if it so happens that the traffic situation is not going to let you reach the airport on time, thus posing a risk to the flight you are due to catch, you also get to make a decision regarding rescheduling or canceling your flight. Now in order to understand how such real-time streaming architectures such as Uber’s Apollo work to provide such invaluable information, we need to understand the basic tenets of streaming architectures. On the one hand, it is very important for a real-time streaming architecture to be able to consume extreme amounts of data at very high rates while, on the other hand, also ensuring reasonable guarantees that the data that is getting ingested is also processed. The following diagram shows a generic stream processing system with a producer putting events into a messaging system while a consumer is reading from the messaging system.

Processing of real-time streaming data can be categorized into the following three essential paradigms:

1. At least once processing
2. At most once processing
3. Exactly once processing

Let's look at what these three stream processing paradigms mean to our business use cases. While exactly once processing of real-time events is the ultimate nirvana for us, it is very difficult to always achieve this goal in different scenarios. We have to compromise on the property of exactly once processing in cases where the benefit of such a guarantee is outweighed by the complexity of the implementation.
Stream Processing Paradigm 1: At least once processing

The at least once processing paradigm involves a mechanism to save the position of the last event received only after the event is actually processed and the results are persisted somewhere so that, if there is a failure and the consumer restarts, the consumer will read the old events again and process them. However, because the consumer cannot tell whether the events it reads again were not processed at all, partially processed, or fully processed before the failure, events may be duplicated when they are fetched again. As a result, events get processed at least once. At least once is ideally suitable for any application that involves updating some instantaneous ticker or gauge to show current values. Any cumulative sum, counter, or dependency on the accuracy of aggregations (sum, groupBy, and so on) does not fit the use case for such processing simply because duplicate events will cause incorrect results.

The sequence of operations for the consumer is as follows:

1. Save results
2. Save offsets

Below is an illustration of what happens if there is a failure and the consumer restarts. Since the events have already been processed but the offsets have not been saved, the consumer will read from the previously saved offsets, thus causing duplicates. Event 0 is processed twice in the following figure:

Stream Processing Paradigm 2: At most once processing

The at most once processing paradigm involves a mechanism to save the position of the last event received before the event is actually processed and the results are persisted somewhere so that, if there is a failure and the consumer restarts, the consumer will not try to read the old events again. However, since there is no guarantee that the received events were all processed, events can be lost, as they are never fetched again. As a result, the events are processed at most once or not processed at all.
At most once is ideally suitable for any application that involves updating some instantaneous ticker or gauge to show current values, as well as any cumulative sum, counter, or other aggregation, provided accuracy is not mandatory and the application does not need absolutely every event. Any events lost will cause incorrect or missing results.

The sequence of operations for the consumer is as follows:

1. Save offsets
2. Save results

Below is an illustration of what happens if there is a failure and the consumer restarts. Since the events have not been processed but the offsets are saved, the consumer will read from the saved offsets, causing a gap in the events consumed. Event 0 is never processed in the following figure:

Stream Processing Paradigm 3: Exactly once processing

The exactly once processing paradigm is similar to the at least once paradigm, and involves a mechanism to save the position of the last event received only after the event has actually been processed and the results persisted somewhere so that, if there is a failure and the consumer restarts, the consumer will read the old events again and process them. As before, because the consumer cannot tell whether the events it reads again were already fully or partially processed, events may be duplicated when they are fetched again. However, unlike the at least once paradigm, the duplicate events are not processed and are dropped, thus resulting in the exactly once paradigm. The exactly once processing paradigm is suitable for any application that involves accurate counters or aggregations, or that, in general, needs every event processed only once and also definitely once (without loss).

The sequence of operations for the consumer is as follows:

1. Save results
2. Save offsets

The following illustration shows what happens if there is a failure and the consumer restarts.
Since the events have already been processed but the offsets have not been saved, the consumer will read from the previously saved offsets, thus causing duplicates. Event 0 is processed only once in the following figure because the consumer drops the duplicate event 0:

How does the exactly once paradigm drop duplicates? There are two techniques which can help here:

1. Idempotent updates
2. Transactional updates

Idempotent updates involve saving results based on some unique ID/key generated so that, if there is a duplicate, the generated unique ID/key will already be in the results (for instance, a database) and the consumer can drop the duplicate without updating the results. This is complicated as it's not always possible or easy to generate unique keys. It also requires additional processing on the consumer end. Another point is that the database can be separate for results and offsets.

Transactional updates save results in batches that have a transaction beginning and a transaction commit phase so that, when the commit occurs, we know that the events were processed successfully. Hence, when duplicate events are received, they can be dropped without updating results. This technique is even more complicated than idempotent updates, as we now need some transactional data store. Another point is that the database must be the same for results and offsets.

You should look into the use case you're trying to build and see whether ‘at least once processing’ or ‘at most once processing’ is sufficient and still achieves an acceptable level of performance and accuracy.

If you enjoyed this excerpt, be sure to check out the book it comes from, Scala and Spark for Big Data Analytics. You will also like this exclusive interview on why Spark is ideal for stream processing with Romeo Kienzler, Chief Data Scientist in the IBM Watson IoT worldwide team and author of Mastering Apache Spark, 2nd Edition.
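
As a rough illustration of the three stream processing paradigms in the excerpt above, the following Python sketch simulates a consumer that fails once between its two save steps and then restarts. It is not taken from the book and does not use Spark or any messaging system: `Store`, `consume`, `run_with_one_failure`, and the `Crash` exception are hypothetical names made up for this example, the event index stands in for the unique ID/key, and the dedup key is assumed to be written atomically with the result.

```python
from typing import List, Optional, Set


class Crash(Exception):
    """Simulated consumer failure between the two save steps."""


class Store:
    """Hypothetical durable state that survives a consumer restart."""

    def __init__(self) -> None:
        self.offset: int = 0           # index of the next event to read
        self.results: List[str] = []   # every event that was applied
        self.seen: Set[int] = set()    # keys already applied (dedup only)


def consume(events: List[str], store: Store, mode: str,
            crash_at: Optional[int] = None) -> None:
    """Consume from the saved offset; optionally fail at event crash_at."""
    while store.offset < len(events):
        i = store.offset
        if mode == "at_most_once":
            store.offset = i + 1               # 1. save offset
            if crash_at == i:
                raise Crash()
            store.results.append(events[i])    # 2. save result
        else:
            # Exactly once here = at least once + dropping keys seen before
            # (the idempotent-update technique; assumption: index is the key).
            duplicate = mode == "exactly_once" and i in store.seen
            if not duplicate:
                store.results.append(events[i])  # 1. save result
                store.seen.add(i)                # assumed atomic with the result
            if crash_at == i:
                raise Crash()
            store.offset = i + 1               # 2. save offset


def run_with_one_failure(mode: str, events: List[str], crash_at: int) -> List[str]:
    """Run a consumer that crashes once at crash_at, then restarts."""
    store = Store()
    try:
        consume(events, store, mode, crash_at)
    except Crash:
        pass  # the consumer "restarts" and resumes from the saved offset
    consume(events, store, mode)
    return store.results


events = ["e0", "e1", "e2"]
print(run_with_one_failure("at_least_once", events, 0))  # ['e0', 'e0', 'e1', 'e2']
print(run_with_one_failure("at_most_once", events, 0))   # ['e1', 'e2']
print(run_with_one_failure("exactly_once", events, 0))   # ['e0', 'e1', 'e2']
```

The only difference between at least once and at most once in this sketch is the order of the two save steps; exactly once keeps the at least once ordering and adds a lookup against previously applied keys, which is where the extra cost on the consumer side comes from.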
Packt Editorial Staff
06 Oct 2017
3 min read

Google Compute Engine memory levels raised, Edward merges into TensorFlow and more - 5th Oct '17 Headlines

Google Compute Engine launched with up to 96 CPU cores and 624 GB of memory

Google Compute Engine has announced an offering with 64 CPU cores and 416 GB of memory. Google has thus doubled the memory Compute Engine previously offered with 32 cores. “These machine types run on Intel Xeon Scalable processors (codenamed Skylake), and offer the most vCPUs of any cloud provider on that chipset. Skylake in turn provides up to 20% faster compute performance, 82% faster HPC performance, and almost 2X the memory bandwidth compared with the previous generation Xeon,” Google said in its announcement. Users who don’t require that much power can also match their workload requirements with custom CPU and memory configurations. In the future, Google is even considering products that deliver up to 4TB of memory.

Edward merges into TensorFlow

Edward, a Python library for probabilistic modeling, inference and criticism, has announced its official merger into TensorFlow. Dustin Tran, who leads the development of Edward, announced that for now Edward will be in the contrib module to avoid redundancy with other submodules. “We’re not sure if all of Edward’s features will be in TensorFlow just yet: for example, it’s unclear where to put Edward’s precise PPL. That said, expect that in this move many new innovations in Edward’s design will appear as we make programmable inference far more flexible, more generally compatible with hardware and distributed choices, and most importantly, more accessible by researchers and applied MLers alike,” Dustin said in the official announcement.

Other Data Science News

PostgreSQL 10 released: logical replication, declarative table partitioning key features

The PostgreSQL Global Development Group has announced the release of PostgreSQL 10. The latest version includes several long-anticipated additions, such as native logical replication, declarative table partitioning, and improved query parallelism. "Our developer community focused on building features that would take advantage of modern infrastructure setups for distributing workloads," said Magnus Hagander, a core team member of the PostgreSQL Global Development Group. PostgreSQL versioning has also been revised to an "x.y" format, meaning the next minor release will be 10.1 and the next major release will be 11.

Microsoft Azure Functions adds support for Java

Microsoft’s Azure Functions serverless computing platform will now support Java, the company announced at the JavaOne conference in San Francisco. Azure Functions has so far supported C#, JavaScript, F#, PHP, Python, Bash, Batch and PowerShell, and the service now intends to tap the large base of Java developers. To use Azure Functions, Java developers will not have to learn any new tools. Microsoft is providing a Maven plugin with which developers can write Maven-enabled apps and deploy them directly to Azure Functions.

PyPy v5.9 released with added support for Pandas and NumPy

PyPy has announced the release of version 5.9, which now supports Pandas and NumPy. PyPy 5.8 was released in June this year, after which the growing community of PyPy users reported bugs and other issues. The latest version has several incremental improvements, and the PyPy team has advised users to update in order to resolve several ongoing performance issues. According to the announcement, PyPy has released both PyPy3.5 v5.9 (a beta-quality interpreter for Python 3.5 syntax) and PyPy2.7 v5.9 (an interpreter supporting Python 2.7 syntax). NumPy and Pandas now work on PyPy2.7 (together with Cython 0.27.1). CFFI, which has been updated to 1.11.1, now supports complex arguments in API mode, as well as char16_t and char32_t, and has improved support for callbacks.

Packt Editorial Staff
05 Oct 2017
3 min read

Dragonchain ICO, DeepMind’s ethical compass, Google’s Teachable Machine and more - 4th Oct 17 Headlines

DeepMind sets up a separate unit for ethical regulations around AI

Google-owned DeepMind has set up DeepMind Ethics & Society (DMES), a unit devoted to the societal and ethical impact of artificial intelligence. Tech consultant Sean Legassick will lead the unit, along with former Google policy manager and government adviser Verity Harding. DMES is expected to focus on six areas: privacy, transparency, and fairness; economic impacts; governance and accountability; managing AI risk; AI morality and values; and how AI can address global challenges. The announcement of DMES comes a year after a group of leading technology firms, activists, and academic organizations came together to set up the Partnership on AI, pressing for best practices in AI development. DMES is also separate from DeepMind’s secretive internal ethics and safety board, which has been functioning since 2014, when Google acquired DeepMind.

Google’s Teachable Machine lets users train a web app to respond in a certain way to certain actions

Google has introduced a new AI feature named “Teachable Machine”, with which users can set a pattern and ask the app to identify it using a webcam. The platform promises to teach machine learning concepts in the easiest way possible: all you have to do is turn on your webcam, teach the robot brain what a certain action or motion looks like, and then tell it what kind of reaction you would like. The machine uses the JavaScript-based deeplearn.js framework and works well with most browsers.

Google introduces improved version of WaveNet on Assistant

Google has announced that it is now using an updated and improved version of WaveNet on Google Assistant. WaveNet was unveiled by Google a year ago as a deep neural network for generating raw audio waveforms that could produce better and more realistic speech. At the time of its launch, the platform was considered too computationally intensive to deploy in the real world. This led the developers to work over the last 12 months on improving its speed and overall quality. At the moment WaveNet is only available for English (U.S.) and Japanese, Google said, adding that the new version is the first product to launch on its latest TPU cloud infrastructure.

Other Data Science News

LinkedIn announces new data analytics tool named Talent Insights

LinkedIn has launched Talent Insights, a paid big data analytics product that may answer questions such as which schools are producing the most successful data scientists, which companies have been hiring the most Python developers, or which skills are growing the fastest in the industry. Aimed at smarter hiring, Talent Insights will initially come with two views: “Talent Pool” and “Company report.” The product was announced in a closed beta at LinkedIn’s Talent Connect event, and is expected to be generally available in 2018.

Disney’s original blockchain platform Dragonchain kicks off Initial Coin Offering (ICO)

Dragonchain, the blockchain platform originally developed at Disney and now managed by the Dragonchain Foundation, launched its public Initial Coin Offering (ICO) on 2nd October, marking the one-year anniversary of Disney releasing it as open source. Tokens will be issued until 2nd November, and the proceeds will be used to provide access to Dragonchain platform services, project incubation, and professional services to support enterprises, start-ups, and entrepreneurs building applications on the platform. "Our vision for Dragonchain is a secure and flexible blockchain platform paired with a crowd scaled incubator," said Joe Roets, Founder and CEO of Dragonchain Inc. "The system is modeled to create feedback loops and accelerate blockchain projects and market success."