
Tech News

Aarthi Kumaraswamy
01 Nov 2017
6 min read

Dr. Brandon explains Word Vectors (word2vec) to Jon

Dr. Brandon: Welcome back to the second episode of 'Date with Data Science'. Last time, we explored natural language processing. Today we talk about one of the most used approaches in NLP: word vectors.

Jon: Hold on, Brandon. When we went over maths 101, didn't you say numbers become vectors when they have a weight and direction attached to them? But numbers and words are apples and oranges! I don't understand how words could also become vectors. Unless the words are coming from my movie director and he is yelling at me :) What would the point of words having directions be, anyway?

Dr. Brandon: Excellent question to kick off today's topic, Jon. On an unrelated note, I am sure your director has his reasons.

The following is an excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla and Michal Malohlava.

Traditional NLP approaches rely on converting individual words (which we created via tokenization) into a format that a computer algorithm can learn from (for example, to predict movie sentiment). Doing this required us to convert a single review of N tokens into a fixed representation by creating a TF-IDF matrix. In doing so, we did two important things behind the scenes:

Individual words were assigned an integer ID (for example, a hash). For example, the word friend might be assigned to 39,584, while the word bestie might be assigned to 99,928,472. Cognitively, we know that friend is very similar to bestie; however, any notion of similarity is lost by converting these tokens into integer IDs.

By converting each token into an integer ID, we consequently lose the context in which the token was used. This is important because, in order to understand the cognitive meaning of words, and thereby train a computer to learn that friend and bestie are similar, we need to understand how the two tokens are used (that is, their respective contexts).

Given this limited ability of traditional NLP techniques to encode the semantic and syntactic meaning of words, Tomas Mikolov and other researchers explored methods that employ neural networks to better encode the meaning of words as a vector of N numbers (for example, vector bestie = [0.574, 0.821, 0.756, ... , 0.156]). When calculated properly, we will discover that the vectors for bestie and friend are close in space, whereby closeness is defined as a cosine similarity. It turns out that these vector representations (often referred to as word embeddings) give us the ability to capture a richer understanding of text.

Interestingly, using word embeddings also gives us the ability to learn the same semantics across multiple languages despite differences in the written form (for example, Japanese and English). For example, the Japanese word for movie is eiga; using word vectors, the two words movie and eiga should be close in the vector space despite their differences in appearance. Thus, word embeddings allow applications to be language-agnostic, yet another reason why this technology is hugely popular!

Word2vec explained

First things first: word2vec does not represent a single algorithm but rather a family of algorithms that attempt to encode the semantic and syntactic meaning of words as a vector of N numbers (hence, word-to-vector = word2vec). We will explore each of these algorithms in depth in this chapter, while also giving you the opportunity to read/research other areas of vectorization of text, which you may find helpful.

What is a word vector?

In its simplest form, a word vector is merely a one-hot encoding, whereby every element in the vector represents a word in our vocabulary, and the given word is encoded with 1 while all other elements are encoded with 0. Suppose our vocabulary only has the following movie terms: Popcorn, Candy, Soda, Tickets, and Blockbuster. Following the logic we just explained, we could encode the term Tickets as [0, 0, 0, 1, 0], with the 1 in the fourth position.

Using this simplistic form of encoding, which is what we do when we create a bag-of-words matrix, there is no meaningful comparison we can make between words (for example, is Popcorn related to Soda; is Candy similar to Tickets?).

Given these obvious limitations, word2vec attempts to remedy this via distributed representations for words. Suppose that for each word, we have a distributed vector of, say, 300 numbers that represent a single word, whereby each word in our vocabulary is also represented by a distribution of weights across those 300 elements. Now, our picture would change drastically: each word becomes a vector of 300 numeric weights rather than a single 1 among 0s.

Given this distributed representation of individual words as 300 numeric values, we can make meaningful comparisons among words using a cosine similarity, for example. That is, using the vectors for Tickets and Soda, we can determine that the two terms are not related, given their vector representations and their cosine similarity to one another. And that's not all we can do! In their ground-breaking paper, Mikolov et al. also performed arithmetic on word vectors to make some incredible findings; in particular, the authors give the following math problem to their word2vec dictionary:

V(King) - V(Man) + V(Woman) ~ V(Queen)

It turns out that these distributed vector representations of words are extremely powerful in comparison questions (for example, is A related to B?), which is all the more remarkable when you consider that this semantic and syntactic learned knowledge comes from observing lots of words and their context with no other information necessary. That is, we did not have to tell our machine that Popcorn is a food, noun, singular, and so on.

How is this made possible? Word2vec employs the power of neural networks in a supervised fashion to learn the vector representation of words (which is itself an unsupervised task).

The above is an excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla and Michal Malohlava. To learn more about the word2vec and doc2vec algorithms, including continuous-bag-of-words (CBOW), skip-gram, cosine similarity, and distributed memory, among other models, and to build applications based on these, check out the book.
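As a rough illustration of the ideas in the excerpt (this sketch is not from the book), the Python below contrasts one-hot vectors with small dense vectors and compares them by cosine similarity. The vocabulary comes from the excerpt, but every dense vector is a made-up toy value chosen by hand, not a learned word2vec embedding, and the helper names are hypothetical.

```python
import math

# Toy vocabulary taken from the excerpt.
VOCAB = ["Popcorn", "Candy", "Soda", "Tickets", "Blockbuster"]


def one_hot(word, vocab=VOCAB):
    """Return a one-hot vector: 1.0 at the word's position, 0.0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocab]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# ASSUMPTION: hand-picked 3-dimensional vectors, purely for illustration.
# Real word2vec vectors are learned from text and typically have
# hundreds of dimensions.
TOY_EMBEDDINGS = {
    "Popcorn": [0.9, 0.8, 0.1],
    "Soda": [0.8, 0.9, 0.2],
    "Tickets": [0.1, 0.2, 0.9],
}

# ASSUMPTION: hand-picked 2-dimensional vectors for the analogy demo.
TOY_ANALOGY = {
    "King": [0.9, 0.9],
    "Man": [0.1, 0.9],
    "Woman": [0.1, 0.1],
    "Queen": [0.9, 0.1],
}


def solve_analogy(a, b, c, vectors):
    """Return the word closest to V(a) - V(b) + V(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


tickets_one_hot = one_hot("Tickets")
# Any two distinct one-hot vectors are orthogonal, so similarity is zero.
one_hot_sim = cosine_similarity(one_hot("Popcorn"), one_hot("Soda"))
# Dense vectors can place related words close together.
related_sim = cosine_similarity(TOY_EMBEDDINGS["Popcorn"], TOY_EMBEDDINGS["Soda"])
unrelated_sim = cosine_similarity(TOY_EMBEDDINGS["Tickets"], TOY_EMBEDDINGS["Soda"])
analogy = solve_analogy("King", "Man", "Woman", TOY_ANALOGY)

if __name__ == "__main__":
    print("Tickets one-hot:", tickets_one_hot)
    print("one-hot Popcorn vs Soda:", one_hot_sim)
    print("dense Popcorn vs Soda:", round(related_sim, 3))
    print("dense Tickets vs Soda:", round(unrelated_sim, 3))
    print("King - Man + Woman ~", analogy)
```

With one-hot vectors every pair of distinct words scores zero, so the encoding says nothing about relatedness. The dense toy vectors score Popcorn and Soda as far closer than Tickets and Soda, which is the kind of comparison the excerpt describes. The analogy helper leaves the three input words out of the candidate set, a common convention when evaluating such analogies.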

Packt Editorial Staff
01 Nov 2017
5 min read

1st Nov.' 17 - Headlines

Google’s Firebase Predictions, QuickPivot’s machine learning suite Ada, tensor algebra software Taco, and more in today’s data science news.

Google Firebase in data science news

Google applies machine learning expertise to create Firebase Predictions for user segmentation

At the ongoing 2017 Firebase Dev Summit in Amsterdam, Google unveiled Firebase Predictions, which can help “predict what users are going to do, before they actually do it.” Firebase Predictions applies machine learning to analytics data to create dynamic user groups based on users' predicted behavior. These predictions are automatically available for use with Firebase Remote Config, the Notifications composer, and A/B testing. Google said that with Remote Config, users can boost conversions with a custom experience based on each user’s predicted behavior. The Notifications composer will deliver the right message to the right user groups, while A/B testing can help evaluate the effectiveness of prediction-based strategies.

NVIDIA in News

NVIDIA previews NVDLA deep learning processor it open sourced for deep neural network inference

NVIDIA recently open sourced the NVDLA deep learning processor, which is based on the architecture of its "Xavier" automotive processor. Short for “NVIDIA Deep Learning Accelerator,” the NVDLA was created to promote a standard way to design deep learning inference accelerators. Now, at a recent briefing, NVIDIA's Vice President and General Manager of Autonomous Machines, Deepu Talla, explained that the company decided to open source the design in the expectation that it could expand demand for cloud-based training of deep learning models. NVDLA is currently compatible with Linux, and it could be ported to other operating systems. The modular NVDLA accelerator architecture includes a convolution core, single data processor, planar data processor, channel data processor, dedicated memory and data reshape engine.

NVIDIA initiates new AI partnerships, training courses for Deep Learning Institute

Expanding the scope of its Deep Learning Institute (DLI), NVIDIA said it is entering into new partnerships with Booz Allen Hamilton and deeplearning.ai to broaden the range of its training content on artificial intelligence for thousands of students, developers and government specialists. The company has introduced a new University Ambassador Program under which instructors worldwide, including professors from Arizona State, Harvard, Hong Kong University of Science and Technology and UCLA, will teach students critical job skills and practical applications of AI at no cost. The new courses will cover domain-specific applications of deep learning for finance, natural language processing, robotics, video analytics and self-driving cars. DLI is also bringing free AI training to young people through the nonprofit organization AI4ALL.

Machine Learning suite Ada in News

QuickPivot incorporates predictive models into marketing campaigns with machine learning suite Ada

To uncover insights that could drive revenue growth, QuickPivot has launched Ada, a machine learning suite of three predictive marketing models named Churn, Basket, and Cluster. Churn applies machine learning to calculate whether a customer will churn in 30, 60 or 90 days and to understand how best to engage them before it’s too late. Basket increases average customer spend by understanding which of your products are often purchased together. Cluster predicts which purchase behaviors apply to certain demographics, finding both trends and anomalies.

Tensor algebra compiler in News

Taco: ‘Tensor algebra’ software speeds computations involving ‘sparse tensors’ 100-fold

A team of researchers from MIT, the French Alternative Energies and Atomic Energy Commission, and Adobe Research has created a new system called “Taco” that automatically produces code optimized for sparse data. Taco stands for tensor algebra compiler, and it speeds up computations 100-fold compared with existing software packages. "Sparse representations have been there for more than 60 years," says Saman Amarasinghe, an MIT professor and senior author on the paper. "But nobody knew how to generate code for them automatically. People figured out a few very specific operations: sparse matrix-vector multiply, sparse matrix-vector multiply plus a vector, sparse matrix-matrix multiply, sparse matrix-matrix-matrix multiply. The biggest contribution we make is the ability to generate code for any tensor-algebra expression when the matrices are sparse."

Other data science news

Pyramid Analytics unveils platform-agnostic analytics OS “Pyramid 2018”

Pyramid Analytics has announced the launch of Pyramid 2018, a server-based, multi-user analytics OS that supports advanced self-service analytics without IT help. Using Pyramid 2018, business users can manage data strategies across any environment (on-premises, in the cloud, or across hybrid deployments), irrespective of the technology (like Oracle, SAP, Microsoft, Big Data, etc.). Pyramid 2018 also offers multiple AI engines and language support such as R, Python, TensorFlow, Weka, MLIB, SAS runtime and others, enabling organizations to integrate machine learning algorithms into their data activities.

IBM launches machine learning tool “Trusteer New Account Fraud” to prevent bank fraud

To help stop bank fraud, IBM has launched a new security tool named “IBM Trusteer New Account Fraud” that will apply machine learning and analytics to identify and stop cyber criminals from opening fraudulent bank accounts. The new tool, which will be added to the Trusteer Pinpoint Detect portfolio, will bring together the device and network information used to open a new account, looking at both the positive information and the negative indicators in the transaction process. The tool also uses behavioural analytics to verify fraud patterns.

Abhishek Jha
01 Nov 2017
3 min read

Sony resurrects robotic pet Aibo with advanced AI

A decade back, when CEO Howard Stringer decided to discontinue Sony’s iconic entertainment robot AIBO, its progenitor Toshitada Doi famously staged a mock funeral lamenting, more than Aibo’s disbandment, the death of Sony’s risk-taking spirit. Today, as the Japanese firm’s sales have soared to a decade high, beating projected estimates, Aibo is back from the dead.

The revamped pet looks cuter than ever after nearly a decade on hold. It has been fitted with a range of sensors, cameras and microphones, along with upgraded artificial intelligence features. The new Aibo is an ivory-white, plastic-covered hound that can even connect to mobile networks. Using actuators, it can move its body remarkably well, while two OLED panels in its eyes exhibit an array of expressions. Most importantly, it comes with a unique ‘adaptive’ behavior: it can actively recognize its owner and run over to them, learning and interacting in the process by detecting smiles, words of praise, and head and back scratches. In short, a real dog minus the canine instincts.

Priced at around $1,735 (198,000 Yen), Aibo includes a SIM card slot to connect to the internet and access Sony’s AI cloud, where it can analyze and learn from how other robot dogs on the network are behaving. Sony says it does not intend Aibo to replace a digital assistant like Google Home, but that it could be a wonderful companion for children and families, forming an “emotional bond” with love, affection, and joy.

The cloud service that powers Aibo’s AI is, however, expensive: a basic three-year subscription plan is priced at $26 (2,980 Yen) per month. Or you could pay upfront for three years at around $790 (90,000 Yen). As for battery life, the robot takes three hours to fully recharge once its battery is drained after two hours of activity.

“It was a difficult decision to stop the project in 2006, but we continued development in AI and robotics,” Sony CEO Kazuo Hirai said, speaking at a launch event. “I asked our engineers a year and a half ago to develop Aibo because I strongly believe robots capable of building loving relationships with people help realize Sony’s mission.”

When Sony initially launched AIBO in 1999, it was well ahead of its time. But after the initial euphoria, the product failed to attract mainstream buyers, and reboot after reboot failed to generate profits. At that time, Sony clearly had to make a decision as its core electronics business struggled in price wars.

Today, times are different: AI fever has gripped the tech world. A plastic bone (‘aibone’) for the robotic dog costs around 2,980 Yen. And that’s the price you pay for keeping a robotic buddy around. The word “aibo” literally means a companion, after all.

Packt Editorial Staff
31 Oct 2017
5 min read

31st Oct.' 17 - Headlines

Linux Foundation and AT&T’s project Acumos, blockchain platform SophiaTX, and more in today’s data science news.

Project Acumos in News

Acumos: AT&T, Tech Mahindra introduce open source AI platform hosted by Linux Foundation

Linux Foundation, AT&T, and Tech Mahindra have together founded an open source artificial intelligence project, “Acumos,” that will help developers build, share and deploy AI applications. The platform, which is being designed with both coders and non-coders in mind, will be launched in early 2018. While AT&T and Tech Mahindra are contributing the project code, Linux Foundation will host the platform and its AI marketplace. “Our goal with open sourcing the Acumos platform is to make building and deploying AI applications as easy as creating a website,” Mazin Gilbert, vice president of advanced technology at AT&T Labs, said in a statement.

Blockchain in News

SophiaTX open source platform can integrate blockchain with SAP

Equidato Technologies AG has introduced a new project that can embed blockchain into major ERP and financial software systems such as SAP. Named SophiaTX, the platform includes three main components: a blockchain designed specifically for business environments, a development platform with integration application programming interfaces (APIs) to SAP and other ERP software, and a marketplace for companies and individuals to buy and sell apps. SophiaTX uses proprietary blockchain technology from DECENT, an open source blockchain content distribution platform. “With blockchain, most of the disruption comes from new entrants into the ecosystem. By opening the network for anyone to join and participate, SophiaTX has the best possible chance for global adoption,” CEO Jaroslav Kacina said.

Sony could use blockchain to safeguard PlayStation Network

Sony could one day use blockchain, which secures data in cryptographic "blocks", as a means to better protect PlayStation Network users. Reportedly, Sony has applied for a patent that would use blockchain as another layer of cybercrime prevention within multi-factor authentication (MFA). If this works out, users would be sent encrypted verification codes via the blockchain to use alongside traditional username/password login details, further securing transactions and data transfers.

Other Data science News

NVIDIA announces availability of cloud container registry for AI developers worldwide

NVIDIA has launched the NVIDIA GPU Cloud (NGC) container registry for AI developers worldwide. The cloud-based service is available immediately to users of the recently announced Amazon Elastic Compute Cloud (Amazon EC2) P3 instances featuring NVIDIA Tesla V100 GPUs. NVIDIA said it is planning to extend support to other cloud platforms soon. Developers who want to use the NGC container registry can follow a three-step process: sign up for a no-cost NGC account at www.nvidia.com/ngcsignup; run an optimised NVIDIA image on the cloud service provider's platform; and pull containers from NGC and get started.

Deepo: The Docker image that comes with all popular deep learning frameworks

Deepo is a Docker image with a fully reproducible deep learning research environment. It contains almost all popular deep learning frameworks, such as theano, tensorflow, sonnet, pytorch, keras, lasagne, mxnet, cntk, chainer, caffe, and torch. The project is available on GitHub under the MIT license.

Alibaba develops new machine learning services ET City Brain and ET Industrial Brain

Alibaba Cloud has developed a set of machine learning-powered platforms, such as ET City Brain and ET Industrial Brain, to solve real world problems. The company said it is making this technology available to young minds to unleash their creativity and imagination and come up with new solutions to old problems like “preventing a traffic jam from happening in the first place.” Min Wanli, chief data scientist and general manager of the big data division at Alibaba Cloud, told the young minds: “Don’t worry about the hard-coded part. Imagine anything possible, and then use ET Brain to try and explore that.” Earlier this year, Alibaba released version 2.0 of its PAI machine learning service, which is integrated into its various ET Brain platforms.

NTREIS makes Remine big data platform available to its members

North Texas Real Estate Information System (NTREIS) said it has launched the lead generation and big data platform Remine for its 35,000 members. The agreement was announced in February this year. "At any given time, less than 2% of all properties are listed for sale in an MLS. That means that 98% all future opportunity is hiding off-market. Remine's predictive analytics make it easy to identify future buyers and sellers,” Mark Schacknies, CFO of Remine, said.

With beta version of 3.0, Windocks releases data delivery platform based on Docker's Container Technology and SQL Server Containers

Windocks has announced the beta release of Windocks 3.0, a data delivery platform built on Docker’s container technology, with support for SQL Server containers. “Enterprise customers are asking for an alternative to expensive, complex solutions built on Solaris UNIX,” Windocks co-founder Paul Stanton said. “Windocks 3.0 delivers the first container native data delivery solution that fits any budget. Windocks empowers software developers and database administrators with tools to create, manage, and deliver data environments more simply, and affordably than ever. In a single step SQLServer DBAs create clonable images, and users self-service environments with one click on the Windocks web application.”

Packt Editorial Staff
30 Oct 2017
4 min read

30th Oct.' 17 - Headlines

Couchbase Server’s latest version, Intel’s partnership on storing cryptocurrency holdings, and more in today’s data science news.

Couchbase in News

Couchbase Server 5.0 released

Couchbase has announced version 5.0 of its NoSQL database, Couchbase Server. The new release focuses on agility, flexibility, and performance at scale. To improve customer experience, Couchbase said the 5.0 release provides the “first true Engagement Database.” Building on the Role Based Access Control (RBAC) security model introduced for administrators in version 4.5, Couchbase Server 5.0 introduces RBAC for applications. There are also a number of query performance optimizations, feature enhancements and new functionality in the N1QL query engine. The Couchbase Server web console has been redesigned to offer a streamlined interface to the Couchbase administration and development platform, optimizing common tasks and workflows.

Intel in News

Intel, Ledger collaborate on cryptocurrency holdings storage system

To bring new solutions for storing cryptocurrency holdings, Intel has entered into a partnership with Ledger, a virtual currency hardware startup. Under the collaboration, Ledger’s Blockchain Open Ledger Operating System (BOLOS) will be integrated into Intel’s Software Guard Extensions (SGX) secure storage product line. As part of the deal, Intel and Ledger will focus on developing a so-called “enclave” in which private keys are stored and transactions are both generated and signed. The partnership is seen as an extension of Intel’s focus on hardware under its distributed ledger technology (DLT) strategy, with a plan to integrate the mining chips into Intel products such as desktop personal computers.

Deloitte and SAP in News

Deloitte unveils roadmap for SAP Leonardo based Deloitte Reimagine Platform solutions

Deloitte has announced its latest roadmap for new Deloitte Reimagine Platform solutions, which includes a wide-ranging pipeline of new use cases to support faster transformation using blockchain, machine learning, IoT and advanced analytics. The Deloitte Reimagine Platform solutions were developed through a co-innovation relationship with SAP and are based on the SAP Leonardo digital innovation system. Enterprise leaders will be able to explore use cases in person Nov. 2-3, 2017, at SAP Leonardo Live in Chicago, where Deloitte will demonstrate a number of applications, including a sensor-enabled cold chain use case that can help businesses monitor and manage temperature and humidity changes when shipping sensitive products. Another use case, a "smart tap" solution, will showcase the use of liquid flow sensors to monitor and analyze marketing campaigns, trade promotions, and inventory management in real time. Beyond the roadmap, Deloitte plans to launch a global network of virtual studios focused on the Deloitte Reimagine Platform, to give SAP customers an up-close, in-person view of how they can modernize operations and innovate at scale with SAP Leonardo.

Blockchain in News

Blockchain wallet officially integrates Ethereum for iOS and Android

Blockchain wallet has finally enabled support for Ethereum, according to a recent announcement. Ethereum users can now use all of the Blockchain wallet's features, just like Bitcoin users. The integration is available to both iOS and Android users.

Blockchain-based AI project Poly AI will unveil Poly 1.0 in 2018

POLY AI, a project developing artificial intelligence using blockchain technology, plans to release its first generation of AI, Poly 1.0, in 2018, with market-facing functions such as Bitcoin pricing and trading support. An ICO has been set up for this purpose and will run in four phases from Nov. 1 to Nov. 20. Only contributions in BTC will be accepted.

Other data science news

Seagate announces first drive for AI-powered video surveillance solution

Seagate Technology plc has unveiled its SkyHawk AI hard disk drive (HDD), the first drive created specifically for artificial intelligence-enabled video surveillance solutions. SkyHawk AI provides the bandwidth and processing power to manage data-intensive workloads while simultaneously analyzing and recording footage from multiple HD cameras. “SkyHawk AI solutions will expand the design space for our customers and partners, allowing them to implement next-generation deep learning and video analytics applications,” said Sai Varanasi, vice president of product line management at Seagate Technology.

Abhishek Jha
30 Oct 2017
2 min read

Japanese scientists claim their AI system detects bowel cancer in less than a second

In what could mark a major leap in cancer detection by artificial intelligence, researchers at Showa University in Yokohama, Japan, have developed AI software that they claim can spot bowel cancer in less than a second. In a recently conducted trial, the AI system was able to pinpoint potentially dangerous tumours from endoscopy images with clinical accuracy.

Led by Dr. Yuichi Mori, the study involved 250 men and women in whom colorectal polyps had been detected using endocytoscopy. In total, 306 polyps were assessed, and scientists used the AI program to predict the pathology of each polyp. The predictions were then compared with the final pathological report, and it was found that the system correctly detected 94% of cancers by matching each growth against over 30,000 images that were used for machine learning.

What is remarkable is that it took the program less than a second to review each magnified endoscopic image and determine whether or not the polyp was malignant. “The most remarkable breakthrough with this system is that artificial intelligence enables real-time optical biopsy of colorectal polyps during colonoscopy, regardless of the endoscopists' skill,” Mori said.

While the diagnostic system is yet to obtain regulatory approval, Mori believes it could help patients avoid needless surgeries. “This allows the complete resection of adenomatous (cancerous) polyps and prevents unnecessary polypectomy (removal) of non-neoplastic polyps,” he said.

The findings were also presented at the ongoing United European Gastroenterology (UEG) Week in Barcelona, Spain. The research team is now working full throttle on this project, and plans to take the study to a new level by developing an automatic polyp detection system. "Precise on-site identification of adenomas during colonoscopy contributes to the complete resection of neoplastic lesions," Mori added. "This is thought to decrease the risk of colorectal cancer and, ultimately, cancer-related death."
Packt Editorial Staff
27 Oct 2017
4 min read

27th Oct.' 17 - Headlines

Anaconda version 5.0, Cisco-Google cloud partnership, NVidia Volta GPU on AWS, and more in today’s data science news. Anaconda in News Anaconda Distribution 5.0 released Data science distribution platform Anaconda has announced its version 5.0 where more than 100 packages have been added or updated. The new release offers wider scope of compatibility as it features all new compilers on macOS and Linux, along with more flexible dependency pinning of NumPy packages. Anaconda Distribution 5.0 is immediately available for download and installation. Alternatively, users can upgrade to version 5.0 by using conda update conda followed by conda install anaconda=5.0 Cisco and Google cloud collaboration in News Cisco, Google team to forge hybrid cloud partnership Cisco and Google are working together on an open hybrid solution that may help companies manage software services both on the Google Cloud as well as in their own data centers. The partnership will help customers enhance agility and security in a hybrid world, the companies said. It will set a complete environment that will develop, run, secure and monitor workloads, using which customers can improve their existing infrastructure and plan cloud migration well enough to prevent a lock-in. Google said in its official blog announcement that open source platforms Kubernetes and Istio will be at the forefront of the new architecture. “We’re working together to deliver a consistent Kubernetes environment for both on-premises Cisco Private Cloud Infrastructure and Google’s managed Kubernetes service, Google Container Engine. This way, you can write once, deploy anywhere and avoid cloud lock-in, with your choice of management, software, hypervisor and operating system,” Google said, adding that Istio will enable developers use policy-driven controls to scalably connect, help secure, discover and manage the applications. 
Nvidia Volta GPU in News Nvidia makes its Volta GPUs available through Amazon Cloud Amazon has beaten Google and Microsoft in the cloud race for providing Nvidia’s next generation GPU Volta through Amazon Web Services (AWS). Customers will be able to run instances with up to 8 V100 GPUs, which will initially be made available from AWS’ Northern Virginia, Oregon, Ireland, and Tokyo data centers. Other Data Science News Blue Canoe secures $1.4M to improve English speaking using machine learning Blue Canoe Learning, a new artificial intelligence startup that helps ESL speakers improve their English pronunciation, has raised an initial $1.4 million investment from Kernel Labs and others to expand its operations. Using machine learning and speech recognition, Blue Canoe has digitized the Color Vowel System and packaged it as an app where users (the non-native English speakers) play a card game and say the vocabulary word on the card; a machine learning system listens and identifies whether they have pronounced it correctly, and if not, gives relevant feedback. The startup will be getting its next few months of guidance and nurturing from the Allen Institute for AI. Voyomotive unveils Data Analytics GateWay to make available advanced vehicle data Voyomotive has launched an innovative program Data Analytics GateWay that will make advanced vehicle data available to industry partners, and enable development of the next generation of automotive applications. Voyomotive said the data provided by GateWay is typically not available from OEM or other aftermarket telematics systems and is ideally suited for AI, Machine Learning, Driverless Car, and app development. There is no cost to join GateWay, but the program is limited to corporate partners and software developers who can use their LinkedIn accounts for instant access or apply by filling out the GateWay Partner Application. 
eBay introduces deep learning-based image search capabilities for finding products using photos As originally announced in July, eBay has launched two new visual search tools, Image Search and Find it on eBay, to let online shoppers find items using photos from their phone or the web. The new features make use of advancements in computer vision and deep learning, including neural networks, eBay said. Pinterest, Google and Amazon already have visual search functionalities.

Abhishek Jha
27 Oct 2017
2 min read

What the Cisco, Google cloud partnership means for the tech world

In technology, there are no strange bedfellows. Even at the height of their cold war, Microsoft and Apple did not abhor each other, working together on tech projects that brought mutual benefits, a term that sometimes also includes surviving disruption. It is therefore no surprise that a partnership between Cisco and Google goes beyond helping customers run hybrid cloud environments more efficiently. It’s a win-win situation for the two Silicon Valley giants. Google wants to catch up to Amazon, and Cisco is facing a serious threat from cloud services (possibly an existential threat). At one time, Cisco had trumped Microsoft to become the world's most valuable publicly traded company. It was Cisco whose networking hardware was used to build the internet. But today, companies that were once overly reliant on Cisco equipment are increasingly renting cloud services. Who wouldn’t prefer the cloud, after all, when you have AWS shielding you from all the heavy workloads in the background! Add to it the recent tie-up between VMware and Amazon, and the future looked bleak. With its software-defined data center approach, VMware posed a definite scare. To tell the truth, Cisco was running out of partners. Remember, EMC was a key Cisco partner at one time, but today Dell Technologies owns it. The game is complicated, which is why Cisco and Google have joined forces. Both have pioneered different eras of the internet, and promise to offer something different. With companies complaining about needing separate tools to manage on-premises software and software in the cloud, the security problems definitely needed to be addressed. Adding Cisco networking and security software on top of Google programming technology will greatly help companies manage software services that run in their own data centers or in facilities operated by external cloud services. It brings agility and security to a hybrid ecosystem. 
Google will definitely benefit from Cisco’s long list of corporate customers, Cisco being the elder partner that has witnessed generations of change in IT. But the hurdle in beating Amazon is that Amazon never slackens, and Google knows it has mountains to climb. And it’s all cloudy at the peak.

Packt Editorial Staff
26 Oct 2017
5 min read

26th Oct.' 17 - Headlines

SciPy 1.0 release, Android 8.1 Developer Preview, SUSE’s Linux for SAP on IBM Cloud, and more in today’s data science news. SciPy in News SciPy 1.0 released Open source Python library SciPy has announced the release of its version 1.0, 16 years after its first version 0.1 was released in 2001. SciPy 1.0 brings some major build improvements: Windows wheels are available on PyPI for the first time, and continuous integration has been set up on Windows and OS X in addition to Linux. The new release, which has a number of deprecations and API changes, requires Python 2.7 or >=3.4 and NumPy 1.8.2 or greater. SciPy now also has a formal governance structure. It consists of a BDFL (Benevolent Dictator For Life) and a Steering Committee. Pauli Virtanen is currently the BDFL. Google Android 8.1 in News Android 8.1 Developer Preview: NNAPI to do “hardware acceleration” of machine learning, Google says Google has launched the developer preview of Android 8.1, in which it has introduced a new ‘Neural Networks API’ (NNAPI) that can provide apps with “hardware acceleration” for on-device machine learning operations. Other than the NNAPI, there are a few other updates and bug fixes for things like autofill and notifications. Android 8.1 will have two preview releases. While the first release will be a "beta" with "final APIs," the second preview will provide "near-final system images for final testing" in November. The final release will then arrive sometime in December. Cloud Storage in News SUSE delivers Linux OS for SAP on IBM Cloud Starting in the fourth quarter of 2017, SUSE Linux Enterprise Server for SAP Applications will be available as an operating system for SAP solutions on the IBM Cloud. In addition, IBM Cloud is now a SUSE Cloud Service Provider, giving customers an open source platform on a pay-as-you-go model. 
SUSE Linux Enterprise Server for SAP Applications on the IBM Cloud will enable customers to quickly build, deliver and deploy business-critical workloads in SAP NetWeaver and SAP HANA in the cloud. Also, customers can integrate their SAP applications running on SUSE Linux Enterprise across different hardware platforms, including IBM Power, into a hybrid or private cloud deployment. Customers will benefit from IBM's global network of nearly 60 cloud data centers across six continents as well as access to the rich IBM Cloud catalog of services including AI, data and analytics, IoT, serverless and more. SAP updates Vora to further simplify cloud and hybrid data storage SAP has announced new improvements to its Vora solution, further simplifying its deployment on public cloud and making migrations more flexible. The live customer cloud service on SAP Data Network can now use the distributed computing capabilities of SAP Vora. The updated version also supports Azure Data Lake (Azure is Microsoft’s public cloud). SAP Vora can now also load and distribute data files stored in Amazon S3 (Simple Storage Service). Apart from these, the latest release comes with an improved monitoring framework, support for Apache Spark 2.x and optimizations for connectivity with the SAP HANA platform. SAP releases Data Hub SAP Data Hub is a solution that will help businesses tackle the complexity of their data systems and make use of the vast amounts of data gathered from various sources. SAP Data Hub creates value across the diverse data landscape through data integration, data orchestration and data governance, as well as by creating powerful data pipelines that can accelerate positive business results. “The data hub is really a pipeline or data landscape management solution. It’s for customers who want to connect multiple data sources,” Director of Product Marketing at SAP Karen Sun said. 
Deep Learning AI Services in News HPE announces new deep-learning based AI platforms and services Hewlett Packard Enterprise (HPE) has unveiled new platforms and services tailored to facilitate the adoption of Artificial Intelligence. Within AI, the company will initially focus on deep learning.  The new services include HPE Rapid Software Installation for AI, HPE Deep Learning Cookbook, HPE AI Innovation Center, and Enhanced HPE Centers of Excellence (CoE). BrainChip to demonstrate AI-powered video analytics technology at Milipol 2017 BrainChip Holdings announced that it will be exhibiting at Milipol 2017 in Paris Nov. 21-24. Organised by the French Ministry of Interior in partnership with other governmental bodies, Milipol Paris is one of the largest homeland security conferences, attracting over 24 thousand visitors from 143 countries. Inspector Jean-Francois Lespes, Chief of the Indictable Offense Department at the Toulouse National Police, will be sharing use cases of BrainChip technology and show how it helps in the investigation of major crimes. Inspector Lespes' organization recently completed a successful trial of BrainChip Studio. Bitcoin miner Bitmain announces new Deep Learning AI products Bitmain has launched deep-learning based artificial intelligence products, called BM1680 and SC1. The new applications are a customized tensor computing ASIC (Application Specific Integrated Circuit) that can be applied in a variety of use cases such as image and speech recognition, robotics, autonomous vehicle technology, security surveillance, IoT, and more. “Deep learning is very intensive computationally and our experience in creating high-performing hardware for Bitcoin has absolutely prepared us for this exciting area of computing,” said Bitmain CEO Micree Zhan. 
“AI hardware is an area that Bitmain is proactively developing to power the next generation of AI applications.” The hardware is fully compatible with popular AI platforms including mainstream Caffe, Darknet, Googlenet, VGG, Resnet, Yolo, Yoto2 and other models. Big Data As a Service in News BlueData and Networld enter partnership to deliver big-data-as-a-service in Japan BlueData and Networld Corporation have announced a distribution agreement under which Networld will promote, market, sell, deploy, and support BlueData EPIC software in Japan. "When VMware emerged as the leader in server virtualization, we worked with them to bring their technology to Japan. Now we're in the era of Big Data analytics, data science, and deep learning. The clear leader in bringing virtualization and containerization to the Big Data ecosystem is BlueData, and we are proud to partner with them in Japan," President and CEO of Networld Shoichi Morita said.

Abhishek Jha
26 Oct 2017
3 min read

SciPy 1.0 is here: A brief history and perspective

If there is one word that exclusively defines computing parlance, it’s the version. And that can be amusing if you are high on orthodox grammar, because it took Windows 29 years to grow from version 1.01 to version 10. So now that SciPy has released its version 1.0, the developer community is abuzz with the question of why the Python library took 16 years to reach that number. In SciPy’s case the 1.0 version number was long overdue. Given the high-quality code and documentation, and the stability and backwards compatibility, a 1.0 label was warranted. But the best in the business are always humble (read perfectionist). Despite being a mature and stable library that has been used in production settings for a long time, SciPy was reluctant to call itself "1.0" because it believed it was not perfect, and that there were some dusty corners left. It is otherwise normal for open source projects to arrive with a 1.0 and proclaim "we are right up there." SciPy has a long history, during which it has matured as a software project. Largely written by and for scientists, it has seen its development community grow dramatically over the years. It dates from the era when the internet was just starting to bring together like-minded mathematicians and scientists. And many procrastinated on their PhDs to write extension modules for this Python library, all this when email was how you helped a project improve, long before GitHub arrived with its "patch" collaborations and inputs. “The existence of a nascent Scipy library, and the incredible – if tiny by today's standards – community surrounding it is what drew me into the scientific Python world while still a physics graduate student in 2001,” says a nostalgic Fernando Perez who is a proud SciPy author, “Today, I am awed when I see these tools power everything from high school education to the research that led to the 2017 Nobel Prize in physics.” In SciPy 1.0, there are some major build improvements. 
Windows wheels are available on PyPI for the first time, and continuous integration has been set up on Windows and OS X in addition to Linux. The release has a number of deprecations and API changes. But another standout statement from the release is the announcement of a formal governance structure. SciPy now has a BDFL (Benevolent Dictator For Life) and a Steering Committee. Pauli Virtanen is currently the BDFL.
Reminiscing the timeline:
- 2001: the first SciPy release
- 2005: transition to NumPy
- 2007: creation of scikits
- 2008: scipy.spatial module and first Cython code added
- 2010: moving to a 6-monthly release cycle
- 2011: SciPy development moves to GitHub
- 2011: Python 3 support
- 2012: adding a sparse graph module and unified optimization interface
- 2012: removal of scipy.maxentropy
- 2013: continuous integration with TravisCI
- 2015: adding Cython interface for BLAS/LAPACK and a benchmark suite
- 2017: adding a unified C API with scipy.LowLevelCallable; removal of scipy.weave
- 2017: SciPy 1.0 release
In any case, don't be fooled by the 1.0 number. The developer community that has contributed to and nurtured SciPy for nearly two decades will keep driving forward the project that has been the bedrock of the modern scientific computing ecosystem. For as the current BDFL says, not long after 1.0 comes 1.1.
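The headline digest for the same day notes that SciPy 1.0 requires Python 2.7 or >=3.4 and NumPy 1.8.2 or greater. As a rough sketch only, that requirement could be checked before upgrading along the following lines. The helper names (`parse_version`, `python_ok`, `numpy_ok`, `meets_scipy_1_0_requirements`) are hypothetical and are not part of SciPy or NumPy, and the version parsing is deliberately simplistic.

```python
import sys


def parse_version(text):
    """Turn a dotted version string such as '1.8.2' into a tuple of ints.

    Parsing stops at the first component that has no leading digits, so a
    suffix such as 'rc1' is ignored. This is a simplification, not a full
    version-specifier parser.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def python_ok(version):
    """Python 2.7 or >=3.4, as stated in the release summary."""
    major, minor = version[0], version[1]
    if major == 2:
        return minor == 7
    return (major, minor) >= (3, 4)


def numpy_ok(version_text):
    """NumPy 1.8.2 or greater, as stated in the release summary."""
    return parse_version(version_text) >= (1, 8, 2)


def meets_scipy_1_0_requirements(python_version, numpy_version_text):
    """Hypothetical helper combining both checks."""
    return python_ok(python_version) and numpy_ok(numpy_version_text)


if __name__ == "__main__":
    # The NumPy version is passed in as a string so the sketch runs even
    # where NumPy is not installed; "1.13.3" is only an example value.
    print(meets_scipy_1_0_requirements(sys.version_info[:2], "1.13.3"))
```

In practice the installer enforces these constraints itself; the sketch only makes the stated minimums concrete.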
Packt Editorial Staff
25 Oct 2017
5 min read

25th Oct.' 17 - Headlines

Announcements from Neo4j's GraphConnect conference, Microsoft's updates on Windows Dev Center, MapR's launch of MapR Data Science Refinery, and more in today's top data science news. Graph database Neo4j in News New Neo4j platform gives developers a set of tools for building enterprise graph applications Graph database leader Neo4j has launched a new platform for developers to build graph-based applications using a common set of services. Making the announcement at its GraphConnect conference in New York, Neo4j said the new platform will help graph databases connect to various enterprise systems, allowing developers to build applications more quickly. Until now, customers were forced to create their own architecture to manually connect to these systems. Neo4j releases Cypher for Apache Spark At its ongoing GraphConnect conference, Neo4j announced a new initiative to support the design and execution of graph queries in the Apache Spark environment. Neo4j released an early version of the Cypher for Apache™ Spark® (CAPS) language toolkit to the openCypher project. This contribution will allow big data analysts to incorporate graph querying in their workflows, making it easier to bring graph algorithms to bear and dramatically broadening how they reveal connections in their data. Developers of Spark applications now join the users of Neo4j, SAP HANA, Redis Graph and AgensGraph, among others, in gaining access to Cypher, the leading declarative property graph query language. This also expands the tooling available to any developer, under Apache 2.0 licenses from the openCypher project. Neo4j 3.3 released with improved performance and security Neo4j has announced its latest release, Neo4j 3.3. With Neo4j 3.3, write performance has improved by 50% on average compared to Neo4j 3.2, making it possible to ingest more data in less time. The memory footprint of bulk writes at initial graph creation is reduced by up to 40%. 
The new Cypher Slotted Runtime results in faster queries while using one third of the memory compared to the Neo4j 3.2 Cypher Runtime. On the security front, Neo4j 3.3 introduces new support for intra-cluster encryption, including multi-DC cluster communication encryption. The new version also brings kernel improvements, as it now allows key configuration parameters to be changed on the fly, without needing to recycle a database instance. Microsoft in News Microsoft brings real time health reporting to Windows Dev Center Microsoft will now offer near real time health reporting in its Windows Dev Center, helping developers quickly fix stability issues in their apps. If you have joined the Dev Center Insider Program, the Health report’s 72H view in Dev Center now shows data for crashes, hangs, memory failures and JavaScript exceptions within minutes of those events. Previously, this data was available only after several hours. Microsoft said it will soon bring this feature to all Dev Center users. Microsoft adds new feature called Review Insights in Windows Dev Center dashboard In Windows Dev Center, under the Review Reports, Microsoft is introducing a new feature called Review insights. Review insights uses machine learning to classify new app reviews, even non-English reviews, into one of 12 pre-defined categories. This will help developers quickly understand customer sentiment by filtering their app’s reviews by category. Developers can also apply additional filters, such as OS version or rating, to further isolate issues and find actionable feedback. MapR in News MapR launches MapR Data Science Refinery to leverage artificial intelligence MapR has unveiled MapR Data Science Refinery, a new solution that gives data scientists an easy way to access and analyze all data in-place, and to collaborate, build and deploy machine learning models on the MapR Converged Data Platform. 
Using a developer friendly notebook and a wide range of open source data science tools that integrate directly with the MapR Platform, the MapR Data Science Refinery is easy to deploy using a secure, persistent, and extensible container that can be distributed to many data science teams across multi-tenant environments. Other data science News SQLite 3.21.0 released SQL database engine SQLite has released its version 3.21.0 where it added several new features and enhanced the running functionalities. The new version also contains a number of bug fixes. Apache Software Foundation upgrades Apache PredictionIO to Top-Level Project Apache PredictionIO, an open source platform donated last year by Salesforce, has been promoted by the Apache Software Foundation (ASF) from the Apache Incubator to Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles. Apache PredictionIO focuses on enabling developers to quickly develop and deploy production-ready Machine Learning pipelines. The project features an engine template gallery, where developers can pick a template, and quickly ramp up a complete setup for their Machine Learning use cases. Apache PredictionIO is in use at ActionML, BizReach, LiftIQ, Pluralsight, and Salesforce, among others. Baidu announces Deep Voice 3 project which can learn to imitate almost every accent Baidu has launched the third version of Deep Voice which can dramatically shorten the learning time and support a higher number of language accents. Deep Voice 3 can learn as many as 2500 voices by processing the data in just 30 minutes. The Deep Voice projects use deep learning techniques to convert text to speech. Google has a similar project called WaveNet through its DeepMind unit. Baidu said the future versions may use even bigger data set and master up to 10,000 voices. 
Amazon announces general availability of Amazon Aurora with PostgreSQL Compatibility Amazon Aurora with PostgreSQL Compatibility is now generally available, Amazon announced in an official blog post. It is compatible with PostgreSQL 9.6.3 and scales automatically to support up to 64 TB of storage, with 6-way replication behind the scenes to improve performance and availability. Amazon Aurora with PostgreSQL Compatibility is fully managed and can deliver up to 3x the throughput users would otherwise get running PostgreSQL on their own.

Abhishek Jha
25 Oct 2017
3 min read

From Graph Database to Graph Company: Neo4j's Native Graph Platform addresses evolving needs of customers

In their own words, Neo4j is evolving from being “just a graph database company” to becoming a full-fledged graph technology platform including analytics, ETL and visualization. Today, at their GraphConnect conference in New York, Neo4j announced a new Native Graph Platform that will add analytics, data import and transformation, visualization, and discovery all atop its graph database. The announcement is not standalone. It comes alongside an open-source contribution to the Hadoop ecosystem: Cypher, Neo4j’s graph query language, is now available for Apache Spark. It’s no secret that Neo4j wants to make Cypher the standard query language for graph, and now, with all of the components in Native Graph Platform using Cypher, the new set of tools is sure to boost adoption. This is why it’s more of a strategic shift, one that goes far beyond facilitating a switch from graph database to graph solutions. It is, in fact, going to dramatically expand Neo4j’s enterprise footprint by establishing relationships with a variety of new users and roles, including data scientists, big data experts, IT business analysts and line-of-business managers. The story about the ‘evolving needs of customers’ is true. Today, customers do not deploy in isolation. We are living in a polyglot tech world with heterogeneous backends. Needs are bound to change. “Many companies started with us for retail recommendation engines or fraud detection, but now they need to drive their next generation of connected-data to power complex artificial intelligence applications," CEO Emil Eifrem says. "Our customers not only need a high performance, scalable graph database, they need algorithms to feed it, they need visualization tools to illustrate it, they need data integration to dive deeply into their data lakes,” Eifrem adds, hinting at how the new Native Graph Platform would facilitate Neo4j’s ‘connections-first’ approach. 
Whether for increased revenue, fraud detection or planning for a more connected future, building networks of connected data proves to be the single biggest competitive advantage for companies today. This will become even more evident in the future as machine learning, intelligent devices and real-time activities like conversational commerce are all dependent on connections. Probably this is why Neo4j is extending the reach of its native graph stack, which has already seen success across multiple use cases with organizations ranging from NASA to eBay to Comcast. But what about the big giants like Oracle jumping into the competition? "When we got started, there was no one. Now, in past couple of years, everyone and their mom have released a graph database," Eifrem said. "The space is very much heating up." "There are two sides to it. Of course, when you have Oracle, SAP, Amazon and Microsoft all announcing that they're going to your space [it means] we're up against, from our perspective, infinite resources – and that is scary." Yet, Neo4j is not scared. The crucial thing, according to Eifrem, is that the continued awareness has brought graph technology to the mainstream. And that is where Neo4j sees more opportunity than threats. “We don't have the biggest microphone in the world. We've stood alone on this mountain for the longest time, and now we have some really powerful voices joining in. That's 10 times more important than losing the occasional deal because Oracle had a lock-in on that customer.”

Aarthi Kumaraswamy
25 Oct 2017
5 min read

Dr. Brandon explains NLP (Natural Language Processing) to Jon

Dr. Brandon: Welcome everyone to the first episode of 'Date with data science'. I am Dr. Brandon Hopper, B.S., M.S., Ph.D., Senior Data Scientist at BeingHumanoid and visiting faculty at Fictional AI University. Jon: And I am just Jon - actor, foodie and Brandon's fun friend. I don't have any letters after my name but I can say the alphabets in reverse order. Pretty cool, huh! Dr. Brandon: Yes, I am sure our readers will find it very amusing Jon. Talking of alphabets, today we discuss NLP. Jon: Wait, what is NLP? Is it that thing Ashley's working on? Dr. Brandon: No. The NLP we are talking about today is Natural Language Processing, not to be confused with Neuro-Linguistic Programming. Jon: Oh alright. I thought we just processed cheese. How do you process language? Don't you start with 'to understand NLP, we must first understand how humans started communicating'! And keep it short and simple, will you? Dr. Brandon: OK I will try my best to do all of the above if you promise not to doze off. The following is an excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla and Michal Malohlava. NLP helps analyze raw textual data and extract useful information such as sentence structure, sentiment of text, or even translation of text between languages. Since many sources of data contain raw text (for example, reviews, news articles, and medical records), NLP is becoming more and more popular: it provides insight into the text and makes automated decisions easier. Under the hood, NLP often uses machine-learning algorithms to extract and model the structure of text. The power of NLP is much more visible when it is applied in the context of another machine learning method, where, for example, text can represent one of the input features. 
NLP - a brief primer
Just like artificial neural networks, NLP is a relatively "old" subject, but one that has garnered a massive amount of attention recently due to the rise of computing power and various applications of machine learning algorithms for tasks that include, but are not limited to, the following:
- Machine translation (MT): In its simplest form, this is the ability of machines to translate one language of words to another language of words. Interestingly, proposals for machine translation systems pre-date the creation of the digital computer. One of the first NLP applications was created during World War II by an American scientist named Warren Weaver whose job was to try and crack German code. Nowadays, we have highly sophisticated applications that can translate a piece of text into any number of different languages we desire!
- Speech recognition (SR): These methodologies and technologies attempt to recognize and translate spoken words into text using machines. We see these technologies in smartphones nowadays that use SR systems in tasks ranging from helping us find directions to the nearest gas station to querying Google for the weekend's weather forecast. As we speak into our phones, a machine is able to recognize the words we are speaking and then translate these words into text that the computer can recognize and use to perform some task if need be.
- Information retrieval (IR): Have you ever read a piece of text, such as an article on a news website, and wanted to see news articles similar to the one you just read? This is but one example of an information retrieval system, which takes a piece of text as an "input" and seeks to obtain other relevant pieces of text similar to the input text. Perhaps the easiest and most recognizable example of an IR system is doing a search on a web-based search engine. We give some words that we want to "know" more about (this is the "input"), and the output is the set of search results, which are hopefully relevant to our input search query.
- Information extraction (IE): This is the task of extracting structured bits of information from unstructured data such as text, video and pictures. For example, when you read a blog post on some website, the post is often tagged with a few keywords that describe its general topics, and these can be classified using information extraction systems. One extremely popular avenue of IE is called Visual Information Extraction, which attempts to identify complex entities from the visual layout of a web page, for example, which would not be captured in typical NLP approaches.
- Text summarization (darn, no acronym here!): This is a hugely popular area of interest. This is the task of taking pieces of text of various lengths and summarizing them by identifying topics, for example. In the next chapter, we will explore two popular approaches to text summarization via topic models such as Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA).
If you enjoyed the above excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla, and Michal Malohlava, check out the book to learn how to:
- Use Spark streams to cluster tweets online
- Utilize generated models for off-line/on-line prediction
- Transfer learning from an ensemble to a simpler Neural Network
- Use GraphFrames, an extension of DataFrames to graphs, to study graphs using an elegant query language
- Use the K-means algorithm to cluster a movie reviews dataset, and more
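As a rough illustration of the information retrieval idea in the primer above (a text query goes in, the most similar texts come out), here is a minimal sketch using word counts and cosine similarity. This is not the approach taken in the book, which works with Spark; the function names and the sample documents are made up for this example, and real systems would add weighting such as TF-IDF.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Return (score, document) pairs, most similar to the query first."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), doc)
        for doc in documents
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical mini-corpus, only for demonstration.
    docs = [
        "weather forecast for the weekend",
        "directions to the nearest gas station",
        "movie reviews and sentiment",
    ]
    for score, doc in rank_documents("weekend weather", docs):
        print(round(score, 3), doc)
```

Raw word counts ignore meaning and context, which is part of the motivation for the richer representations discussed elsewhere in this series.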
Packt Editorial Staff
24 Oct 2017
4 min read

24th Oct.' 17 - Headlines

Cray Supercomputers on Microsoft Azure, Blockchain services on Azure Government, and more in today's top data science news. Microsoft Azure in News Cray bringing its supercomputers to Microsoft Azure Microsoft has entered into an exclusive partnership with Cray to provide its customers access to supercomputing capabilities in Azure. Under the partnership, customers can get a dedicated Cray XC or CS series supercomputer in Azure to run HPC and AI applications alongside their other cloud workloads directly on the Azure network. As Cray systems easily integrate with Azure Virtual Machines, Azure Data Lake storage, the Microsoft AI platform, and Azure Machine Learning services for rich workflows and collaboration, customers can solve their toughest challenges in climate modeling, precision medicine, energy, manufacturing, and other scientific research. Microsoft adds blockchain services to Azure Government, pushes forward Coco framework Microsoft has embedded blockchain capabilities into its Azure Government Cloud. The company announced at the recently held Microsoft Cloud Forum that Azure Government will now support a wide array of blockchain and distributed ledger solutions, including Ethereum, Hyperledger, R3 Corda and Chain. In addition, Microsoft said it has a proof of concept for the Coco framework which could be made public early next year. The Coco Framework, which Microsoft calls its “trusted execution environment designed to remove the latency” found in public blockchains, will serve as an open-source framework. Blockchain for Azure Government will help government agencies deal with issues such as the distribution of funds after natural disasters and the registration of property ownership. 
Analytics in News Teradata unveils Teradata Analytics Platform offering users preferred analytic environment Teradata has launched a new analytics system called Teradata Analytics Platform that embeds analytics close to data, and enables users throughout an organization to leverage their preferred analytic tools and languages, at scale, across multiple data types. “It’s making advanced analytics really accessible to a broad set of users, not just those with specialized skills,” said Imad Birouty, director of product marketing at Teradata, adding that the new platform will help “bring the data and analytics functions together so that they can be part of a company’s daily operation; repeatable, reusable, and extended out to a broad set of users.” Teradata Analytics Platform is part of the company’s Teradata Everywhere strategy that was launched last year with four key components: Deploy anywhere, Buy any way, Move any time, and Analyze anything. Cloudera speeds analytics deployment for next generation Cybersecurity Hub Cloudera has teamed up with Arcadia Data, Centrify, and StreamSets to simplify the first use case on the Cybersecurity hub. Leveraging Cloudera Manager’s parcel deployment capabilities, chief information security officers (CISOs) can now access Cloudera’s cybersecurity solution based on Apache Spot (incubating), through an app store-like experience, making machine learning simple and accessible by removing the barrier of entry to data-driven insights for security operation centers. The new service also provides easy access to associated ISV capabilities such as ingestion, visualisation, and analytics. “Together with our partners, Cloudera is providing CISOs with a point and click path to deploy and benefit from a next generation cybersecurity data platform,” Cloudera CEO Tom Reilly said. 
Other Data Science News

ErosCoin kicks off ICO to fund R&D and business expenses
Blockchain-based payment gateway solution EROSCOIN is launching an ICO for ERO tokens and is accepting BTC, ETH, and LTC as means of contribution. Out of the total supply of 2.4 billion ERO tokens, 1.2 billion are available for the ICO. Half of the funds raised will go towards research and core development, and the other half will be divided between other business expenses such as marketing, legal and operational work, and bounty programs. The EROS foundation, which will ultimately be responsible for the development of the EROSCOIN platform, is currently slated to receive 20% of the total coin supply. The remainder of all issued tokens will then be distributed to advisory and escrow (9%), a reserve fund (10%), charity (10%) and bounties (1%). The first phase of the EROS ICO will offer a 25% bonus to investors who contribute on the first and second day of the sale, with bonuses then decreasing by roughly 5% on a weekly basis. The sale will run over a month-long period.
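The token split reported above can be checked with a few lines of arithmetic. The sketch below is only a sanity check, and it rests on one assumption that the announcement does not state outright: that every quoted percentage is a share of the total 2.4 billion supply, with the ICO portion (1.2 billion) counted as 50%. The dictionary keys and the helper name `tokens_for` are illustrative, not part of any EROSCOIN tooling.

```python
# Sanity check of the reported ERO token allocation.
# Assumption: all percentages are shares of the total supply.
TOTAL_SUPPLY = 2_400_000_000

allocation_pct = {
    "ICO sale": 50,              # 1.2 billion of 2.4 billion tokens
    "EROS foundation": 20,
    "advisory and escrow": 9,
    "reserve fund": 10,
    "charity": 10,
    "bounties": 1,
}


def tokens_for(pct, total=TOTAL_SUPPLY):
    """Return the number of tokens corresponding to a whole-number percentage."""
    return total * pct // 100


allocation_tokens = {name: tokens_for(pct) for name, pct in allocation_pct.items()}

# Under the stated assumption, the shares account for the whole supply.
assert sum(allocation_pct.values()) == 100
assert sum(allocation_tokens.values()) == TOTAL_SUPPLY

for name, amount in allocation_tokens.items():
    print(f"{name}: {amount:,} ERO")
```

Under this reading the listed shares add up to exactly 100% of the supply, which suggests the figures are internally consistent.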

Packt Editorial Staff
23 Oct 2017
3 min read

Google’s first smartphone chip, Shutterstock’s Composition Aware Search feature, and more - 23rd Oct. '17 Headlines

Google's chip Pixel Visual Core in News

Pixel Visual Core: Google’s first custom smartphone chip
There is a special component inside Google’s flagship smartphones, Pixel 2 and Pixel 2 XL, that was not really announced during the Oct. 4 launch event. Google has now said it put a custom, self-designed image processing chip called Pixel Visual Core inside Pixel 2. A couple of months back there were rumors that Google could dabble in chip design and that a key Apple veteran had been hired to guide the architecture. Pixel Visual Core reportedly has eight Image Processing Unit (IPU) cores, each packed with 512 arithmetic logic units, and is capable of running more than 3 trillion operations per second. The Pixel Visual Core is expected to be activated in a future software update for Pixel 2 users, once developers have been able to write apps for it.

AI in News

Shutterstock’s Composition Aware Search feature uses deep learning to refine image search
Stock photo service Shutterstock has launched a new feature that allows users to search for images based on their composition and layout. The photo search tool, called Composition Aware Search, uses deep learning to surface the right images while excluding the thousands of irrelevant results that a traditional image search returns. The feature is still in beta and its patent is pending.

ThinkSCM announces AI-enabled predictive analytics tool to boost supply chain
At the recently held APICS 2017 supply chain conference, ThinkSCM unveiled a prescriptive analytics tool that uses artificial intelligence for data analysis, future predictions, and recommendations. The company said it had developed the algorithm to bridge a gap in SAP software for a client but, after the successful outcome, decided to launch it commercially.
With McAfee Investigator and McAfee Cloud Workload Security, McAfee uses AI to boost enterprise security
McAfee has introduced several machine learning, deep learning, and artificial intelligence features into its enterprise security offerings; these make use of automation, reasoning, and data curation provided by analytics technologies. Apart from new capabilities such as ransomware decryption and steganography detection, the company has launched two new solutions: McAfee Investigator and McAfee Cloud Workload Security. While McAfee Investigator uses advanced analytics for accurate threat prioritization, McAfee Cloud Workload Security addresses challenges such as visibility across hybrid cloud workloads and enterprise service architecture.

Other Data Science News

Spring Data Neo4j 5.0 release brings smarter querying for better performance
After Spring Data Neo4j 4, which was a total rewrite of earlier versions, Spring Data Neo4j 5 has been released as another major version that brings several new features. Built upon Neo4j OGM 3.0, SDN 5 adds dynamic properties and schema-based loading, which corrects a problem in SDN 4 where more data was often loaded than required. SDN 5 now uses a new load strategy based on a schema derived automatically from class metadata. It uses nested pattern comprehensions generated from the schema, so only the data that will be mapped is fetched. Another change that makes mapping easier is that entity fields are now written directly by the object mapper, not through annotated or derived setters. In addition to added support for query and projection methods, Spring Data Neo4j 5.0 carries the latest enhancements in the Spring world, as it is built on the foundations of Java 8, the new Spring Framework 5.0, and Spring Data 2.0. Detailed documentation for migrating from SDN 4.2 to the new version has been released, with guidelines.