Tech News

Aaron Lazar
09 Nov 2017
7 min read

Soft skills every data scientist should teach their child

Data scientists work hard to upgrade their technical skills. A rapidly changing technology landscape demands continuous learning: mastering a new programming language such as R, Python, or Java; exploring new machine learning frameworks and libraries such as TensorFlow or Keras; and understanding algorithms such as deep convolutional networks and K-Means, to name a few. Had they lived in Dr. Frankenstein's world, where scientists worked hard in their labs, cut off from the rest of the world, this would have sufficed. But in the real world, data scientists use data and work with people to solve real-world problems for people. They need to learn something more, something that forms a bridge between their ideas and hypotheses and the rest of the world, and that is more of an art than a skill these days. We’re talking about soft skills for data scientists. Today we’ll enjoy a conversation between a father and son, as we learn some critical soft skills a data scientist needs to make it big in the data science world.
[box type="shadow" align="" class="" width=""] One chilly evening, Tommy is sitting with his dad in their grassy backyard with the radio on, humming along to their favourite tunes. Tommy, gazing up at the sky for a while, asks his dad, “Dad, what are clouds made of?” Dad takes a sip of beer and replies, “Mostly servers, son. And tonnes of data.” Still gazing up, Tommy takes a deep breath, pondering what his dad just said.
Tommy: Tell me something, what’s the most important thing you’ve learned in your career as a data scientist?
Dad smiles: I’m glad you asked, son. I’m going to share something important with you. Something I have learned over all these years crunching and munching data. I want you to keep this to yourself and remember it for as long as you can, okay?
Tommy: Yes, Dad.
Dad: Atta boy! Okay, the first thing you gotta do if you want to be successful is be curious!
Data is everywhere and it can tell you a lot. But if you’re not curious to explore data and tackle it from every angle, you will remain mediocre at best. Have an open mind - look at things through a kaleidoscope and challenge assumptions and presumptions. Innovation is the key to making the cut as a data scientist.
Tommy nods his head approvingly. Dad, satisfied that Tommy is following along, continues.
Dad: One of the most important skills a data scientist should possess is great business acumen. Now, I know you must be wondering why one would need business acumen when all they’re doing is gathering a heap of data and making sense of it.
Tommy looks straight-faced at his dad.
Dad: Well, a data scientist needs to know the business like the back of their hand, because unless they do, they won’t understand what the business’s strengths and weaknesses are and how data can contribute towards boosting its success. They need to understand where the business fits into the industry and what it needs to do to remain competitive.
Dad’s last statement is rewarded by an energetic, affirmative nod from Tommy. Smiling, Dad is quite pleased with the response.
Dad: Communication is next on the list. Without a clever tongue, a data scientist will find themselves going nowhere in the tech world. Gone are the days when technical knowledge was all that was needed to sustain a career. A data scientist’s job is to help a business make critical, data-driven decisions. Of what use is it to the non-technical marketing or sales teams if the data scientist can’t communicate their insights in a clear and effective way? A data scientist must also be a good listener, to truly understand what the problem is and come up with the right solution.
Tommy leans back in his chair, looking up at the sky again, thinking about how he would communicate insights effectively.
Dad continues: Very closely associated with communication is the ability to present well, or as a data scientist would put it - tell tales that inspire action. Now, a data scientist might have to put forward their findings before an entire board of directors, who will be extremely eager to know why they need to take a particular decision and how it will benefit the organization. Here, clear articulation, a knack for storytelling and strong persuasion skills are all important for the data scientist to get the message across in the best way.
Tommy quips: Like the way you convince Mom to do the dishes every evening?
Dad playfully punches Tommy: Hahaha, you little rascal!
Tommy: Are there any more skills a data scientist needs to possess to excel at what they do?
Dad: Indeed, there are! True data science is a research activity, where problems with unclear or unobvious solutions get solved. There are times when even the nature of the problem isn’t clear. A data scientist should be skilled at performing their own independent research - snooping around for information or data, gathering it and preparing it for further analysis. Many organisations look for people with strong research capabilities before they recruit them.
Tommy: What about you? Would you recruit someone without a research background?
Dad: Well, personally, no. But that doesn’t mean I would only hire someone if they had a PhD. Even an MSc would do, if they were able to justify their research project and convince me that they’re capable of performing independent research. I wouldn’t hesitate to take them on board. Here’s where I want to share one of the most important skills I’ve learned in all my years. Any guesses on what it might be?
Tommy: Hiring?
Dad: Ummmmm… I’ll give this one to you ‘cos it’s pretty close. The actual answer is, of course, a much broader term - ‘management’.
It encompasses everything from hiring the right candidates for your team to practically everything else that a person handling a team does.
Tommy: And what’s that?
Dad: Well, as a senior data scientist, one would be expected to handle a team of less experienced data scientists, managing, mentoring and helping them achieve their goals. It’s a very important skill to hone as you climb up the ladder. Some learn it through experience, others learn it by taking management courses. Either way, this skill is important for one to succeed in a senior role. And that’s about all I have for now. I hope at least some of this benefits you as you step into your first job tomorrow.
Tommy smiles: Yeah, Dad, it’s great to have someone in the same line of work to look up to when I’m just starting out in my career. I’m glad we had this conversation. Holding up an empty can, he says, “I’m out, toss me another beer, please.”[/box]
Soft Skills for Data Scientists - A quick Recap
In addition to keeping yourself technically relevant, to succeed as a data scientist you need to:
  • Be curious: Explore data from different angles and question what is taken for granted, both assumptions and presumptions.
  • Have strong business acumen: Know your customer, know your business, know your market.
  • Communicate effectively: Speak the language of your audience and listen carefully to understand the problem you want to solve.
  • Master the art of presenting well: Tell stories that inspire action and get your message across through a combination of data storytelling, negotiation and persuasion skills.
  • Be a problem solver: Do your independent research, get your hands dirty and dive deep for answers.
  • Develop your management capabilities: Manage, mentor and help other data scientists reach their full potential.

Packt Editorial Staff
09 Nov 2017
3 min read

9th Nov.' 17 - Headlines

Bitcoin prices soar and tumble, MongoDB announces its biggest release, and a proposed 'Grid' blockchain system, in today’s top stories in data science news.
Bitcoin's roller-coaster amid SegWit2x cancellation
Bitcoin price surges to record high, then tanks, as plans to split the digital currency are called off
Bitcoin was scheduled to upgrade around Nov. 16 following a proposal called SegWit2x, which would have split the digital currency in two. But with major bitcoin developers recently dropping their support for the upgrade, the developers behind SegWit2x called off the upgrade plans on Wednesday. In response, the bitcoin price reached an all-time high of around $7,900. This was followed by a crash of about $1,000, which took the price down to $6,977. Experts believe the rapid price swing could reflect a conflict between the short- and long-term impacts of the SegWit2x cancellation. The hard fork would have split Bitcoin into two competing blockchains, resulting in an ugly fight for supremacy.
Announcing MongoDB 3.6
MongoDB 3.6 released: Change Streams, Retryable Writes among key updates in MongoDB’s biggest ever release
MongoDB has announced its biggest release yet, version 3.6, with over a hundred new and updated features. With new array update operators, users can now specify in-place updates to specific array items at any depth of nesting. Extensions to the $lookup aggregation stage now allow uncorrelated subqueries and multiple matching conditions, so referencing and joining documents in complex combinations can be handled in the database. MongoDB 3.6 also introduces Change Streams, which applications can use to get real-time notification of updates to collection data. To handle network outages gracefully, MongoDB 3.6 adds Retryable Writes, a new feature ensuring that writes are performed exactly once, even in the face of outages. In addition, MongoDB 3.6 improves on its previous validation capabilities with the introduction of JSON Schema.
“With MongoDB 3.6, schema isn’t a straightjacket, it’s framework of validation you can tune to exactly the degree you need,” Co-Founder Eliot Horowitz said in the official announcement.
A new 'Grid' blockchain system
Introducing Grid: A scalable Blockchain system for better performance, resource segregation and working governance model
A new blockchain initiative, Grid, proposes to establish a blockchain system that functions as an operating system, similar to Linux. Under its design, Grid will run nodes on clusters. It will allow transactions to be assigned to different groups based on the mutex of the transactions. Transactions within a group will be processed in linear sequence, while all groups will be processed simultaneously. Grid adopts a Main Chain + N Side Chains architecture, which means each business scenario has its own dedicated Side Chain to fulfill its requirements. Segregating resources like this is intended to increase the processing efficiency of the system and avoid congestion. Grid also promises a better governance model by permitting Side Chains to join or exit the Main Chain dynamically based on stakeholder voting, thereby introducing competition and an incentive to improve each Side Chain. Singapore-based Grid Foundation is promoting Grid’s development and applications, while technical development will be led by Beijing Hoopox Information and Technology Co. Ltd.

Packt Editorial Staff
08 Nov 2017
3 min read

8th Nov.' 17 - Headlines

spaCy's latest version, Microsoft’s artificial intelligence processor and a proposed AI broker are among today’s tech stories in data science news.
Announcing spaCy 2.0
spaCy 2.0 released with 13 new neural network models for 7+ languages
Version 2.0 of spaCy has been released, bringing it up to date with the latest deep learning technologies, with over 60 bug fixes that include several long-standing issues. It is now easier to run spaCy in scalable cloud computing workflows. spaCy v2.0 comes with 13 new convolutional neural network models for 7+ languages, and adds alpha tokenization support for 8 new languages. These models have been designed and implemented from scratch specifically for spaCy, the developer team said, adding that they “re-wrote almost all of the usage guides, API docs and code examples.” For a full overview of changes in v2.0, users can see the guide on migrating from spaCy 1.x.
Microsoft goes full throttle on AI chip
Microsoft says it will extend its HoloLens AI processor to other everyday devices
In July, Microsoft revealed it was designing a custom AI chip for its next-generation HoloLens headsets. In the latest development, the company’s Corporate Vice President Panos Panay has said that while work on the proposed artificial intelligence processor is going at full speed, the AI chip may well be implemented in everyday devices other than HoloLens, such as mobile phones, TVs, wearables, smart home devices and computers. Panay said in an interview that Microsoft is not designing the processor just for its own products, but for devices from all other brands. The AI processor will analyze what users see and hear on the spot, without spending precious time sending the data to the cloud for analysis.
Other News
Cloud SQL for PostgreSQL integrates high availability and replication
Cloud SQL for PostgreSQL has now added support for high availability (HA) and read replicas.
This can ensure that users’ database workloads are fault tolerant. Announcing the release, developers said the beta release of high availability provides isolation from failures, and read replicas provide additional read performance, both requirements for demanding workloads.
Artificial Intelligence creeps into CryptoTrading, AiX claims to develop first AI broker
Startup AiX has announced the creation of an electronic broker with artificial intelligence. AiX said its AI broker blends cutting-edge artificial intelligence with blockchain technology to make trading cheaper, faster, and more trustworthy. Using an AI chatbot and Alexa-style voice recognition, it will execute trades on behalf of individual traders and investment banks, which may cut down on trade costs. As all actions will be recorded on a blockchain, AiX believes the process will bring reliability and transparency. Having already secured $16 million for this project, AiX plans to raise further capital through a token sale before year end.

Aarthi Kumaraswamy
08 Nov 2017
6 min read

Data Scientist: The sexiest role of the 21st century

"Information is the oil of the 21st century, and analytics is the combustion engine." -Peter Sondergaard, Gartner Research
By 2018, it is estimated that companies will spend $114 billion on big data-related projects, an increase of roughly 300% compared to 2013 (https://www.capgemini-consulting.com/resource-file-access/resource/pdf/big_data_pov_03-02-15.pdf). Much of this increase in expenditure is due to how much data is being created and how much better we are able to store such data by leveraging distributed filesystems such as Hadoop. However, collecting the data is only half the battle; the other half involves data extraction, transformation, and loading into a computation system, which leverages the power of modern computers to apply various mathematical methods in order to learn more about data and patterns and extract useful information to make relevant decisions.
The entire data workflow has been boosted in the last few years, not only by increasing computation power and easily accessible and scalable cloud services (for example, Amazon AWS, Microsoft Azure, and Heroku) but also by a number of tools and libraries that help to easily manage, control, and scale infrastructure and build applications. Such growth in computation power also helps to process larger amounts of data and to apply algorithms that were impossible to apply earlier. Finally, various computationally expensive statistical or machine learning algorithms have started to help extract nuggets of information from data.
Finding a uniform definition of data science is akin to tasting wine and comparing flavor profiles among friends: everyone has their own definition, and no one description is more accurate than the other. At its core, however, data science is the art of asking intelligent questions about data and receiving intelligent answers that matter to key stakeholders. Unfortunately, the opposite also holds true: ask lousy questions of the data and get lousy answers!
Therefore, careful formulation of the question is the key to extracting valuable insights from your data. For this reason, companies are now hiring data scientists to help formulate and ask these questions.
At first, it's easy to paint a stereotypical picture of what a typical data scientist looks like: T-shirt, sweatpants, thick-rimmed glasses, and debugging a chunk of code in IntelliJ... you get the idea. Aesthetics aside, what are some of the traits of a data scientist? One of our favorite posters describing this role is shown in the following diagram:
Math, statistics, and general knowledge of computer science are a given, but one pitfall that we see among practitioners has to do with understanding the business problem, which goes back to asking intelligent questions of the data. It cannot be emphasized enough: asking more intelligent questions of the data is a function of the data scientist's understanding of the business problem and the limitations of the data; without this fundamental understanding, even the most intelligent algorithm would be unable to come to solid conclusions based on a wobbly foundation.
A day in the life of a data scientist
This will probably come as a shock to some of you: being a data scientist is more than reading academic papers, researching new tools, and model building until the wee hours of the morning, fueled on espresso; in fact, this is only a small percentage of the time that a data scientist gets to truly play (the espresso part, however, is 100% true for everyone)! Most of the day is spent in meetings, gaining a better understanding of the business problem(s), crunching the data to learn its limitations (take heart, this book will expose you to a ton of different feature engineering or feature extraction tasks), and working out how best to present the findings to non data-sciencey people.
This is where the true sausage-making process takes place, and the best data scientists are the ones who relish this process because they are gaining more understanding of the requirements and benchmarks for success. In fact, we could literally write a whole new book describing this process from top to tail!
So, what (and who) is involved in asking questions about data? Sometimes, it is a process of saving data into a relational database and running SQL queries to find insights into data: "for the millions of users that bought this particular product, what are the top 3 OTHER products also bought?" Other times, the question is more complex, such as, "Given the review of a movie, is this a positive or negative review?" This book is mainly focused on complex questions, like the latter. Answering these types of questions is where businesses really get the most impact from their big data projects, and is also where we see a proliferation of emerging technologies that look to make this Q and A system easier, with more functionality.
Some of the most popular open source frameworks that look to help answer data questions include R, Python, Julia, and Octave, all of which perform reasonably well with small (X < 100 GB) datasets. At this point, it's worth stopping and pointing out a clear distinction between big and small data. Our general rule of thumb in the office goes as follows: if you can open your dataset using Excel, you are working with small data.
Working with big data
What happens when the dataset in question is so vast that it cannot fit into the memory of a single computer and must be distributed across a number of nodes in a large computing cluster? Can't we just rewrite some R code, for example, and extend it to account for more than a single-node computation? If only things were that simple! There are many reasons why scaling algorithms to more machines is difficult.
Imagine a simple example of a file containing a list of names:
    B
    D
    X
    A
    D
    A
We would like to compute the number of occurrences of individual words in the file. If the file fits on a single machine, you can easily compute the number of occurrences by using a combination of the Unix tools sort and uniq:
    bash> sort file | uniq -c
The output is as follows:
    2 A
    1 B
    2 D
    1 X
However, if the file is huge and distributed over multiple machines, it is necessary to adopt a slightly different computation strategy. For example, compute the number of occurrences of individual words for every part of the file that fits into memory, and then merge the results together. Hence, even simple tasks, such as counting the occurrences of names, can become more complicated in a distributed environment.
The above is an excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla and Michal Malohlava. If you would like to learn how to solve the above problem, and other cool machine learning tasks a data scientist carries out such as the following, check out the book:
  • Use Spark streams to cluster tweets online
  • Run the PageRank algorithm to compute user influence
  • Perform complex manipulation of DataFrames using Spark
  • Define Spark pipelines to compose individual data transformations
  • Utilize generated models for off-line/on-line prediction
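The chunk-and-merge strategy described in this excerpt for counting names across a large file can be imitated on a single machine. The following is a minimal Python sketch, not the book's Spark solution; the helper names `chunks`, `count_chunk`, and `merged_counts`, and the chunk size of 2, are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def count_chunk(names):
    """Count occurrences within one chunk that fits in memory."""
    return Counter(names)


def merged_counts(names, chunk_size):
    """Count each chunk independently, then merge the partial results."""
    total = Counter()
    for block in chunks(names, chunk_size):
        total.update(count_chunk(block))  # merge step: add partial counts
    return total


names = ["B", "D", "X", "A", "D", "A"]
result = merged_counts(names, chunk_size=2)
for name in sorted(result):
    print(result[name], name)
```

Run on the six names from the example, this reports two occurrences each of A and D and one each of B and X. In a real cluster each chunk would be counted on a different node, and the merge step is where the coordination cost appears.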

Abhishek Jha
08 Nov 2017
3 min read

Introducing "Pyro" for deep probabilistic modeling

Last year, when Uber set up its ambitious Uber AI Labs facility in San Francisco, the aim was to leverage cutting-edge research in artificial intelligence and machine learning to move people and things in the real world, a challenge that is more complex and uncertain than it appears on paper. It extends to, as the firm admitted, teaching a self-driven machine to safely and autonomously navigate the world, whether a car on the roads, an aircraft through busy airspace, or new types of robotic devices.
Well, the first big initiative to come out of the Labs is Pyro, a deep universal probabilistic programming language (PPL). “Pyro is a tool for deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling,” writes Stanford researcher Noah Goodman, a member of Uber AI Labs. Written in Python, Pyro uses PyTorch as its backend. Among the key principles underlying its design, Pyro is a flexible and scalable programming library, implemented with a small core of powerful, composable abstractions.
Pyro: Design principles and insights
In Uber’s own words, Pyro was developed to satisfy the following four design principles:
  • Universal: Pyro is a universal PPL: it can represent any computable probability distribution. How? By starting from a universal language with iteration and recursion (arbitrary Python code), and then adding random sampling, observation, and inference.
  • Scalable: Pyro scales to large data sets with little overhead above hand-written code. How? By building modern black box optimization techniques, which use mini-batches of data, to approximate inference.
  • Minimal: Pyro is agile and maintainable. How? Pyro is implemented with a small core of powerful, composable abstractions. Wherever possible, the heavy lifting is delegated to PyTorch and other libraries.
  • Flexible: Pyro aims for automation when you want it and control when you need it. How?
Pyro uses high-level abstractions to express generative and inference models, while allowing experts to easily customize inference.
In a way, Pyro reflects several interesting directions in PPL research, from dynamic computational graphs to deep generative models and programmable inference. “In Pyro, both the generative models and the inference guides can include deep neural networks as components,” Goodman wrote. “The resulting deep probabilistic models have shown great promise in recent work, especially for unsupervised and semi-supervised machine learning problems.”
Pyro: Installation
(Remember to first install PyTorch.)
Install via pip:
    Python 2.7.*: pip install pyro-ppl
    Python 3.5: pip3 install pyro-ppl
Install from source:
    git clone git@github.com:uber/pyro.git
    cd pyro
    pip install .
Still in alpha release, Pyro may see several enhancements in the coming days as the probabilistic programming and deep learning communities engage with it more.
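The "universal" principle quoted above combines ordinary Python code with random sampling, observation, and inference. The sketch below imitates those three ingredients in plain Python using likelihood weighting. It deliberately does not use Pyro's API; the function names and the coin-flip data are made up for illustration, and the uniform prior is an assumption of the example.

```python
import random


def model(rng):
    """Generative model: draw a coin bias from a uniform prior on [0, 1)."""
    return rng.random()


def likelihood(bias, flips):
    """Probability of the observed flips (1 = heads, 0 = tails) given the bias."""
    p = 1.0
    for flip in flips:
        p *= bias if flip == 1 else (1.0 - bias)
    return p


def posterior_mean(flips, num_samples=20000, seed=0):
    """Estimate the posterior mean of the bias by likelihood weighting."""
    rng = random.Random(seed)
    weighted_sum = 0.0
    total_weight = 0.0
    for _ in range(num_samples):
        bias = model(rng)                 # random sampling
        weight = likelihood(bias, flips)  # observation scores the sample
        weighted_sum += weight * bias
        total_weight += weight
    return weighted_sum / total_weight    # inference: weighted average


observed = [1, 1, 1, 0, 1, 1, 0, 1]  # 6 heads, 2 tails (made-up data)
estimate = posterior_mean(observed)
print(round(estimate, 2))
```

With a uniform prior and 6 heads out of 8 flips, the exact posterior is Beta(7, 3) with mean 0.7, so the printed estimate should land close to that value. A library such as Pyro replaces this brute-force loop with scalable, gradient-based inference.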

Aarthi Kumaraswamy
08 Nov 2017
3 min read

Dr.Brandon explains Decision Trees to Jon

[box type="shadow" align="" class="" width=""]Dr. Brandon: Hello and welcome to the third episode of 'Date with Data Science'. Today we talk about decision trees in machine learning.
Jon: Decisions are hard enough to make. Now you want me to grow a decision tree. Next, you'll say there are decision jungles too!
Dr. Brandon: It might come as a surprise to you, Jon, but decision trees can help you make decisions more easily. Imagine you are in a restaurant and you are given a menu card. A decision tree can help you decide if you want to have a burger, pizza, fries or a pie, for instance. And yes, there are decision jungles, but they are called random forests. We will talk about them another time.
Jon: You know, Bran, I have never been very good at making decisions. But with food, it is easy. It's ALWAYS all you can have.
Dr. Brandon: Well, my mistake. Let's take another example. After your binge eating at the restaurant, you go to the doctor's with stomach complaints. A decision tree can help your doctor decide if you have a problem and then choose a treatment option based on your symptoms.
Jon: Really!? Tell me more.
Dr. Brandon: Alright. The following excerpt introduces decision trees from the book Apache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, and Shuen Mei. To know how to implement them in Spark read this article. [/box]
Decision trees are one of the oldest and most widely used methods of machine learning in commerce. What makes them popular is not only their ability to deal with more complex partitioning and segmentation (they are more flexible than linear models) but also their ability to explain how we arrived at a solution and "why" the outcome is predicted or classified as a particular class/label.
A quick way to think about the decision tree algorithm is as a smart partitioning algorithm that tries to minimize a loss function (for example, L2 or least squares) as it partitions the ranges to come up with a segmented space whose decision boundaries best fit the data. The algorithm gets more sophisticated through sampling the data and trying combinations of features to assemble a more complex ensemble model, in which each learner (partial sample or feature combination) gets to vote toward the final outcome.
The following figure depicts a simplified version in which a simple binary tree (stumping) is trained to classify the data into segments belonging to two different colors (for example, healthy patient/sick patient). The figure depicts a simple algorithm that just splits the x/y feature space in two every time it establishes a decision boundary (hence classifying) while minimizing the number of errors (for example, an L2 least squares measure):
The following figure provides a corresponding tree so we can visualize the algorithm (in this case, a simple divide and conquer) against the proposed segmentation space. What makes decision tree algorithms popular is their ability to show their classification result in a language that can easily be communicated to a business user without much math:
If you liked the above excerpt, please be sure to check out Apache Spark 2.x Machine Learning Cookbook, the book it is taken from, to learn how to implement deep learning using Spark and many more useful techniques for implementing machine learning solutions with the MLlib library in Apache Spark 2.0.
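The single-split "stump" described in this excerpt, which picks one decision boundary while minimizing the number of errors, can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the cookbook's Spark implementation; the function names, the toy patient data, and the use of a raw misclassification count as the loss are assumptions of the example.

```python
def stump_errors(points, labels, feature, threshold, left_label, right_label):
    """Count misclassifications for the rule: feature <= threshold -> left_label."""
    errors = 0
    for point, label in zip(points, labels):
        predicted = left_label if point[feature] <= threshold else right_label
        if predicted != label:
            errors += 1
    return errors


def best_stump(points, labels):
    """Search every axis-aligned split and return the one with fewest errors.

    Returns (errors, feature_index, threshold, left_label, right_label).
    Assumes exactly two distinct labels.
    """
    first, second = sorted(set(labels))
    best = None
    for feature in range(len(points[0])):
        values = sorted({p[feature] for p in points})
        # Candidate thresholds are midpoints between consecutive values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for left, right in ((first, second), (second, first)):
                errors = stump_errors(points, labels, feature, threshold, left, right)
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, left, right)
    return best


# Made-up (x, y) measurements for six patients.
points = [(1, 5), (2, 1), (3, 4), (6, 2), (7, 5), (8, 1)]
labels = ["healthy", "healthy", "healthy", "sick", "sick", "sick"]

errors, feature, threshold, left, right = best_stump(points, labels)
print(f"if feature[{feature}] <= {threshold}: {left} else: {right} ({errors} errors)")
```

The printed rule is the kind of plain-language result the excerpt refers to: a business user can read the threshold directly without any math. A full decision tree would apply the same search recursively to each side of the split.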

Sugandha Lahoti
07 Nov 2017
6 min read

4 Clustering Algorithms every Data Scientist should know

[box type="note" align="" class="" width=""]This is an excerpt from a book by John R. Hubbard titled Java Data Analysis. In this article, we see four popular clustering algorithms: hierarchical clustering, k-means clustering, k-medoids clustering, and affinity propagation, along with their pseudo-code.[/box]
A clustering algorithm is one that identifies groups of data points according to their proximity to each other. These algorithms are similar to classification algorithms in that they also partition a dataset into subsets of similar points. But in classification, we already have data whose classes have been identified, such as sweet fruit. In clustering, we seek to discover the unknown groups themselves.
Hierarchical clustering
Of the several clustering algorithms that we will examine in this article, hierarchical clustering is probably the simplest. The trade-off is that it works well only with small datasets in Euclidean space. The general setup is that we have a dataset $S$ of $m$ points in $\mathbb{R}^n$ which we want to partition into a given number $k$ of clusters $C_1, C_2, \ldots, C_k$, where within each cluster the points are relatively close together. Here is the algorithm:
1. Create a singleton cluster for each of the m data points.
2. Repeat m – k times:
   • Find the two clusters whose centroids are closest
   • Replace those two clusters with a new cluster that contains their points
The centroid of a cluster is the point whose coordinates are the averages of the corresponding coordinates of the cluster points. For example, the centroid of the cluster C = {(2, 4), (3, 5), (6, 6), (9, 1)} is the point (5, 4), because (2 + 3 + 6 + 9)/4 = 5 and (4 + 5 + 6 + 1)/4 = 4. This is illustrated in the figure below:
K-means clustering
A popular alternative to hierarchical clustering is the K-means algorithm. It is related to the K-Nearest Neighbor (KNN) classification algorithm.
As with hierarchical clustering, the K-means clustering algorithm requires the number of clusters, k, as input. (This version is also called the K-Means++ algorithm.) Here is the algorithm:
1. Select k points from the dataset.
2. Create k clusters, each with one of the initial points as its centroid.
3. For each dataset point x that is not already a centroid:
   • Find the centroid y that is closest to x
   • Add x to that centroid’s cluster
   • Re-compute the centroid for that cluster
It also requires k points, one for each cluster, to initialize the algorithm. These initial points can be selected at random, or by some a priori method. One approach is to run hierarchical clustering on a small sample taken from the given dataset and then pick the centroids of those resulting clusters.
K-medoids clustering
The k-medoids clustering algorithm is similar to the k-means algorithm, except that each cluster center, called its medoid, is one of the data points instead of being the mean of its points. The idea is to minimize the average distances from the medoids to points in their clusters. The Manhattan metric is usually used for these distances. Since those averages will be minimal if and only if the distances are, the algorithm is reduced to minimizing the sum of all distances from the points to their medoids. This sum is called the cost of the configuration. Here is the algorithm:
1. Select k points from the dataset to be medoids.
2. Assign each data point to its closest medoid. This defines the k clusters.
3. For each cluster $C_j$:
   • Compute the sum $s = \sum_j s_j$, where each $s_j = \sum \{ d(x, y_j) : x \in C_j \}$, and change the medoid $y_j$ to whatever point in the cluster $C_j$ minimizes $s$
   • If the medoid $y_j$ was changed, re-assign each x to the cluster whose medoid is closest
4. Repeat step 3 until s is minimal.
This is illustrated by the simple example in Figure 8.16. It shows 10 data points in 2 clusters. The two medoids are shown as filled points.
In the initial configuration it is:

$C_1 = \{(1,1),(2,1),(3,2),(4,2),(2,3)\}$, with $y_1 = x_1 = (1,1)$
$C_2 = \{(4,3),(5,3),(2,4),(4,4),(3,5)\}$, with $y_2 = x_{10} = (3,5)$

The sums are:

$s_1 = d(x_2,y_1) + d(x_3,y_1) + d(x_4,y_1) + d(x_5,y_1) = 1 + 3 + 4 + 3 = 11$
$s_2 = d(x_6,y_2) + d(x_7,y_2) + d(x_8,y_2) + d(x_9,y_2) = 3 + 4 + 2 + 2 = 11$
$s = s_1 + s_2 = 11 + 11 = 22$

The first part of step 3 changes the medoid for $C_1$ to $y_1 = x_3 = (3,2)$. This causes the clusters to change, in the second part of step 3, to:

$C_1 = \{(1,1),(2,1),(3,2),(4,2),(2,3),(4,3),(5,3)\}$, with $y_1 = x_3 = (3,2)$
$C_2 = \{(2,4),(4,4),(3,5)\}$, with $y_2 = x_{10} = (3,5)$

This makes the sums:

$s_1 = 3 + 2 + 1 + 2 + 2 + 3 = 13$
$s_2 = 2 + 2 = 4$
$s = s_1 + s_2 = 13 + 4 = 17$

The resulting configuration is shown in the second panel of the figure below:

At step 3 of the algorithm, the process repeats for cluster $C_2$. The resulting configuration is shown in the third panel of the above figure. The computations are:

$C_1 = \{(1,1),(2,1),(3,2),(4,2),(4,3),(5,3)\}$, with $y_1 = x_3 = (3,2)$
$C_2 = \{(2,3),(2,4),(4,4),(3,5)\}$, with $y_2 = x_8 = (2,4)$
$s = s_1 + s_2 = (3 + 2 + 1 + 2 + 3) + (1 + 2 + 2) = 11 + 5 = 16$

The algorithm continues with two more changes, finally converging to the minimal configuration shown in the fifth panel of the above figure. This version of k-medoid clustering is also called partitioning around medoids (PAM).

Affinity propagation clustering

One disadvantage of each of the clustering algorithms previously presented (hierarchical, k-means, k-medoids) is the requirement that the number of clusters k be determined in advance. The affinity propagation clustering algorithm does not have that requirement. Developed in 2007 by Brendan J. Frey and Delbert Dueck at the University of Toronto, it has become one of the most widely used clustering methods. Like k-medoid clustering, affinity propagation selects cluster center points, called exemplars, from the dataset to represent the clusters. This is done by message-passing between the data points.
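Looking back at the k-medoids worked example, its three cost values can be checked with a few lines of Python. This is only a sketch: the helper names `manhattan` and `configuration_cost` are hypothetical, and the cost is computed by sending every point to its nearest medoid, which gives the same totals as the cluster assignments listed in the example.

```python
def manhattan(p, q):
    """Manhattan (taxicab) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def configuration_cost(points, medoids):
    """Sum of distances from every point to its closest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)


# The ten points x1..x10 of the worked example, in order.
points = [(1, 1), (2, 1), (3, 2), (4, 2), (2, 3),
          (4, 3), (5, 3), (2, 4), (4, 4), (3, 5)]

# Medoid pairs for the first three panels of the example.
for medoids in ([(1, 1), (3, 5)], [(3, 2), (3, 5)], [(3, 2), (2, 4)]):
    print(medoids, configuration_cost(points, medoids))
# prints the costs 22, 17 and 16, matching the sums in the example
```

Each medoid change lowers the cost, which is exactly the quantity step 3 of the k-medoids algorithm tries to minimize.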
The algorithm works with three two-dimensional arrays:

- $s_{ij}$ = the similarity between $x_i$ and $x_j$
- $r_{ik}$ = responsibility: message from $x_i$ to $x_k$ on how well-suited $x_k$ is as an exemplar for $x_i$
- $a_{ik}$ = availability: message from $x_k$ to $x_i$ on how well-suited $x_k$ is as an exemplar for $x_i$

Here is the complete algorithm:

1. Initialize the similarities: $s_{ij} = -d(x_i, x_j)^2$, for $i \ne j$; $s_{ii}$ = the average of those other $s_{ij}$ values.
2. Repeat until convergence:
   - Update the responsibilities: $r_{ik} = s_{ik} - \max \{ a_{ij} + s_{ij} : j \ne k \}$
   - Update the availabilities: $a_{ik} = \min \{ 0, r_{kk} + \sum_j \{ \max \{ 0, r_{jk} \} : j \ne i \wedge j \ne k \} \}$, for $i \ne k$; $a_{kk} = \sum_j \{ \max \{ 0, r_{jk} \} : j \ne k \}$

A point $x_k$ will be an exemplar for a point $x_i$ if $a_{ik} + r_{ik} = \max_j \{ a_{ij} + r_{ij} \}$.

If you enjoyed this excerpt from the book Java Data Analysis by John R. Hubbard, check out the book to learn how to implement various machine learning algorithms, data visualization and more in Java.
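The affinity propagation update rules above can be sketched in plain Python as follows. Several things here are assumptions rather than part of the excerpt: the function names, reading "the average of those other values" as the mean of all off-diagonal similarities, a fixed iteration count in place of a convergence test, and the `damping` parameter, which blends each new message with the old one (a common practical addition that the excerpt does not mention).

```python
def similarities(points):
    """s[i][j] = -(squared Euclidean distance); s[i][i] = mean off-diagonal value."""
    n = len(points)
    s = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
          for j in range(n)] for i in range(n)]
    off_diagonal = [s[i][j] for i in range(n) for j in range(n) if i != j]
    preference = sum(off_diagonal) / len(off_diagonal)  # assumed reading of s_ii
    for i in range(n):
        s[i][i] = preference
    return s


def affinity_propagation(points, iterations=100, damping=0.5):
    """Return, for each point, the index of its exemplar (needs >= 2 points)."""
    n = len(points)
    s = similarities(points)
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # r_ik = s_ik - max{a_ij + s_ij : j != k}
        for i in range(n):
            for k in range(n):
                best_other = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # a_ik = min{0, r_kk + sum of positive r_jk, j not in {i, k}}; a_kk = that sum over j != k
        for k in range(n):
            positive = [max(0.0, r[j][k]) for j in range(n)]
            total = sum(positive) - positive[k]  # sum over j != k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, r[k][k] + total - positive[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    # x_k is the exemplar of x_i when a_ik + r_ik is maximal over k.
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]


exemplars = affinity_propagation([(0,), (2,), (4,)], iterations=1, damping=0.0)
print(exemplars)  # [1, 1, 1]: after one undamped pass the middle point is every point's exemplar
```

With `damping=0.0` the loop applies the formulas exactly as written; a value such as 0.5 is often used to reduce oscillation on less tidy data.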

Abhishek Jha
07 Nov 2017
3 min read

Salesforce myEinstein: Now build AI apps with 'clicks, not code'

This year’s Dreamforce conference has started with a big announcement. The Einstein machine learning platform has been updated with new predictive insights and chatbot capabilities, in ways that could truly make AI and deep learning more accessible to developers. The latest iteration, Salesforce myEinstein, allows users of all skill levels to develop custom AI apps “with clicks, without being a data scientist.” The tool has two new services: Einstein Prediction Builder and Einstein Bots. Einstein Prediction Builder enables automatic creation of custom AI models that can forecast outcomes for any field or object in Salesforce. With Einstein Bots, developers and admins can use a point-and-click interface to build custom chatbots. The service can be trained to augment customer service workflows by automating tasks such as answering questions and retrieving information. "We are further democratizing AI by empowering admins and developers to transform every process and customer interaction to be more intelligent with myEinstein," Salesforce GM and SVP John Ball said. "No other company is arming customers with both pre-built AI apps for CRM and the ability to build and customise their own with just clicks." As far as business processes are concerned, it’s high time that employees were freed from one-size-fits-all tools and, more importantly, from the repetitive tasks that take up their days. But to date, companies have been hindered by infrastructure costs, a lack of expertise, and the resources required to optimize their workflows with AI. This is where Salesforce myEinstein is a remarkable announcement. With myEinstein, the employees who are actually managing and driving business processes have the power to build and customize AI apps to fit their specific needs, paving the way for everyone to be smarter and more productive in the process. So how does myEinstein work with ‘simple clicks’ after all?
The declarative setup guide walks users through building, training and deploying AI models using structured and unstructured Salesforce data. The service automates the model building and data scoring process, and custom predictive models and bots can then be embedded directly into Salesforce workflows. Models and bots automatically learn and improve as they're used, delivering accurate, personalized recommendations and predictions in the context of business. Both tools, Einstein Prediction Builder and Einstein Bots, are currently in pilot and will be generally available in the summer of 2018. Salesforce said pricing for each Einstein feature varies, as some are already covered under the existing license while others require additional charges. It remains to be seen to what extent Salesforce manages to reduce the complexity of creating bots and bring in an element of underlying intelligence, but as the firm’s vice president Jim Sinai said, myEinstein is "automating data science under the hood."

Packt Editorial Staff
07 Nov 2017
5 min read

7th Nov.' 17 - Headlines

Google’s Tangent, Salesforce’s myEinstein, the Intel-AMD partnership, and HPE’s Superdome Flex are among today’s top stories in data science news.

Announcing Python library Tangent

Google introduces Tangent, a Python library for automatic differentiation

Google has announced a new, open-source Python library for automatic differentiation called Tangent. In contrast to existing machine learning libraries, Tangent is a source-to-source system, consuming a Python function f and emitting a new Python function that computes the gradient of f. This allows much better user visibility into gradient computations, as well as easy user-level editing and debugging of gradients. Tangent is useful to researchers and students who not only want to write their models in Python, but also read and debug automatically-generated derivative code without sacrificing speed and flexibility.

Salesforce in news

Salesforce announces machine learning platform myEinstein to build custom AI apps

Salesforce unveiled a machine learning platform, myEinstein, at its annual Dreamforce conference on Monday. The myEinstein platform enables users to develop custom AI apps "with clicks, without being a data scientist." The tool has two new services: Einstein Prediction Builder and Einstein Bots. Einstein Prediction Builder enables automatic creation of custom AI models that can predict outcomes for any field or object in Salesforce. Einstein Bots is a service that can be trained to augment customer service workflows by automating tasks such as answering questions and retrieving information.

Salesforce, Google form strategic partnership on cloud

Salesforce and Google have entered into a cloud partnership that could provide easier integration between Salesforce tools and Google’s G Suite and Google Analytics. Salesforce plans to use Google Cloud Platform (GCP) for its core services as part of its international infrastructure expansion.
Intel-AMD partnership to target Nvidia

Intel teams up with AMD for semi-custom GPU for next-gen mobile chips

In a bid to counter rival Nvidia, Intel has joined hands with AMD to create a next-generation notebook chip. Intel said the new chips will be part of its 8th-generation Core H mobile processors, and will not only feature discrete-level graphics, but also have built-in High Bandwidth Memory (HBM2) packed onto a single board. While more information will be available in the future, the first machines with the new technology will be released in the first quarter of 2018.

New analytics platforms announced

Rockwell unveils Project Scio, a scalable analytics platform for industrial IoT applications

Rockwell Automation has announced Project Scio, a scalable and open platform that gives users secure, persona-based access to all data sources, structured or unstructured. The company said that Scio offers a configurable, easy-to-use interface with which “all users can become self-serving data scientists to solve problems and drive tangible business outcomes.” It can also intelligently fuse related data, delivering analytics in intuitive dashboards, called storyboards, that users can share and view. “Providing analytics at all levels of the enterprise – on the edge, on-premises or in the cloud – helps users have the ability to gain insights not possible before,” said John Genovesi, vice president of Information Software, Rockwell Automation. “When users gain the ability to fuse multiple data sources and add machine learning, their systems could become more predictive and intelligent.”

HPE launches Superdome Flex platform for high performance data analytics for mission critical workloads

Hewlett Packard Enterprise (HPE) has unveiled HPE Superdome Flex, a highly scalable and modular in-memory computing platform. The platform enables enterprises of any size to process and analyze massive amounts of data and turn it into real-time business insights.
“With HPE Superdome Flex, customers can capitalize on in-memory data analytics for their most critical workloads and scale seamlessly as data volumes grow,” said Randy Meyer, vice president at HPE.

Other news in data science

Google releases its internal tool Colaboratory

Google has released yet another internal development tool, Colaboratory. Built on top of the open-source Jupyter project, Colaboratory is both an education tool and one meant for research collaboration. With Colaboratory, users create notebooks, or documents, that can be simultaneously edited like Google Docs, but with an added ability to run code and show that code’s output within the document. It supports Python 2.7 and has to be used on Google Chrome. The software is also integrated with Google Drive.

Neuromation announces ICO to facilitate AI adoption with blockchain-powered platform

Neuromation is utilizing blockchain technology to create a marketplace, the Neuromation Platform, which will connect multiple parties and bridge the gap between the research, design and implementation stages of AI modeling in a cost-effective manner. In this connection, the Neuromation ICO is in its pre-sale stage, which will end with the public sale, starting on Nov. 28 and ending on Jan. 1, 2018. Out of the total of 100,000,000 Neurotokens, 60,000,000 will be available for distribution, with each token priced at 0.001 ETH. According to the project roadmap, the second version of the Neuromation Platform will be launched in Q2 2018, and then v3 will be launched with a custom blockchain in Q3 2018.

DefinedCrowd unveils data platform API at Web Summit 2017

Seattle-based startup DefinedCrowd Corp. announced the release of version 1.0 of their public API at Web Summit 2017 in Lisbon.
The product, which will be generally available on November 8, helps companies create new projects, upload tasks, and execute data collections and data processing campaigns in a more streamlined way, directly from their own data and machine learning infrastructure. “The life of data scientists will become easier with this API,” said CEO and Founder Daniela Braga. “They will have the option to integrate their data platforms with DefinedCrowd, having complete control of their projects, working from their own platforms. This will give them direct access to high-quality large-scale data with very little overhead.”

Packt Editorial Staff
06 Nov 2017
4 min read

6th Nov.' 17 - Headlines

Uber’s new programming language Pyro, Tableau's new integration with AWS analytics, IBM’s cloud restructuring, and more in today’s tech stories on data science news.

Introducing Pyro for deep probabilistic modeling

Uber AI Labs announces PyTorch-based deep universal probabilistic programming language Pyro

As the first public project to come out of Uber AI Labs, Uber has released a programming language called “Pyro” that will help developers build probabilistic models for AI research. “Pyro is a tool for deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling,” wrote Noah Goodman, Stanford researcher and member of Uber AI Labs, in a blog post. Pyro is based on Python and the PyTorch library. “In Pyro, both the generative models and the inference guides can include deep neural networks as components,” Goodman added. “The resulting deep probabilistic models have shown great promise in recent work, especially for unsupervised and semi-supervised machine learning problems.” Uber added that Pyro is an alpha release.

Tableau in data science news

Tableau announces support for Amazon Redshift Spectrum in Tableau 10.4

Tableau has announced an update to its Amazon Redshift connector with support for Amazon Redshift Spectrum (external S3 tables). This feature was released as part of Tableau 10.3.3 and will be available broadly in Tableau 10.4.1. In an official blog post, Tableau said its customers can now connect directly to data in Amazon Redshift and analyze it in conjunction with data in Amazon Simple Storage Service (S3).

IBM revamps its Cloud strategy

IBM brings new cloud data tools, updates Unified Data Governance Platform

Presenting its latest vision for the cloud, IBM has announced a set of new data management products. The company added the new tools Data Catalog, Data Refinery, and Analytics Engine to its Watson Data Platform.
Keeping in mind the European Union’s incoming GDPR (General Data Protection Regulation), IBM also updated its Unified Data Governance Platform to ensure that businesses are better prepared to comply with the new regulation.

IBM’s Goodbye to Bluemix brand

In yet another rebranding, IBM Bluemix Cloud has been renamed as just “IBM Cloud.” The simplified naming, IBM said, is intended to put more focus on data and data science rather than on infrastructure. Last year, the company dropped the SoftLayer name in favor of Bluemix.

Newly launched data platforms

Periscope Data unveils new platform to bolster “data driven culture” for professional data teams

Targeting the large and growing market for big data and analytics software, Periscope Data has launched a new platform with which professional data teams can address the complete analytics life cycle. The Periscope Unified Data Platform enables teams to ingest, store, analyze, visualize and report on data. Periscope said that the Unified Data Platform extends the core product with built-in data warehousing capabilities based on Amazon Redshift, as well as with new capabilities to ingest data from any source. The platform is designed to overcome the high costs of incomplete, obsolete and potentially inaccurate data, which cause businesses huge monetary losses every year. The Unified Data Platform comprises these core components: ingestion from virtually any source, storage in a data warehouse, analysis in seconds, instant chart visualizations, and reporting.

Caviar announces real estate-backed digital asset platform

Blockchain startup Caviar has launched a dual-purpose token and crowdfunding platform built on the Ethereum blockchain. Caviar’s token offers access to stable real estate and cryptocurrencies, with built-in downside protection and automatic diversification. In addition, the Caviar Platform will allow real estate developers to raise funds for their upcoming projects.
The pre-sale, launching on November 28, aims to raise $25 million.

SIA: MCN collaborates with SAS to unveil single source data platform

Multi Channel Network (MCN) is partnering with analytics software company SAS to deliver a new data management solution that integrates all of MCN’s different data sources to provide advertisers with a single consumer view across linear TV and all digital platforms. Known as SIA, the data tool combines MCN's data sources from TV, online, mobile, location and OOH, as well as data from agencies and advertisers. This includes adding data assets from Telstra, Near (MCN's location data). It will act as a central nervous system for all of MCN's data assets and business initiatives, including programmatic TV and addressable advertising as well as new business models around data.
Abhishek Jha
06 Nov 2017
4 min read

Apache Kafka 1.0: From messaging system to streaming platform

In the tech world, when a 1.0 version gets released, it’s assumed that the software is stable, mature, and production ready. But for Neha Narkhede, co-founder of Confluent and co-creator of Apache Kafka, the wait for Apache Kafka 1.0 “was less about stability and more about completeness of the vision” she and a team of engineers set out to build towards when they first started Kafka in 2009. After all, Kafka has been broadly adopted by thousands of companies for several years, including a third of the Fortune 500 enterprises that continue to trust the platform for their mission-critical applications. Every piece of software has a unique story to tell about its journey towards 1.0. In the case of Kafka, named after the acclaimed German-language writer Franz Kafka (Jay Kreps spilled the beans in a 2014 Quora post), it’s more about the transformation from a messaging system to a distributed streaming platform. “Back in 2009 when we first set out to build Kafka, we thought there should be an infrastructure platform for streams of data. We didn’t start with the idea of making our own software, but started by observing the gaps in the technologies available at the time and realized how they were insufficient to serve the needs of a data-driven organization,” says Neha. This is interesting because the team was not imagining some hypothetical need, but a real world business need. The starting point was not building Kafka, but asking ‘Why did the stream processing startups fail in the 2000’s and 1990’s?’ “They failed because companies did not have the ability to collect these streams and have them laying around to process,” she adds, “the big question we asked ourselves was ‘why not both scale and real-time’?
And more broadly, why not build a true infrastructure platform that allows you to build all of your applications on top of it, and have those applications handle streaming data by default.” And thus followed a multi-stage transformation: implementing a log-like abstraction for continuous streams, making Kafka fault-tolerant and building replication into it, building APIs that made it easy to get data in and out of Kafka and process it, and more recently adding transactions to enable exactly-once semantics for stream processing. Version 1.0.0 comes with further performance improvements to exactly-once semantics, which avoid sending the same message multiple times in the case of a connection error. The exactly-once capabilities ensure enterprise stream processing in a controlled manner, as they enable “closure-like functions” for stream processing. Fundamentally, delivering a message to an endpoint once, and no more than once, in distributed stateless systems has been an ongoing challenge. But while guaranteeing exactly-once delivery is still a debated topic, Kafka’s continued enhancements to exactly-once semantics have resulted in its wider acceptance. Besides, Apache Kafka 1.0.0 has several important improvements such as significantly faster TLS and CRC32C implementations with Java 9 support, faster controlled shutdown, and better JBOD support, among other bug fixes. Other features also got the nod: Kafka can now tolerate disk failures better, there are better diagnostics for Simple Authentication and Security Layer (SASL) authentication failures, and the Streams API has been improved with functional enhancements. “The nice thing about all this is that while the current instantiation of Kafka’s Streams APIs are in the form of Java libraries, it isn’t limited to Java per se. Kafka’s support for stream processing is primarily a protocol-level capability that can be represented in any language. This is an important distinction.
Stream processing isn’t one interface, so there is no restriction for it to be available as a Java library alone. There are many ways to express continual programs: SQL, function-as-a-service or collection-like DSLs in many programming languages. A foundational protocol is the right way to address this diversity in applications around an infrastructure platform,” said Neha. Maybe it is this continual improvement that she was talking about as part of Apache Kafka’s decade-long completeness of vision, which saw it getting trusted by companies like LinkedIn, Capital One, Goldman Sachs, Netflix, Pinterest, and The New York Times. "Kafka enabled us to process trillions of messages per day in a scalable way. This opened up a completely new frontier for us to efficiently process data in motion to help us better serve Netflix members around the world," said Allen Wang, Senior Software Engineer at Netflix. Apache Kafka 1.0 is more than just a release. As the company rightly puts it, 1.0.0 is not a ‘mere bump of the version number’ but a full-fledged streaming platform with the ability to read, write, move and process streams of data with transactional correctness at enterprise-wide scale. It will, in fact, play a bigger role in the future if stream processing goes on to become the “central nervous system” for companies worldwide.

Abhishek Jha
04 Nov 2017
3 min read

Cisco Spark Assistant: World's first AI voice assistant for meetings

A few days back, I wrote about how the cloud collaboration with Google could turn around Cisco’s dwindling fortunes. It seems the internet tech pioneer is now back at full throttle. It has a reason to remind the world what it did with the internet, after all. And no prizes for guessing that it intends to do the same with the next-generation tech sensation: artificial intelligence. To start with, let’s be honest about daily corporate meetings: they’re boring. Meeting after meeting, every Monday, every other day: client meetings, internal meetings, vendor meetings, and possibly all kinds of stakeholder meetings that are ‘serious’ stuff. Devoid of smiles. Enter Cisco Spark Assistant. As you set up your office meetings, AI takes over with a simple, “Hey, Spark.” Basically, bots have entered your meeting rooms. "During the next few years, AI meeting bots will be joining our work teams. When they do, people will be able to ditch the drudgery of meeting setup and other logistics to become more creative than ever," says Cisco SVP and GM Rowan Trollope, "The future of great meetings is Spark with AI and our partners have an incredible opportunity to help customers take advantage of this game-changing technology." Cisco Spark Assistant is the latest in the series of innovations on the Cisco Spark platform. The announcement was made at Cisco Partner Summit, and the company said the world's first enterprise-ready voice assistant for meetings will see a phased rollout. Early next year, it will be available first on the Cisco Spark Room Series portfolio, including the new flagship Cisco Spark Room 70. In May, Cisco had entered a $125 million deal to buy MindMeld. The new service leverages machine learning technology from that acquisition. So how is it going to cut down the hassle? The Assistant will let you speak commands to Spark-registered devices, making for a kind of zero-touch meeting scenario. Just tell the AI bot what you want it to do. From ‘Hey, Spark.
Let's get started.’ and ‘Hey, Spark. Call Wilson’s meeting room.’ to ‘Hey, Spark. End the meeting.’ All without lifting a finger. Apart from machine learning, speech recognition technology and natural language understanding, Cisco said it has also applied its deep knowledge of meetings, honed over time: “Because we deliver 50 billion minutes of meetings every year. With this, we optimized the AI for the conference room.” Don’t forget that ever since you started your first job, you have seen a Cisco conference phone in every meeting room. In the future, Cisco plans to further enhance the service based on the feedback from early trials. The Assistant could become smarter with added capabilities to assign action items and prepare minutes of the meeting. “Spark Assistant takes advantage of our meeting room endpoints' industry-first advancements such as intelligent proximity, speaker tracking and real-time face recognition. These let it see and hear. As a result, Cisco Spark Assistant knows who enters the room, who leaves the room and who is speaking,” the company said in its official announcement. The initial focus is clearly on simplifying everyday meetings, and voice commands promise to streamline things. Above all, they offer an ‘interactive’ incentive to drive away your Monday blues.

Packt Editorial Staff
03 Nov 2017
4 min read

3rd Nov.' 17 - Headlines

Apache Kafka 1.0, IBM Watson upgrades, and Cisco’s first voice assistant for meetings, in today’s trending stories in data science news.

New version releases

Apache Kafka goes 1.0

Open source distributed streaming platform Apache Kafka has released its version 1.0.0. Apache Kafka 1.0 includes performance improvements with exactly-once semantics, significantly faster TLS and CRC32C implementations with Java 9 support, significantly faster controlled shutdown, and better JBOD support, among other general improvements and bug fixes, according to the official announcement. Apache Kafka is in use at large and small companies worldwide, including Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank, Target, The New York Times, Uber, Yelp, and Zalando, among others.

Pentaho 8.0 integrates Spark and Kafka, boosts real-time data processing capabilities

Pentaho 8.0, the next generation of Pentaho data integration and analytics platform software, has been unveiled at the PentahoWorld 2017 user conference. The new version is better prepared for the real-time data deluge. Pentaho 8.0 fully supports stream data ingestion and processing using its native engine or Spark. It also now enables real-time processing with specialized steps that connect Pentaho Data Integration (PDI) to Kafka. The 8.0 version adds support for the Knox Gateway used for authenticating users to Hadoop services. It is now easier to read and write to popular big data file formats and process with Spark using Pentaho’s visual editing tools. To increase productivity across the data pipeline, Pentaho 8.0 adds new features such as granular filters for preparing data, improved repository usability and easier application auditing.

Platform upgrades and enhancements

IBM announces set of upgrades to Watson Data Platform

IBM has announced several upgrades to its Watson Data Platform, giving data professionals a stronger foundation for AI applications.
The new services include Data Catalog, which creates a complete, searchable index of structured and unstructured data in a system; Data Refinery, a tool to prepare, cleanse and process data for AI purposes; and Analytics Engine, an intelligent repository for data that combines Apache Spark and Apache Hadoop, powered by IBM Cloud Object storage.

Adobe Analytics enhanced with advanced features for faster analysis, better customer intelligence

Adobe has added several advanced features to Adobe Analytics that provide employees with intelligence curated for their roles throughout the organization. Adobe Analytics will now have Context-Aware Sessions, Audience Analytics, and new Visualizations features, which make it easier for brands to combine dimensions, metrics and date ranges in any combination, with the ability to query billions of rows of data in seconds. Also, there will be improvements to the virtual report suite for mobile teams. These new capabilities will enable increased collaboration, faster analysis and improved customer intelligence, allowing high-growth brands to derive meaningful insights faster, and with more precision.

Breakthrough innovations in AI

Cisco Spark Assistant: World’s first AI Voice Assistant for Meetings

Cisco Spark Assistant is going to be the world’s first enterprise-ready voice assistant for meetings, the company announced at Cisco Partner Summit. Cisco Spark Assistant will be available first on the Cisco Spark Room Series portfolio, including the new flagship Cisco Spark Room 70, the company said. “During the next few years, AI meeting bots will be joining our work teams. When they do, people will be able to ditch the drudgery of meeting setup and other logistics to become more creative than ever,” said Rowan Trollope, SVP and GM at Cisco.
Using machine learning, Factual’s Engine will tell developers when to engage users

Location data provider Factual has launched Engine, a mobile software development kit (SDK) with which developers can add location data and intelligence to mobile apps. Engine uses machine learning to help developers know the right time to engage users. The AI considers business operating hours, device usage patterns, speed and direction of travel to determine the specific circumstance of a user. “The bar for smart and intelligent apps is rising exponentially, and developers demand solutions that help them provide personalized, effortless experiences to end users,” said Gil Elbaz, founder and CEO of Factual. “Engine is uniquely able to understand a device's exact location and movement, and using that location intelligence, design customized outcomes for users.” Engine is available for both Android and iOS.
Packt Editorial Staff
02 Nov 2017
5 min read

2nd Nov.' 17 - Headlines

Keras update, TensorFlow Eager execution, new Blockchain project Thunder token, and more in today’s data science news.

Keras 2.0.9 released

The latest version of Keras, 2.0.9, has been released on GitHub with several RNN improvements, easier multi-GPU data parallelism, and a range of API changes, in addition to bug fixes and performance improvements such as native support for the NCHW data layout in TensorFlow. Implementation changes in Keras 2.0.9 result in a different scaling and normalization behavior.

Google announces “eager execution” for TensorFlow

Google has unveiled a new interface, “eager execution,” that makes it easier to get started with TensorFlow. Eager execution is an imperative, define-by-run interface where operations are executed immediately as they are called from Python. Announcing the release on its official blog, Google said the benefits of the eager execution interface include fast debugging, support for dynamic models, support for custom and high-order gradients, and almost all of the available TensorFlow operations. Google is soliciting feedback on this experimental feature.

Yellowfin 7.4 released

BI platform Yellowfin has announced the release of Yellowfin 7.4. While augmented data discovery is a notable feature in the product, the addition of ETL enables the company to add data science platforms such as H2O, incorporating data science components like Predictive Model Markup Language (PMML) and Portable Format for Analytics (PFA). This means Yellowfin is now an end-to-end platform for data scientists.

PASS 2017 Summit in data science news

Microsoft sets foot on hybrid Azure SQL databases

With new advances to its SQL Server 2017 solution and Azure data services, Microsoft intends to form the ultimate hybrid data platform. The company made new on-premises and cloud announcements at PASS Summit 2017.
Among the new tools, Microsoft announced the Azure SQL Database Managed Instance and the Azure Database Migration Service, which enable users to ‘lift and shift’ on-premises SQL Server workloads. Both services are available in private preview. To help integrate some of these newly launched tools, Microsoft said it has built features such as Python and R script integration into SQL Server 2017.

Microsoft announces SQL Operations Studio

The PASS 2017 summit saw another significant announcement from Microsoft: SQL Operations Studio. The studio is a free, lightweight tool for “modern database development and operations on Windows, Mac or Linux machines for SQL Server, Azure SQL Database, and Azure SQL Data Warehouse.” It includes smart T-SQL code snippets, customizable dashboards, and support for popular command-line tools.

New Blockchain platforms in News

Thunder token: Cornell professor announces new blockchain project that is faster and scalable

A renowned computer science professor from Cornell University is set to launch a new blockchain project called “thunder token.” Known for her work on the fundamentals of distributed systems, Elaine Shi claimed that thunder token will be able to achieve speeds 1,000x greater than existing technologies, while also addressing the common blockchain problem of scalability. Making the announcement at Devcon3, Ethereum's annual developer conference, Shi said the new initiative is based on the Thunderella protocol, described in a paper she co-authored with Cornell associate professor Rafael Pass. In thunder token, the protocol proposes a split set-up so that transactions are confirmed very quickly, with the blockchain being used only in emergencies. The rest of the time, thunder token will use something a little less familiar: a system of agents that follows the direction of a "leader" and votes on which transactions are valid according to the rules. It is not yet clear whether the protocol will be purely private or open to the public.
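The split set-up described for thunder token can be pictured with a small Python sketch. This is a toy model built only from the description above, not the Thunderella protocol itself: the three-quarters approval threshold, the `Ledger` class, and the `online_agent` and `offline_agent` helpers are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Callable, List

# Assumed for illustration: share of agents that must approve a proposal.
APPROVAL_THRESHOLD = Fraction(3, 4)

Agent = Callable[[str], bool]


@dataclass
class Ledger:
    fast_confirmed: List[str] = field(default_factory=list)  # confirmed by agent vote
    chain_fallback: List[str] = field(default_factory=list)  # handed to the blockchain


def confirm(tx: str, agents: List[Agent], ledger: Ledger) -> str:
    """The leader proposes tx and the agents vote on it.

    With enough approvals the transaction is confirmed on the fast path;
    otherwise it is routed to the slower blockchain path.
    """
    approvals = sum(1 for agent in agents if agent(tx))
    if Fraction(approvals, len(agents)) >= APPROVAL_THRESHOLD:
        ledger.fast_confirmed.append(tx)
        return "fast"
    ledger.chain_fallback.append(tx)
    return "fallback"


def online_agent(tx: str) -> bool:
    # Toy rule check: approve any transaction written as "sender->receiver:amount".
    return "->" in tx


def offline_agent(tx: str) -> bool:
    # An unreachable agent never casts an approving vote.
    return False


if __name__ == "__main__":
    ledger = Ledger()
    print(confirm("alice->bob:5", [online_agent] * 4, ledger))
    print(confirm("bob->carol:2", [online_agent, online_agent,
                                   offline_agent, offline_agent], ledger))
```

In this model the blockchain is touched only when the vote falls short, which mirrors the "emergencies only" role the announcement gives it.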
FundRequest develops unique blockchain incentive platform for open source projects

In a new approach to open source that could benefit both developers and organizations, FundRequest has launched a blockchain platform for funding, claiming, and rewarding open source contributions. With FundRequest, users get access to a decentralized ecosystem that provides code-enforced guarantees against corruption. Funding is sent only if a project’s functionality can be demonstrated, and is withheld otherwise. FundRequest’s plugin will allow users to fund open requests on platforms like GitHub with a fund button. After setting up a GitHub request ticket, users can fund it through the FundRequest interface, which generates a unique smart contract to manage the payment of funds.

AI platforms in News

Hikvision announces its AI Cloud platform

At an artificial intelligence summit held in Shenzhen, China, Hikvision unveiled its AI Cloud platform. The company said that Hikvision AI Cloud was developed to solve real-world challenges in different vertical markets and to create continuous value for end users. Hu Yangzhong, CEO of Hikvision, who addressed the forum, noted how the ongoing trend of engineering AI algorithms into edge devices was making the edge more intelligent. "Edge computing uses local computing to enable analytics at the source of the data. With AI algorithms woven into the edge devices, only selected information such as an individual or a vehicle in a video image will be extracted and sent which significantly enhances the transition efficiency and reduces the network bandwidth, while still sustaining high quality and accuracy," Hu said.

Dedrone unveils DroneTracker 3 for advanced drone detection through machine learning

Drone detection technology developer Dedrone has announced an upgraded version of its software, named DroneTracker 3, which significantly expands the scope of current Dedrone features.
DroneTracker 3 includes updates such as automated summary reporting, improved detection and reliability, enterprise-grade security and management, and a simplified setup that is easy to deploy. “Ultimately, DroneTracker 3 identifies how many drones are in an organization’s airspace, a question which was nearly impossible to answer prior to the launch of DroneTracker,” commented Joerg Lamprecht, CEO and co-founder of Dedrone.
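Returning to the TensorFlow item in this roundup: "define-by-run" is easier to see next to its opposite. The sketch below is a toy contrast in plain Python, not TensorFlow's API; `DeferredGraph`, `double`, `increment`, and `eager_pipeline` are hypothetical names invented for illustration.

```python
from typing import Callable, List


class DeferredGraph:
    """Define-then-run: operations are recorded first and executed later."""

    def __init__(self) -> None:
        self._ops: List[Callable[[float], float]] = []

    def add(self, op: Callable[[float], float]) -> "DeferredGraph":
        self._ops.append(op)  # nothing is computed at this point
        return self

    def run(self, x: float) -> float:
        # Only now are the recorded operations actually executed, in order.
        for op in self._ops:
            x = op(x)
        return x


def double(x: float) -> float:
    return 2.0 * x


def increment(x: float) -> float:
    return x + 1.0


def eager_pipeline(x: float) -> float:
    """Define-by-run: each operation executes as soon as it is called."""
    y = double(x)      # y already holds a concrete value
    if y > 5.0:        # so ordinary Python control flow can inspect it
        y = increment(y)
    return y


if __name__ == "__main__":
    graph = DeferredGraph().add(double).add(increment)
    print(graph.run(3.0))       # the graph is evaluated only on run()
    print(eager_pipeline(3.0))  # every step was evaluated as it was called
```

Because intermediate values exist as soon as each call returns, they can be printed or stepped through in a debugger and used in data-dependent branches, which is the sense in which the announcement links eager execution to fast debugging and dynamic models.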

Abhishek Jha
02 Nov 2017
3 min read

World's first AI exchange-traded funds could unleash a new era of automated trading systems

The jury may be out on whether artificial intelligence has beaten humans at their own games, but what started as a smart business strategy is now almost everywhere, even in stock markets. In Canada, robots have made their trading debut: Horizons ETFs Management Inc.’s AI exchange-traded fund, which could be the first global equity exchange-traded fund (ETF) run by machines, has already hit the market.

In securities trading, an exchange-traded fund is a marketable security that trades on stock exchanges. Its portfolio is managed by some form of trust company, which is run by human beings, of course. That is why you are bound to be apprehensive: would you invest in an ETF without a human at the other end whose decisions are backed by years of market research and real-life experience?

“I’m going to be buying some but I’m buying it as a nervous investor myself,” Steve Hawkins, co-chief executive officer of Horizons ETFs Management Canada, said before the Horizons Active AI Global Equity ETF began trading on the Toronto Stock Exchange. “We don’t know what the computer will do.”

The AI-run ETF is listed on the exchange under the ticker MIND. While it is still sub-advised by Mirae Asset Global Investments, the AI system’s investment strategy is to analyze data from 50 investment metrics and extract investment patterns that yield actionable insights. Experts feel MIND can be a big growth prospect in the ETF space. Unlike the portfolio manager next door, however, it will never be able to explain its decisions. “We don’t know why it’s going to be making those independent decisions, but from our rigorous testing we believe that it’s going to make the right decisions,” Hawkins said.
To elaborate on the word rigorous: the AI system, developed by Korea’s Qraft Technologies, was back-tested over 10 years, during which it learned how the market reacts when the data is interpreted in a certain way, and how to make smart investments in the process. True, artificial intelligence in stock markets may seem like venturing into uncharted waters, but Hawkins is upbeat that the system will come out smarter than the average portfolio manager.

“AI can do the work of a team of global strategists, can look at millions of data points very quickly, where a team of strategists would have to work 24/7, 365 days a year,” he says. “It doesn’t bring in investor bias or emotion with respect to any of its decisions, and we hope to see output that will be able to consistently outperform human decision-making.”

Count these two, scale and the absence of bias, as the most important reasons to prefer AI over humans. Artificial intelligence does not share the weaknesses of human intelligence. It doesn’t make emotional decisions, and it doesn’t fall back on the excuse that to err is human. Brace yourself for an AI investment manager in the future.