
How-To Tutorials

Quantum computing, edge analytics, and meta learning: key trends in data science and big data in 2019

Richard Gall
18 Dec 2018
11 min read
When historians study contemporary notions of data in the early 21st century, 2018 might well be a landmark year. In many ways this was the year when Big and Important Issues - from the personal to the political - began to surface. The techlash, a term which has defined the year, arguably emerged from conversations and debates about the uses and abuses of data. But while cynicism casts a shadow on the brightly lit data science landscape, there's still a lot of optimism out there. And more importantly, data isn't going to drop off the agenda any time soon.

However, the changing conversation in 2018 does mean that the way data scientists, analysts, and engineers use data and build solutions for it will change. A renewed emphasis on ethics and security is now appearing, which will likely shape 2019 trends. But what will these trends be? Let's take a look at some of the most important areas to keep an eye on in the new year.

Meta learning and automated machine learning

One of the key themes of data science and artificial intelligence in 2019 will be doing more with less. There are a number of ways in which this will manifest itself. The first is meta learning. This is a concept that aims to improve the way that machine learning systems actually work by running machine learning on machine learning systems. Essentially this allows a machine learning algorithm to learn how to learn. By doing this, you can better decide which algorithm is most appropriate for a given problem.

Find out how to put meta learning into practice. Learn with Hands-On Meta Learning with Python.

Automated machine learning is closely aligned with meta learning. One way of understanding it is to see it as automating the application of meta learning. So, if meta learning can help better determine which machine learning algorithms should be applied and how they should be designed, automated machine learning makes that process a little smoother. It builds the decision making into the machine learning solution. Fundamentally, it's all about "algorithm selection, hyper-parameter tuning, iterative modelling, and model assessment," as Matthew Mayo explains on KDnuggets.

Automated machine learning tools

What's particularly exciting about automated machine learning is that there are already a number of tools that make it relatively easy to do. AutoML is a set of tools developed by Google that can be used on the Google Cloud Platform, while auto-sklearn, built around the scikit-learn library, provides a similar out-of-the-box solution for automated machine learning (there's a minimal sketch of it below).

Although both AutoML and auto-sklearn are very new, there are newer tools available that could dominate the landscape: AutoKeras and AdaNet. AutoKeras is built on Keras (the Python neural network library), while AdaNet is built on TensorFlow. Both could be more affordable open source alternatives to AutoML. Which automated machine learning library gains the most popularity remains to be seen, but one thing is certain: automated machine learning makes deep learning accessible to many organizations that previously wouldn't have had the resources or inclination to hire a team of PhD computer scientists.

But it's important to remember that automated machine learning certainly doesn't mean automated data science. While tools like AutoML will help many organizations build deep learning models for basic tasks, for organizations that need a more developed data strategy, the role of the data scientist will remain vital. You can't, after all, automate away strategy and decision making.

Learn automated machine learning with these titles:
Hands-On Automated Machine Learning
TensorFlow 1.x Deep Learning Cookbook
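To make the idea concrete, here is a minimal sketch of what automated machine learning looks like with auto-sklearn. Treat it as illustrative rather than definitive: the dataset and time budget are arbitrary choices for the example.

```python
# A rough sketch of automated machine learning with auto-sklearn.
# The dataset and time budget here are arbitrary, illustrative choices.
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

import autosklearn.classification

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# auto-sklearn handles algorithm selection and hyper-parameter tuning
# itself, within a fixed time budget.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,  # seconds to spend searching
)
automl.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, automl.predict(X_test)))
```

The point is the division of labour: you supply data and a budget, and the library makes the modelling decisions, which is exactly where automated machine learning stops short of automated data science.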
Quantum computing

Quantum computing, even as a concept, feels almost fantastical. It's not just cutting-edge, it's mind-bending. But in real-world terms it also continues the theme of doing more with less. Explaining quantum computing can be tricky, but the fundamentals are this: instead of a binary system (the foundation of computing as we currently know it), in which a bit can be either 0 or 1, in a quantum system you have qubits, which can be 0, 1 or both simultaneously. (If you want to learn more, read this article.)

What quantum computing means for developers

So, what does this mean in practice? Essentially, because the qubits in a quantum system can be multiple things at the same time, you are able to run much more complex computations. Think about the difference in scale: running a deep learning system on a binary system has clear limits. Yes, you can scale up in processing power, but you're nevertheless constrained by the foundational fact of zeros and ones. In a quantum system where that restriction no longer exists, the scale of the computing power at your disposal increases astronomically.

Once you understand the fundamental proposition, it becomes much easier to see why the likes of IBM and Google are clamouring to develop and deploy quantum technology. One of the most talked-about use cases is using quantum computers to find even larger prime numbers (a move which carries risks, given that prime numbers are the basis for much modern encryption). But there are other applications, such as in chemistry, where complex subatomic interactions are too detailed to be modelled by a traditional computer.

It's important to note that quantum computing is still very much in its infancy. While Google and IBM are leading the way, they are really only researching the area. It certainly hasn't been deployed or applied in any significant or sustained way. But this isn't to say that it should be ignored. It's going to have a huge impact on the future, and more importantly it's plain interesting. Even if you don't think you'll be getting to grips with quantum systems at work for some time (a decade at best), understanding the principles and how they work in practice will not only give you a solid foundation for major changes in the future, it will also help you better understand some of the existing challenges in scientific computing. And, of course, it will also make you a decent conversationalist at dinner parties.

Who's driving quantum computing forward?

If you want to get started, Microsoft has put together the Quantum Development Kit, which includes the first quantum-specific programming language, Q#. IBM, meanwhile, has developed its own Quantum Experience, which allows engineers and researchers to run quantum computations in the IBM cloud (a small circuit is sketched below). As you investigate these tools you'll probably get the sense that no one's quite sure what to do with these technologies. And that's fine - if anything it makes it the perfect time to get involved and help further research and thinking on the topic.

Get a head start in the quantum computing revolution. Pre-order Mastering Quantum Computing with IBM QX.
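For a feel of how different quantum programming is, here is a small sketch using Qiskit, the Python SDK behind IBM's Quantum Experience. It puts one qubit into superposition and entangles it with another; exact APIs have shifted between Qiskit versions, so take this as a sketch rather than a reference.

```python
# Sketch: a two-qubit circuit in Qiskit (APIs vary across versions).
from qiskit import Aer, QuantumCircuit, execute

qc = QuantumCircuit(2, 2)
qc.h(0)      # put qubit 0 into superposition: 0 and 1 at once
qc.cx(0, 1)  # entangle qubit 1 with qubit 0
qc.measure([0, 1], [0, 1])

# Run on a local simulator rather than real IBM hardware.
backend = Aer.get_backend("qasm_simulator")
counts = execute(qc, backend, shots=1024).result().get_counts()
print(counts)  # roughly half '00' and half '11', never '01' or '10'
```

Even this toy example shows the shift in mental model: you reason about amplitudes and measurement statistics, not deterministic zeros and ones.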
Edge analytics and digital twins

While quantum lingers on the horizon, the concept of the edge has quietly planted itself at the very center of the IoT revolution. IoT might be the term that business leaders and, indeed, wider society are talking about, but for technologists and engineers, none of its advantages would be possible without the edge. Edge computing or edge analytics is essentially about processing data at the edge of a network rather than within a centralized data warehouse. Again, as you can begin to see, the concept of the edge allows you to do more with less. More speed, less bandwidth (as devices no longer need to communicate with data centers), and, in theory, more data. In the context of IoT, where just about every object in existence could be a source of data, moving processing and analytics to the edge can only be a good thing.

Will the edge replace the cloud?

There's a lot of conversation about whether the edge will replace the cloud. It won't. But it probably will replace the cloud as the place where we run artificial intelligence. For example, instead of running powerful analytics models in a centralized space, you can run them at different points across the network. This will dramatically improve speed and performance, particularly for those applications that run on artificial intelligence.

A more distributed world

Think of it this way: just as software has become more distributed in the last few years, thanks to the emergence of the edge, data itself is going to be more distributed. We'll have billions of pockets of activity, from consumers to industrial machines, each a locus of data generation.

Find out how to put the principles of edge analytics into practice: Azure IoT Development Cookbook

Digital twins

An emerging part of the edge computing and analytics trend is the concept of digital twins. This is, admittedly, still something in its infancy, but in 2019 it's likely that you'll be hearing a lot more about digital twins. A digital twin is a digital replica of a device that engineers and software architects can monitor, model and test. For example, if you have a digital twin of a machine, you could run tests on it to better understand its points of failure. You could also investigate ways you could make the machine more efficient. More importantly, a digital twin can be used to help engineers manage the relationship between centralized cloud systems and systems at the edge - the digital twin is essentially a layer of abstraction that allows you to better understand what's happening at the edge without needing to go into the detail of the system. For those of us working in data science, digital twins provide better clarity and visibility on how disconnected aspects of a network interact. If we're going to make 2019 the year we use data more intelligently - maybe even more humanely - then this is precisely the sort of thing we need.

Interpretability, explainability, and ethics

Doing more with less might be one of the ongoing themes in data science and big data in 2019, but we can't ignore the fact that ethics and security will remain firmly on the agenda. Although it's easy to dismiss these issues as separate from the technical aspects of data mining, processing, and analytics, they are, in fact, deeply integrated into them. Central to the ethics discussion are two related concepts: explainability and interpretability. The two terms are often used interchangeably, but there are some subtle differences. Explainability is the extent to which the inner workings of an algorithm can be explained in human terms, while interpretability is the extent to which one can understand the way in which it is working (e.g. predict the outcome in a given situation). So, an algorithm can be interpretable, but you might not quite be able to explain why something is happening. (Think about this in the context of scientific research: sometimes, scientists know that a thing is definitely happening, but they can't provide a clear explanation for why it is.)
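One common interpretability technique, offered here as an illustrative sketch rather than a prescription, is to fit a simple, human-readable surrogate model to the predictions of a black-box model and inspect its rules. Everything below is plain scikit-learn; the model and depth choices are arbitrary.

```python
# Sketch: approximating a black-box model with an interpretable surrogate.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

# The "black box": accurate, but hard to explain in human terms.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The surrogate is trained on the black box's *predictions*, not the true
# labels, so its rules describe how the black box behaves.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

print(export_text(surrogate, feature_names=list(data.feature_names)))
print("fidelity to black box:", surrogate.score(X, black_box.predict(X)))
```

The surrogate's decision rules won't tell you why the black box works, but they give outsiders an interpretable account of what it is doing - the gap between the two being exactly the explainability/interpretability distinction above.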
Improving transparency and accountability

Either way, interpretability and explainability are important because they can help to improve transparency in machine learning and deep learning algorithms. In a world where deep learning algorithms are being applied to problems in areas from medicine to justice - where the problem of accountability is particularly fraught - this transparency isn't an option, it's essential. In practice, this means engineers must tweak the algorithm development process to make it easier for those outside the process to understand why certain things are happening and why they aren't. To a certain extent, this ultimately requires the data science world to take the scientific method more seriously than it has done. Rather than just aiming for accuracy (which is itself often open to contestation), the aim is to constantly manage the gap between what we're trying to achieve with an algorithm and how it actually goes about doing that.

You can learn the basics of building explainable machine learning models in the Getting Started with Machine Learning in Python video.

Transparency and innovation must go hand in hand in 2019

So, there are two fundamental things for data science in 2019: improving efficiency, and improving transparency. Although the two concepts might look like they conflict with each other, it's actually a bit of a false dichotomy. If we had realised that 12 months ago, we might have avoided many of the issues that have come to light this year. Transparency has to be a core consideration for anyone developing systems for analyzing and processing data. Without it, the work you're doing might be flawed or unnecessary, and you'll only need further iterations to rectify your mistakes or mitigate the impact of your biases. With this in mind, now is the time to learn the lessons of 2018's techlash. We need to commit to stopping the miserable conveyor belt of scandal and failure. Now is the time to find new ways to build better artificial intelligence systems.

How IRA hacked American democracy using social media and meme warfare to promote disinformation and polarization: A new report to Senate Intelligence Committee

Natasha Mathur
18 Dec 2018
9 min read
A new report prepared for the Senate Intelligence Committee by the cybersecurity firm New Knowledge was released yesterday. The report, titled "The Tactics & Tropes of the Internet Research Agency", provides insight into how the IRA, a group of Russian agents, used and continues to use social media to influence politics in America by exploiting the political and racial divisions in American society.

"Throughout its multi-year effort, the Internet Research Agency exploited divisions in our society by leveraging vulnerabilities in our information ecosystem. We hope that our work has resulted in a clearer picture for policymakers, platforms, and the public alike and thank the Senate Select Committee on Intelligence for the opportunity to serve", says the report.

Russian interference during the 2016 presidential elections comprised Russian agents trying to hack the online voting systems, cyber-attacks aimed at the Democratic National Committee, and social media influence tactics designed to exacerbate the political and social divisions in the US. As a part of the SSCI's investigation into the IRA's social media activities, some of the social platform companies that were misused by the IRA, such as Twitter, Facebook, and Alphabet, provided data related to IRA influence tactics. However, none of these platforms provided complete sets of related data to the SSCI. "Some of what was turned over was in PDF form; other datasets contained extensive duplicates. Each lacked core components that would have provided a fuller and more actionable picture. The data set provided to the SSCI for this analysis includes data previously unknown to the public...and...is the first comprehensive analysis by entities other than the social platforms", reads the report.

The report brings to light the IRA's strategy, which involved deciding on certain themes, primarily social issues, and then reinforcing these themes across its Facebook, Instagram, and YouTube content. Different topics such as black culture, anti-Clinton, pro-Trump, anti-refugee, Muslim culture, LGBT culture, Christian culture, feminism, veterans, ISIS, and so on were grouped thematically on Facebook Pages and Instagram accounts to reinforce the culture and to foster feelings of pride. Here is a look at some key highlights from the report.

Key Takeaways

IRA used Instagram as its biggest tool for influence

As per the report, Facebook executives, during the Congressional testimony held in April this year, hid the fact that Instagram played a major role in the IRA's influence operation. There were about 187 million engagements on Instagram compared to 76.5 million on Facebook and 73 million on Twitter, according to a data set of posts between 2015 and 2018. In 2017, the IRA moved much of its activity and influence operations to Instagram as the media started looking into Facebook and Twitter operations. Instagram was the most effective platform for the Internet Research Agency: approximately 40% of its Instagram accounts achieved over 10,000 followers (a level referred to as "micro-influencers" by marketers) and twelve of these accounts had over 100,000 followers ("influencer" level).

Source: The Tactics & Tropes of the IRA

"Instagram engagement outperformed Facebook, which may indicate its strength as a tool in image-centric memetic (meme) warfare. Our assessment is that Instagram is likely to be a key battleground on an ongoing basis," reads the report. Apart from social media posts, another feature of the IRA's Instagram activity was merchandise.
This merchandise promotion aimed at building partnerships for boosting audience growth and gathering audience data. This was especially evident in the Black-targeted communities, with hashtags #supportblackbusiness and #buyblack appearing quite frequently. In fact, sometimes these IRA pages also offered coupons in exchange for sharing content.

Source: The Tactics & Tropes of the IRA

IRA promoted voter suppression operations

The report states that although Twitter and Facebook were still debating whether there was any voter suppression content present on their platforms, three major variants of voter suppression narratives were found widespread on Twitter, Facebook, Instagram, and YouTube. These included malicious misdirection (e.g. tweets promoting false voting rules), candidate support redirection, and turnout depression (e.g. no need to vote, your vote doesn't matter).

Source: The Tactics & Tropes of the IRA

For instance, a few days before the 2016 presidential elections in the US, the IRA began implementing voter suppression tactics on its Black-community-targeted accounts. The IRA started to spread content about voter fraud and delivered warnings that the "election would be stolen and violence might be necessary". These suppression narratives were targeted almost exclusively at the Black community on Instagram and Facebook. There was also the promotion of other kinds of content on topics such as alienation and violence to divert people's attention away from politics. Other varieties of voter suppression narratives included: "don't vote, stay home", "this country is not for Black people", "these candidates don't care about Black people", etc. Voter suppression narratives aimed at non-Black communities focused primarily on promoting identity and pride for communities like Native Americans, LGBT+, and Muslims.

Source: The Tactics & Tropes of the IRA

Then there were narratives that directly and broadly called for voting for candidates other than Hillary Clinton, and pages on Facebook that posted repeatedly about voter fraud, stolen elections, conspiracies about machines provided by Soros, and rigged votes.

IRA largely targeted black American communities

The IRA's major efforts on Facebook and Instagram were targeted at Black communities in America and involved developing and recruiting Black Americans as assets. The report states that the IRA adopted a cross-platform "media mirage" strategy which shared authentic Black-related content to create a strong influence on the Black community over social media. An example presented in the report is a case study of "Black Matters", which illustrates the extent to which the IRA created an "inauthentic media property" by creating different accounts across the social platforms to "reinforce its brand" and widely distribute its content. "Using only the data from the Facebook Page posts and memes, we generated a map of the cross-linked properties - other accounts that the Pages shared from, or linked to - to highlight the complex web of IRA-run accounts designed to surround Black audiences," reads the report. So, an individual who followed or liked one of the Black-community-targeted IRA Pages would be exposed to content from a dozen more pages.

Apart from the IRA's media mirage strategy, there was also a human asset recruitment strategy. It involved posts encouraging Americans to perform different types of tasks for IRA handlers.
Some of these tasks included requests for contact with preachers from Black churches, soliciting volunteers to hand out fliers, offering free self-defense classes (Black Fist/Fit Black), requests for speakers at protests, etc. These posts appeared in the Black-, Left-, and Right-targeted groups, although they were mostly present in the Black groups and communities. "The IRA exploited the trust of their Page audiences to develop human assets, at least some of whom were not aware of the role they played. This tactic was substantially more pronounced on Black-targeted accounts", reads the report. The IRA also created domain names such as blackvswhite.info, blackmattersusa.com, blacktivist.info, blacktolive.org, and so on. It also created YouTube channels like "Cop Block US" and "Don't Shoot" to spread anti-Clinton videos.

In response to these reports of specific Black targeting at Facebook, the National Association for the Advancement of Colored People (NAACP) returned a donation from Facebook and called on its users yesterday to log out of all Facebook-owned products, such as Facebook, Instagram, and WhatsApp, today. "NAACP remains concerned about the data breaches and numerous privacy mishaps that the tech giant has encountered in recent years, and is especially critical about those which occurred during the last presidential election campaign", reads the NAACP announcement.

IRA promoted pro-Trump and anti-Clinton operations

As per the report, the IRA focussed on promoting political content surrounding pro-Donald Trump sentiments over different channels and pages, regardless of whether those pages targeted conservatives, liberals, or racial and ethnic groups.

Source: The Tactics & Tropes of the IRA

On the other hand, large volumes of political content articulated anti-Hillary Clinton sentiments among both the Right- and Left-leaning communities created by the IRA. Moreover, there weren't any communities or pages on Instagram and Facebook that favored Clinton. There were some pro-Clinton Twitter posts; however, most of the tweets were still largely anti-Clinton.

Source: The Tactics & Tropes of the IRA

Additionally, there were different YouTube channels created by the IRA, such as Williams & Kalvin, Cop Block US, Don't Shoot, etc., and 25 videos across these different channels contained election-related keywords in their titles; all of these videos were anti-Hillary Clinton. An example presented in the report is one of the political channels, Paul Jefferson, which solicited videos for a #PeeOnHillary video challenge (the hashtag also appeared on Twitter and Instagram) and shared the submissions it received. Other videos promoted by these YouTube channels were "The truth about elections", "HILLARY RECEIVED $20,000 DONATION FROM KKK TOWARDS HER CAMPAIGN", and so on. Also, on the IRA's Facebook account, the post with the most shares and engagement was a conspiracy theory about President Barack Obama refusing to ban Sharia Law, encouraging Trump to take action.

Source: The Tactics & Tropes of the IRA

Also, the number one post on Facebook featuring Hillary Clinton was a conspiratorial post made public a month before the election.

Source: The Tactics & Tropes of the IRA

These were some of the major highlights from the report. However, the report states that there is still a lot to be done with regard to the IRA specifically. There is a need for further investigation of subscription and engagement pathways, and only the social media platforms currently have that data.
The New Knowledge team hopes that these platforms will provide more data that can speak to the impact among the targeted communities. For more information on the tactics of the IRA, read the full report here.

Facebook, Twitter take down hundreds of fake accounts with ties to Russia and Iran, suspected to influence the US midterm elections
Facebook plans to change its algorithm to demote "borderline content" that promotes misinformation and hate speech on the platform
Facebook's outgoing Head of communications and policy takes the blame for hiring PR firm 'Definers' and reveals more

Troll Patrol Report: Amnesty International and Element AI use machine learning to understand online abuse against women

Sugandha Lahoti
18 Dec 2018
5 min read
Amnesty International has partnered with Element AI to release a Troll Patrol report on online abuse against women on Twitter. The findings are part of their Troll Patrol project, which invites human rights researchers, technical experts, and online volunteers to build a crowd-sourced dataset of online abuse against women.

https://twitter.com/amnesty/status/1074946094633836544

Abuse of women on social media websites has been rising at an unprecedented rate. Social media websites have a responsibility to respect human rights and to ensure that women using the platform are able to express themselves freely and without fear. However, this has not been the case with Twitter, and Amnesty has unearthed certain discoveries.

Amnesty's methodology was powered by machine learning

Amnesty and Element AI studied 778 journalists and politicians from the UK and US throughout 2017 and then used machine learning techniques to qualitatively analyze abuse against women. The first step was to design a large, unbiased dataset of tweets mentioning the 778 women politicians and journalists. Next, over 6,500 volunteers (aged between 18 and 70 years old and from over 150 countries) analyzed 288,000 unique tweets to create a labeled dataset of abusive or problematic content. This was based on simple questions such as whether the tweets were abusive or problematic, and if so, whether they revealed misogynistic, homophobic or racist abuse or other types of violence. Three experts also categorized a sample of 1,000 tweets to assess the quality of the tweets labeled by digital volunteers. Element AI then used data science, specifically a subset of the Decoders' and experts' categorizations of the tweets, to extrapolate the abuse analysis.

Key findings from the report

Per the findings of the Troll Patrol report, 7.1% of tweets sent to the women in the study were "problematic" or "abusive". This amounts to 1.1 million tweets mentioning 778 women across the year, or one every 30 seconds. Women of color (black, Asian, Latinx and mixed-race women) were 34% more likely to be mentioned in abusive or problematic tweets than white women. Black women were disproportionately targeted, being 84% more likely than white women to be mentioned in abusive or problematic tweets.

Source: Amnesty

Online abuse targeted women from across the political spectrum: liberals and conservatives alike, as well as left- and right-leaning media organizations, faced similar levels of abuse.

Source: Amnesty

What does this mean for people in tech

Social media organizations are repeatedly failing in their responsibility to protect women's rights online. They fall short of adequately investigating and responding to reports of violence and abuse in a transparent manner, which leads many women to silence or censor themselves on the platform. Such abuse also hinders freedom of expression online and undermines women's mobilization for equality and justice, particularly for those groups who already face discrimination and marginalization.

What can tech platforms do?

One of the recommendations of the report is that social media platforms should publicly share comprehensive and meaningful information about reports of violence and abuse against women, as well as other groups, on their platforms. They should also talk in detail about how they are responding to it. Although Twitter and other platforms are using machine learning for content moderation and flagging, they should be transparent about the algorithms they use. They should publish information about training data, methodologies, moderation policies and technical trade-offs (such as between greater precision or recall) for public scrutiny. Machine learning automation should ideally be part of a larger content moderation system characterized by human judgment, greater transparency, rights of appeal and other safeguards.

Amnesty, in collaboration with Element AI, also developed a machine learning model to better understand the potential and risks of using machine learning in content moderation systems. This model was able to achieve results comparable to their digital volunteers at predicting abuse, although it is 'far from perfect still', Amnesty notes. It achieves about a 50% accuracy level when compared to the judgment of experts: it identified 2 in every 14 tweets as abusive or problematic, whereas experts identified 1 in every 14 tweets as abusive or problematic.
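For readers wondering what the precision/recall trade-off mentioned above actually measures, here is a toy illustration with made-up labels (not Amnesty's data), loosely mirroring the ratios in the report.

```python
# Toy illustration of the precision/recall trade-off. Labels are invented,
# roughly mirroring the article's ratios: experts flag 1 in 14 tweets as
# abusive, while the model flags 2 in 14.
from sklearn.metrics import precision_score, recall_score

expert = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # 1 = abusive
model  = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # model over-flags

# Precision: of the tweets the model flags, how many are truly abusive?
print("precision:", precision_score(expert, model))  # 0.5
# Recall: of the truly abusive tweets, how many does the model catch?
print("recall:", recall_score(expert, model))        # 1.0
```

Tuning a moderation model means trading these off: flag more aggressively and precision falls; flag conservatively and recall falls. That is exactly the kind of technical trade-off the report asks platforms to publish.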
"Troll Patrol isn't about policing Twitter or forcing it to remove content. We are asking it to be more transparent, and we hope that the findings from Troll Patrol will compel it to make that change. Crucially, Twitter must start being transparent about how exactly they are using machine learning to detect abuse, and publish technical information about the algorithms they rely on," said Milena Marin, senior advisor for tactical research at Amnesty International.

Read more: The full list of Amnesty's recommendations to Twitter.

People on Twitter (the irony) are shocked at the release of Amnesty's report, and #ToxicTwitter is trending.

https://twitter.com/gregorystorer/status/1074959864458178561
https://twitter.com/blimundaseyes/status/1074954027287396354
https://twitter.com/MikeWLink/status/1074500992266354688
https://twitter.com/BethRigby/status/1074949593438265344

Check out the full Troll Patrol report on Amnesty. Also, check out their machine learning based methodology in detail.

Amnesty International takes on Google over Chinese censored search engine, Project Dragonfly
Twitter CEO Jack Dorsey slammed by users after a photo of him holding 'smash Brahminical patriarchy' poster went viral
Twitter plans to disable the 'like' button to promote healthy conversations; should retweet be removed instead?

Python governance vote results are here: The steering council model is the winner

Prasad Ramesh
18 Dec 2018
3 min read
The election to select a governance model for Python, following the stepping down of Guido van Rossum as BDFL earlier this year, has ended, and PEP 8016 was selected as the winner. PEP 8016 is the steering council model, which focuses on providing a minimal and solid foundation for governance decisions. The vote has chosen a governance PEP that will now be implemented on the Python project.

The winner: PEP 8016, the steering council model

Authored by Nathaniel J. Smith and Donald Stufft, this proposal involves a model for Python governance based on a steering council. The council has vast authority, which it intends to use as rarely as possible; instead, it plans to use this power to establish standard processes. The steering council consists of five people. A general philosophy is followed: it's better to split up large changes into a series of small changes that can be reviewed independently. As opposed to trying to do everything in one PEP, the focus is on providing a minimal and solid foundation for future governance decisions. This PEP was accepted on December 17, 2018.

Goals of the steering council model

The main goals of the proposal are:

Sticking to the basics, aka 'be boring'. The authors don't think Python is a good place to experiment with new and untested governance models. Hence, the proposal sticks to mature, well-known processes that have been tested previously. A high-level approach in which the council stays out of day-to-day decisions is very common in large, successful open source projects; the low-level details are directly derived from Django's governance.

Being simple enough for minimum viable governance. The proposal attempts to slim things down to the minimum required, just enough to make it workable. The trimming covers the council, the core team, and the process for changing documentation.

Being comprehensive. The things that need to be defined are covered well for future use. Having a clear set of rules will also help minimize confusion.

Being flexible and light-weight. The authors are aware that finding the best processes for working together will take time and experimentation. Hence, they keep the document as minimal as possible, for maximal flexibility to adjust things later. The need for heavy-weight processes like whole-project votes is also minimized.

The council will work towards maintaining the quality and stability of the Python language and the CPython interpreter, making the contribution process easy, maintaining relations with the core team, establishing a decision-making process for PEPs, and so on. It has the power to make decisions on PEPs, enforce the project code of conduct, etc. To know more about the election to the committee, visit the Python website.

NumPy drops Python 2 support. Now you need Python 3.5 or later.
NYU and AWS introduce Deep Graph Library (DGL), a python package to build neural network graphs
Python 3.7.2rc1 and 3.6.8rc1 released

MIPS open sourced under ‘MIPS Open Program’, makes the semiconductor space and SoC, ones to watch for in 2019

Melisha Dsouza
18 Dec 2018
4 min read
On 17th December, Wave Computing announced that it will open source MIPS, with the MIPS Instruction Set Architecture (ISA) and MIPS' latest core, R6, to be made available in the first quarter of 2019. With a vision to "accelerate the ability for semiconductor companies, developers and universities to adopt and innovate using MIPS for next-generation system-on-chip (SoC) designs", Wave Computing's MIPS Open program will give participants full access to the most recent versions of the 32-bit and 64-bit MIPS ISA free of charge, without any licensing or royalty fees. Additionally, participants in the MIPS Open program will be licensed under MIPS' existing worldwide patents.

Addressing the "lack of open source access to true industry-standard, patent-protected and silicon-proven RISC architectures", Art Swift, president of Wave Computing's MIPS IP Business, claims that MIPS will bring to the open-source community "commercial-ready" instruction sets with "industrial-strength" architecture, where "Chip designers will have opportunities to design their own cores based on proven and well-tested instruction sets for any purposes."

Lee Flanagin, Wave's senior vice president and chief business officer, further added in the post that the MIPS Open initiative is a key part of Wave's 'AI for All' vision. He says, "The MIPS-based solutions developed under MIPS Open will complement our existing and future MIPS IP cores that Wave will continue to create and license globally as part of our overall portfolio of systems, solutions and IP. This will ensure current and new MIPS customers will have a broad array of solutions from which to choose for their SoC designs, and will also have access to a vibrant MIPS development community and ecosystem."

The MIPS Open initiative will further encourage the adoption of MIPS while helping customers develop new, MIPS-compatible solutions for a variety of emerging market applications, with support from third-party tool vendors, software developers and universities.

RISC-V versus MIPS?

Considering that the RISC-V instruction set architecture is also free and open for anyone to use, the internet went abuzz with speculation about competition between RISC-V and MIPS and the potential future of both. Hacker News saw comments like: "Had this happened two or three years ago, RISC-V would have never been born."

In an interview with EE Times, Rupert Baines, CEO of UltraSoC, said, "Given RISC-V's momentum, MIPS going open source is an interesting, shrewd move." He observed, "MIPS already has a host of quality tools and software environment. This is a smart way to amplify MIPS' own advantage, without losing much."

Linley Gwennap, principal analyst at the Linley Group, compared the two ISAs and stated, "The MIPS ISA is more complete than RISC-V. For example, it includes DSP and SIMD extensions, which are still in committee for RISC-V." Calling the MIPS software development tools more mature than RISC-V's, he went on to list the benefits of MIPS over RISC-V: "MIPS also provides patent protection and a central authority to avoid ISA fragmentation, both of which RISC-V lacks. These factors give MIPS an advantage for commercial implementations, particularly for customer-facing cores."

Hacker News and Twitter are bustling with comments on this move by Wave Computing.
Opinions are split over which architecture is preferable, but for the most part, customers appear excited about the news.

https://twitter.com/corkmork/status/1074857920293027840
https://twitter.com/plessl/status/1074778310025076736

You can head over to Wave Computing's official blog to know more about this announcement.

The Linux and RISC-V foundations team up to drive open source development and adoption of the RISC-V instruction set architecture (ISA)
Arm releases free Cortex-M processor cores for FPGAs, includes measures to combat FOSSi threat
SpectreRSB targets CPU return stack buffer, found on Intel, AMD, and ARM chipsets

NeurIPS 2018: How machine learning experts can work with policymakers to make good tech decisions [Invited Talk]

Bhagyashree R
18 Dec 2018
6 min read
At the 32nd annual NeurIPS conference held earlier this month, Edward William Felten, a professor of computer science and public affairs at Princeton University, spoke about how decision makers and tech experts can work together to make better policies. The talk was aimed at answering questions such as why public policy should matter to AI researchers, what role researchers can play in policy debates, and how researchers can help bridge divides between the research and policy communities.

While AI and machine learning are being used in high-impact areas and have seen heavy adoption in every field, in recent years they have also gained a lot of attention from policymakers. Technology has become a huge topic of discussion among policymakers mainly because of its cases of failure and how it is being used or misused. They have now started formulating laws and regulations, and holding discussions about how society will govern the development of these technologies. Prof. Felten explained how constructive engagement with policymakers will lead to better outcomes for technology, government, and society.

Why should tech be regulated?

Regulating tech is important, and for that, researchers, data scientists, and other people in tech fields have to close the gap between their research labs, cubicles, and society. Prof. Felten emphasizes that it is up to the tech community to bridge this gap, as we not only have the opportunity but also a duty to be more active and productive participants in public life. Many people are coming to the conclusion that tech should be regulated before it is too late. In a piece published by the Wall Street Journal, three experts debated whether the government should regulate AI. One of them, Ryan Calo, explains, "One of the ironies of artificial intelligence is that proponents often make two contradictory claims. They say AI is going to change everything, but there should be no changes to the law or legal institutions in response."

Prof. Felten points out that laws and policies are meant to change in order to adapt to current conditions. They are not written once and for all; rather, law is a living system that adapts to what is going on in society. And if we believe that technology is going to change everything, we can expect that law will change too. Prof. Felten also said that not only tech researchers and policymakers but society too should have some say in how technology is developed: "After all the people who are affected by the change that we are going to cause deserve some say in how that change happens, how it is used. If we believe in a society which is fundamentally democratic in which everyone has a stake and everyone has a voice then it is only fair that those lives we are going to change have some say in how that change come about and what kind of changes are going to happen and which are not."

How experts can work with decision makers to make good tech decisions

There are three key approaches researchers can take when engaging with policymakers on decisions about technology.

Engage in a two-way dialogue with policymakers

As researchers, we might think that we are tech experts who do not need to get involved in politics: we just need to share the facts we know and our job is done. But if researchers really want to maximize their impact in policy debates, they need to combine the knowledge and preferences of policymakers with their own knowledge and preferences.
This means taking into account what policymakers might already have heard about a particular subject, and the issues or approaches that resonate with them. Prof. Felten explains that this type of understanding and exchange of ideas happens in two stages. Researchers need to ask policymakers several questions, and this is not a one-time thing; rather, it is a multi-round protocol. They have to go back and forth, building engagement and mutual trust over time. They then need to put themselves into the shoes of a decision maker and understand how to structure the decision space for them.

Be present in the room when the decisions are being made

To have influence on the decisions that get made, researchers need to have "boots on the ground." Though not everyone has to engage in this deep and long-term process of decision making, we need some people from the community to engage on behalf of the community. Researchers need to be present in the room when the decisions are being made. This means taking posts as advisers or civil servants. We already have a range of such posts at both local and national government levels, alongside a range of opportunities to engage less formally in policy development and consultations.

Creating a career path and rewarding policy engagement

To drive this engagement, we need to create a career path which rewards policy engagement. We should have a way for researchers to move between policy and research careers. Prof. Felten pointed to a range of US-based initiatives that seek to bring those with technical expertise into policy-oriented roles, such as the US Digital Service. He adds that if we do not create these career paths, and if this becomes something that people can do only by sacrificing their careers, then very few people will do it. This needs to be an activity that we learn to respect when people in the community do it well. We need to build incentives, whether in academic careers or elsewhere, and come to understand that working in government or on policy issues is a valuable part of one kind of academic career, not a detour or a dead end.

To watch the full talk, check out the NeurIPS Facebook page.

NeurIPS 2018: Rethinking transparency and accountability in machine learning
NeurIPS 2018: Developments in machine learning through the lens of Counterfactual Inference [Tutorial]
Accountability and algorithmic bias: Why diversity and inclusion matters [NeurIPS Invited Talk]

Key trends in software infrastructure in 2019: observability, chaos, and cloud complexity

Richard Gall
17 Dec 2018
10 min read
Software infrastructure has, over the last decade or so, become a key concern for developers of all stripes. Long gone are narrowly defined job roles; thanks to DevOps, accountability for code is now shared between teams on both the development and deployment sides. For anyone that's ever been involved in the messy frustration of internal code wars, this has been a welcome change. But as developers who have traditionally sat higher up the software stack dive deeper into the mechanics of deploying and maintaining software, for those of us working in system administration, DevOps, SRE, and security (the list is endless, apologies if I've forgotten you), the rise of distributed systems only brings further challenges. Increased complexity not only opens up new points of failure and potential vulnerability; at a really basic level, it makes it harder to understand what's actually going on.

And, essentially, this is what it will mean to work in software delivery and maintenance in 2019. Understanding what's happening, minimizing downtime, taking steps to mitigate security threats - it's a cliche, but finding strategies to become more responsive rather than reactive will be vital. Indeed, many responses to these kinds of questions have emerged this year. Chaos engineering and observability, for example, have both been gaining traction within the SRE world, and are slowly beginning to make an impact beyond that particular job role. But let's take a deeper look at what is really going to matter in the world of software infrastructure and architecture in 2019.

Observability and the rise of the service mesh

Before we decide what to actually do, it's essential to know what's actually going on. That seems obvious, but with increasing architectural complexity, that's getting harder. Observability is a term that's being widely thrown around as a response to this - but it has been met with some cynicism. For some developers, observability is just a sexed-up way of talking about good old-fashioned monitoring. But although the two concepts have a lot in common, observability is more of an approach, a design pattern maybe, rather than a specific activity. This post from The New Stack explains the difference between monitoring and observability incredibly well. Observability is "a measure of how well internal states of a system can be inferred from knowledge of its external outputs", which means observability is a property of a system rather than an activity.

There are a range of tools available to help you move towards better observability. Application management and logging tools like Splunk, Datadog, New Relic and Honeycomb can all be put to good use and are a good first step towards developing a more observable system; a simple instrumentation sketch follows below.

Want to learn how to put monitoring tools to work? Check out some of these titles:
AWS Application Architecture and Management [Video]
Hands on Microservices Monitoring and Testing
Software Architecture with Spring 5.0

As well as those tools, if you're working with containers, Kubernetes has some really useful features that can help you more effectively monitor your container deployments. In May, Google announced Stackdriver Kubernetes Monitoring, which has seen much popularity across the community.

Master monitoring with Kubernetes. Explore these titles:
Google Cloud Platform Administration
Mastering Kubernetes
Kubernetes in 7 Days [Video]
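As a concrete, simplified illustration of instrumentation, the sketch below exposes request counts and latencies from a Python service using the prometheus_client library. Prometheus isn't one of the tools named above, but the pattern is the same across most monitoring stacks; all the metric names here are invented for the example.

```python
# Illustrative instrumentation with the prometheus_client library.
# Metric and endpoint names are invented for the example.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled")
LATENCY = Histogram("app_request_latency_seconds", "Request latency")

@LATENCY.time()  # record how long each call takes
def handle_request():
    REQUESTS.inc()
    time.sleep(random.uniform(0.01, 0.2))  # stand-in for real work

if __name__ == "__main__":
    start_http_server(8000)  # metrics exposed at http://localhost:8000/metrics
    while True:
        handle_request()
```

Exposing internal state as external outputs like this is observability in the literal sense of the definition quoted above: the system's internals become inferable from what it emits.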
But there's something else emerging alongside observability which only appears to confirm its importance: the notion of a service mesh. The service mesh is essentially a tool that allows you to monitor all the various facets of your software infrastructure, helping you to manage everything from performance to security to reliability. There are a number of different options out there when it comes to service meshes - Istio, Linkerd, Conduit and Tetrate being the four definitive tools out there at the moment.

Learn more about service meshes inside these titles:
Microservices Development Cookbook
The Ultimate Openshift Bootcamp [Video]
Cloud Native Application Development with Java EE [Video]

Why is observability important?

Observability is important because it sets the foundations for many aspects of software management and design in various domains. Whether you're an SRE or a security engineer, having visibility on the way in which your software is working will be essential in 2019.

Chaos engineering

Observability lays the groundwork for many interesting new developments, chaos engineering being one of them. Based on the principle that modern, distributed software is inherently unreliable, chaos engineering 'stress tests' software systems. Using something called chaos experiments - adding something unexpected into your system, or pulling a piece of it out like a game of Jenga - chaos engineering helps you to better understand the way it will act in various situations. In turn, this allows you to make the necessary changes that can help ensure resiliency.

Chaos engineering is particularly important today simply because so many people, indeed, so many things, depend on software to actually work. From an eCommerce site to a self-driving car, if something isn't working properly, there could be terrible consequences. It's not hard to see how chaos engineering fits alongside something like observability. To a certain extent, it's really another way of achieving observability. By running chaos experiments, you can draw out issues that may not be visible in usual scenarios.

However, the caveat is that chaos engineering isn't an easy thing to do. It requires a lot of confidence and engineering intelligence. Running experiments shouldn't be done carelessly - in many ways, the word 'chaos' is a bit of a misnomer. All testing and experimentation on your software should follow a rigorous and almost scientific structure. While chaos engineering isn't straightforward, there are tools and platforms available to make it more manageable. Gremlin is perhaps the best example, offering what they describe as 'resiliency-as-a-service'. But if you're not ready to go in for a fully fledged platform, it's worth looking at open source tools like Chaos Monkey and Chaos Toolkit. A minimal chaos experiment is sketched after the reading lists below.

Want to learn how to put the principles of chaos engineering into practice? Check out this title:
Microservice Patterns and Best Practices

Learn the principles behind resiliency with these SRE titles:
Real-World SRE
Practical Site Reliability Engineering
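To make the shape of a chaos experiment concrete, here is a deliberately toy sketch in plain Python, not a real chaos tool: it injects failures into a hypothetical dependency at a controlled rate and checks that a stated hypothesis about the system's behaviour still holds.

```python
# A toy chaos experiment, not a production chaos tool. Every name here is
# hypothetical. The pattern: state a hypothesis, inject failure at a
# controlled rate, verify the hypothesis still holds.
import random

def flaky(call, failure_rate=0.2):
    """Wrap a dependency so it fails a controlled fraction of the time."""
    def wrapper(*args, **kwargs):
        if random.random() < failure_rate:
            raise ConnectionError("chaos: injected dependency failure")
        return call(*args, **kwargs)
    return wrapper

def fetch_price(item):
    """Stand-in for a call to a real downstream pricing service."""
    return {"item": item, "price": 9.99}

def get_price(item, dependency):
    """The system under test: must degrade gracefully, never crash."""
    try:
        return dependency(item)
    except ConnectionError:
        return {"item": item, "price": None, "degraded": True}

# Hypothesis: with 20% dependency failures, every request still gets
# a response (possibly a degraded one).
chaotic = flaky(fetch_price, failure_rate=0.2)
results = [get_price("book", chaotic) for _ in range(1000)]
degraded = sum(1 for r in results if r.get("degraded"))
assert len(results) == 1000, "some requests produced no response"
print(f"{degraded}/1000 requests served a degraded fallback response")
```

The rigour lives in the hypothesis and the verification, not the randomness; that is what separates a chaos experiment from simply breaking things.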
Better integrated security and code testing

Both chaos engineering and observability point towards more testing. And this shouldn't be surprising: testing is to be expected in a world where people are accountable for unpredictable systems. But what's particularly important is how testing is integrated. Whether it's for security or simply performance, we're gradually moving towards a world where testing is part of the build and deploy process, not completely isolated from it.

There are a diverse range of tools that all hint at this move. Archery, for example, is a tool designed for both developers and security testers to better identify and assess security vulnerabilities at various stages of the development lifecycle. With a useful dashboard, it neatly ties into the wider trend of observability. ArchUnit (sounds similar but completely unrelated) is a Java testing library that allows you to test a variety of different architectural components.

Similarly on the testing front, headless browsers continue to dominate. We've seen some of the major browsers bring out headless modes, which will no doubt delight many developers. Headless browsers allow developers to run front-end tests on their code as if it were live and running in the browser. If this sounds a lot like PhantomJS, that's because it is actually quite a bit like PhantomJS. However, headless browsers do make the testing process much faster.
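As a quick sketch of what a headless front-end check looks like, here is a minimal Selenium example running Chrome headlessly. It assumes Chrome plus a matching chromedriver are installed, and the options API differs slightly between Selenium versions.

```python
# Sketch of a front-end test in a headless browser via Selenium.
# Assumes Chrome and a matching chromedriver are available on PATH.
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")  # run Chrome without a visible window

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com")
    # The kind of assertion a CI pipeline would run on every build.
    assert "Example Domain" in driver.title
finally:
    driver.quit()
```

Because no window is rendered, a check like this can run on every commit inside a CI pipeline, which is exactly the "testing as part of the build and deploy process" shift described above.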
Smarter software purchasing and the move to hybrid cloud

The key trends we've seen in software architecture are about better understanding your software. But this level of insight and understanding doesn't matter if there's no alignment between key decision makers and purchasers. This misalignment can manifest itself in various ways; essentially, it's a symptom of decision makers being disconnected from the engineers buried deep in their software. This is by no means a new problem, but with cloud coming to define just about every aspect of software, it's now much easier for confusion to take hold. The best thing about cloud is also the worst thing - the huge scope of opportunities it opens up. It makes decision making a minefield - which provider should we use? What parts of it do we need? What's going to be most cost effective? Of course, with hybrid cloud, there's a clear way of meeting those issues. But it's by no means a silver bullet. Whatever cloud architecture you have, strong leadership and stakeholder management are essential.

This is something that ThoughtWorks references in the most recent edition of its Radar (November 2018). Identifying two trends it calls 'bounded buy' and 'risk commensurate vendor strategy', ThoughtWorks highlights how organizations can find their SaaS of choice shaping their strategy in its own image (bounded buy) or look to outsource business-critical applications, functions or services. ThoughtWorks explains: "This trade-off has become apparent as the major cloud providers have expanded their range of service offerings. For example, using AWS Secret Management Service can speed up initial development and has the benefit of ecosystem integration, but it will also add more inertia if you ever need to migrate to a different cloud provider than it would if you had implemented, for example, Vault".

Relatedly, ThoughtWorks also identifies a problem with how organizations manage cost. In the report, it discusses what it calls 'run cost as architecture fitness function', which is really an elaborate way of saying: make sure you look at how much things cost. So, for example, don't use serverless blindly. While it might look like a cheap option for smaller projects, your costs could quickly spiral and leave you spending more than you would if you ran it on a typical cloud server.

Get to grips with hybrid cloud:
Hybrid Cloud for Architects
Building Hybrid Clouds with Azure Stack

Become an effective software and solutions architect in 2019:
AWS Certified Solutions Architect - Associate Guide
Architecting Cloud Computing Solutions
Hands-On Cloud Solutions with Azure

Software complexity needs are best communicated in a simple language: money

In practice, this takes us all the way back to the beginning - it's simply the financial underbelly of observability. Performance, visibility, resilience - these matter because they directly impact the bottom line. That might sound obvious, but if you're trying to make the case, say, for implementing chaos engineering, or for using any other particular facet of a SaaS offering, communicating with other stakeholders in financial terms can give you buy-in and help to guarantee alignment. If 2019 should be about anything, it's getting closer to this fantasy of alignment. In the end, it will keep everyone happy - engineers and businesses alike.

The Future of Cloud lies in revisiting the designs and limitations of today’s notion of ‘serverless computing’, say UC Berkeley researchers

Savia Lobo
17 Dec 2018
5 min read
Last week, researchers at UC Berkeley released a research paper titled ‘Serverless Computing: One Step Forward, Two Steps Back’, which highlights some pitfalls in current serverless architectures. The researchers also explore the challenges that should be addressed to unlock the full potential that the cloud can offer to innovative developers.

Cloud isn’t being used to the fullest

The researchers describe the cloud as “the biggest assemblage of data capacity and distributed computing power ever available to the general public, managed as a service”. Yet the cloud today is mostly used as an outsourcing platform for standard enterprise data services: the majority of cloud services are simply multi-tenant, easier-to-administer clones of legacy enterprise data services such as object storage, databases, queueing systems, and web/app servers. To leverage the cloud’s actual potential to the fullest, creative developers need programming frameworks.

Of late, the buzz around serverless computing - a platform in the cloud where developers simply upload their code, and the platform executes it on their behalf as needed, at any scale - has been on the rise. This is because public cloud vendors have started offering new programming interfaces under the banner of serverless computing. The researchers support this with a Google search trend comparison in which the term “serverless” recently matched the historic peak of popularity of the phrase “Map Reduce” or “MapReduce”.

(Chart source: arxiv.org)

They point out that the notion of serverless computing is vague enough to allow optimists to project any number of possible broad interpretations on what it might mean. Hence, in the paper, they assess the field based on the serverless computing services that vendors are actually offering today, and examine why these services are a disappointment given the cloud’s bigger potential.

A serverless architecture based on FaaS (Function-as-a-Service)

Functions-as-a-Service (FaaS) is the commonly used, more descriptive name for the core of serverless offerings from the public cloud providers. Typical FaaS offerings today support a variety of languages (e.g., Python, Java, JavaScript, Go), allow programmers to register functions with the cloud provider, and enable users to declare events that trigger each function. The FaaS infrastructure monitors the triggering events, allocates a runtime for the function, executes it, and persists the results. The user is billed only for the computing resources used during function invocation.

Building applications on FaaS not only requires data management in both persistent and temporary storage, but also mechanisms to trigger and scale function execution. According to the researchers, cloud providers are quick to emphasize that serverless is not only FaaS; it is FaaS supported by a “standard library”: the various multi-tenanted, autoscaling services provided by the vendor - for instance, S3 (large object storage), DynamoDB (key-value storage), SQS (queuing services), and more. Current FaaS solutions are good for simple workloads of independent tasks, such as parallel tasks embedded in Lambda functions or jobs to be run by the proprietary cloud services. However, when it comes to use cases that involve stateful tasks, these FaaS offerings show surprisingly high latency. These realities limit the attractive use cases for FaaS today, discouraging new third-party programs that go beyond the proprietary service offerings from the vendors.
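To make the FaaS programming model concrete, here’s a minimal sketch of the kind of Python function you would register with a provider such as AWS Lambda. The event shape and the "name" field are hypothetical; the point is simply that the platform, not your server, invokes this once per event and bills per invocation:

```python
import json

def handler(event, context):
    """Invoked by the FaaS runtime once per triggering event (e.g. an HTTP
    request routed through an API gateway); billed only for execution time."""
    name = event.get("name", "world")  # hypothetical example field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Everything else - provisioning, scaling, routing the event to a runtime - is the platform’s job, which is precisely why the paper scrutinizes what happens when workloads stop being this simple.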
Limitations of the current FaaS offerings

No recoverability

Function invocations are shut down by the Lambda infrastructure automatically after 15 minutes. Lambda may keep a function’s state cached in the hosting VM in order to support a ‘warm start’, but there is no way to ensure that subsequent invocations are run on the same VM. Hence functions must be written assuming that state will not be recoverable across invocations.

I/O bottlenecks

Lambdas usually connect to cloud services or shared storage across a network interface. This means moving data across nodes or racks. With FaaS, things appear even worse than the network topology would suggest. Recent studies show that a single Lambda function can achieve, on average, 538 Mbps network bandwidth - an order of magnitude slower than a single modern SSD. Worse, AWS appears to pack Lambda functions from the same user together on a single VM, so the limited bandwidth is shared by multiple functions. The result is that as compute power scales up, per-function bandwidth shrinks proportionately. With 20 Lambda functions, average network bandwidth was 28.7 Mbps, two and a half orders of magnitude slower than a single SSD.

Communication through slow storage

Lambda functions can only communicate through an autoscaling intermediary service. As a corollary, a client of Lambda cannot address the particular function instance that handled the client’s previous request: there is no “stickiness” for client connections. Hence maintaining state across client calls requires writing the state out to slow storage and reading it back on every subsequent call, as the sketch below illustrates.

No specialized hardware

FaaS offerings today only allow users to provision a time slice of a CPU hyperthread and some amount of RAM; in the case of AWS Lambda, one determines the other. There is no API or mechanism to access specialized hardware.

These constraints, combined with some significant shortcomings in the standard library of FaaS offerings, substantially limit the scope of feasible serverless applications.
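Here is what the ‘communication through slow storage’ pattern looks like in practice: a hedged sketch, assuming a hypothetical DynamoDB table named session-state, in which every call pays two network round trips just to keep a per-client counter:

```python
import json
import boto3

# Hypothetical table; any autoscaling storage service plays the same role.
table = boto3.resource("dynamodb").Table("session-state")

def handler(event, context):
    session_id = event["session_id"]
    # Read the state a *previous* invocation left behind (round trip 1)...
    item = table.get_item(Key={"session_id": session_id}).get("Item", {})
    count = int(item.get("count", 0)) + 1
    # ...and write it back so the *next* invocation can see it (round trip 2).
    table.put_item(Item={"session_id": session_id, "count": count})
    return {"statusCode": 200, "body": json.dumps({"count": count})}
```

Two invocations can’t share memory or address one another, so even trivial per-client state has to round-trip through storage on every call - which is where the latency the researchers measured comes from.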
The researchers conclude, “We see the future of cloud programming as far, far brighter than the promise of today’s serverless FaaS offerings. Getting to that future requires revisiting the designs and limitations of what is being called ‘serverless computing’ today.” They believe cloud programmers need a programmable framework that goes beyond FaaS to dynamically manage the allocation of resources in order to meet user-specified performance goals for both compute and data. The program analysis and scheduling issues are likely to open up significant opportunities for more formal research, especially for data-centric programs. To know more about this research in detail, read the complete research paper.

Introducing GitLab Serverless to deploy cloud-agnostic serverless functions and applications
Introducing ‘Pivotal Function Service’ (alpha): an open, Kubernetes based, multi-cloud serverless framework for developer workloads
Introducing numpywren, a system for linear algebra built on a serverless architecture

NVIDIA demos a style-based generative adversarial network that can generate extremely realistic images; has ML community enthralled

Prasad Ramesh
17 Dec 2018
4 min read
In a paper published last week, NVIDIA researchers came up with a way to generate photos that look as though they were taken with a camera, using generative adversarial networks (GANs).

An alternative architecture for GANs

Borrowing from the style transfer literature, the researchers use an alternative generator architecture for GANs. The new architecture induces an automatically learned, unsupervised separation of an image’s high-level attributes, such as the pose or identity of a person. Images generated via the architecture have some stochastic variation applied to them, like freckles or hair placement. The architecture allows intuitive and scale-specific control of the synthesis to generate different variations of images.

Better image quality than a traditional GAN

This new generator improves on the state of the art in image quality; the images have better interpolation properties, and the generator disentangles the latent factors of variation better. In order to quantify interpolation quality and disentanglement, the researchers propose two new automated methods that are applicable to any generator architecture. They use a new high-quality, highly varied dataset of human faces.

With motivation from the style transfer literature, the NVIDIA researchers re-design the generator architecture to expose novel ways of controlling image synthesis. The generator starts from a learned constant input and adjusts the style of the image at each convolution layer, based on the latent code, thereby giving direct control over the strength of image features across different scales. When noise is injected directly into the network, this architectural change causes automatic separation of high-level attributes in an unsupervised manner.

(Image source: A Style-Based Generator Architecture for Generative Adversarial Networks)

In other words, the architecture combines different images and their attributes from the dataset, applying some variations to synthesize images that look real. As the paper shows, surprisingly, the redesign does not compromise image quality but instead improves it considerably; compared head to head, a traditional GAN generator architecture is inferior to the style-based design. And not only human faces - the researchers also generate bedrooms, cars, and cats with the new architecture.
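The per-layer style adjustment the paper describes is done with adaptive instance normalization (AdaIN). As a rough, hedged illustration of that operation - not the paper’s code, and with illustrative tensor shapes and names - here is a PyTorch-style sketch:

```python
import torch

def adain(features: torch.Tensor, style_scale: torch.Tensor,
          style_bias: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalize each feature map per channel, then re-scale and re-shift it
    with style-derived parameters: features (N, C, H, W), styles (N, C, 1, 1)."""
    mean = features.mean(dim=(2, 3), keepdim=True)
    std = features.std(dim=(2, 3), keepdim=True)
    normalized = (features - mean) / (std + eps)
    return style_scale * normalized + style_bias

# Toy usage: a style code (from a mapping network, hypothetically) yields the
# per-channel scale/bias that steers this layer's feature statistics.
x = torch.randn(4, 64, 32, 32)         # activations at one convolution layer
scale = torch.rand(4, 64, 1, 1) + 0.5  # hypothetical style-derived scale
bias = torch.randn(4, 64, 1, 1)        # hypothetical style-derived bias
styled = adain(x, scale, bias)
```

Because the style enters as per-channel statistics at every layer, coarse layers end up controlling attributes like pose while fine layers control details like hair - the scale-specific control described above.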
Public reactions

This synthetic image generation has generated excitement among the public. A comment from Hacker News reads: “This is just phenomenal. Can see this being a fairly disruptive force in the media industry. Also, sock puppet factories could use this to create endless numbers of fake personas for social media astroturfing.” Another comment reads: “The improvements in GANs from 2014 are amazing. From coarse 32x32 pixel images, we have gotten to 1024x1024 images that can fool most humans.”

Fake photographic images as evidence?

As a thread on Twitter suggests, could this be the end of photography as evidence? Not very likely, at least for the time being. For something to be considered as evidence, many poses are needed - for example, a specific person doing a specific action. As the results in the paper show, some cat images are ugly and deformed, far from looking like the real thing. Also, the paper notes, “Our training time is approximately one week on an NVIDIA DGX-1 with 8 Tesla V100 GPUs”, and that is a setup that costs up to $70K.

Besides, some speculate that there will be bills in 2019 to control the use of such AI systems: https://twitter.com/BobbyChesney/status/1074046157431717894

Even the big names in AI are noticing this paper: https://twitter.com/goodfellow_ian/status/1073294920046145537

You can see a video showcasing the generated images on YouTube.

This AI generated animation can dress like humans using deep reinforcement learning
DeepMasterPrints: ‘master key’ fingerprints made by a neural network can now fake fingerprints
UK researchers have developed a new PyTorch framework for preserving privacy in deep learning

NeurIPS 2018: Rethinking transparency and accountability in machine learning

Bhagyashree R
16 Dec 2018
8 min read
Key takeaways from the discussion

To solve problems with machine learning, you must first understand them.
Different people or groups of people are going to define a problem in different ways, so we shouldn’t assume that the way we want to frame the problem computationally is the right way.
If we allow that our systems include people and society, it is clear that we have to help negotiate values, not simply define them.

Last week, at the 32nd annual NeurIPS conference, Nitin Kohli, Joshua Kroll, and Deirdre Mulligan presented the common pitfalls we see when studying the human side of machine learning. Machine learning is being used in high-impact areas like medicine, criminal justice, employment, and education for making decisions. In recent years, we have seen that this use of machine learning and algorithmic decision making has resulted in unintended discrimination. It’s becoming clear that even models developed with the best of intentions may exhibit discriminatory biases and perpetuate inequality.

Although researchers have been analyzing how to put concepts like fairness, accountability, transparency, explanation, and interpretability into practice in machine learning, properly defining these things can prove a challenge. Attempts have been made to define them mathematically, but this can bring new problems. This is because applying mathematical logic to human concepts that have unique and contested political and social dimensions necessarily has blind spots - every point of contestation can’t be integrated into a single formula. In turn, this can cause friction with other disciplines as well as the public. Based on their research on what various terms mean in different contexts, Nitin Kohli, Joshua Kroll, and Deirdre Mulligan drew out some of the most common misconceptions machine learning researchers and practitioners hold.

Sociotechnical problems

To find a solution to a particular problem, data scientists need precise definitions. But how can we verify that these definitions are correct? Indeed, many definitions will be contested, depending on who you are and what you want them to mean. “A definition that is fair to you will not necessarily be fair to me,” remarks Mr. Kroll.

Mr. Kroll explained that while definitions can be unhelpful, they are nevertheless essential from a mathematical perspective. This means there appears to be an unresolved conflict between concepts and mathematical rigor. But there might be a way forward. Perhaps it’s wrong to simply think in this dichotomy of logical rigor vs. the messy reality of human concepts, and one of the ways out of this impasse is to get beyond the dichotomy. Although it’s tempting to place the technical and mathematical dimension on one side, with the social and political aspect on the other, we should instead see them as intricately related. They are, Kroll suggests, sociotechnical problems. Kroll goes on to say that we cannot ignore the social consequences of machine learning: “Technologies don’t live in a vacuum and if we pretend that they do we kind of have put our blinders on and decided to ignore any human problems.”

Fairness in machine learning

In the real world, fairness is a concept directly linked to processes. Think, for example, of the voting system. Citizens cast votes for their preferred candidates and the candidate who receives the most support is elected. Here, we can say that even if the winning candidate was not the one a citizen voted for, that citizen at least got the chance to participate in the process.
This type of fairness is called procedural fairness. In the technical world, however, fairness is often viewed in a subtly different way: when you place it in a mathematical context, fairness centers on outcome rather than process.

Kohli highlighted that trade-offs between these different concepts can’t be avoided; they’re inevitable. A mathematical definition of fairness places a constraint on the behavior of a system, and each constraint narrows down the class of models that can satisfy these conditions. So, if we decide to add too many fairness constraints to the system, some of them will be self-contradictory.

One more important point machine learning practitioners should keep in mind is that when we talk about the fairness of a system, that system isn’t a self-contained and coherent thing. It is not a logical construct - it’s a social one. This means there is a whole host of values, ideas, and histories that have an impact on its reality. In practice, this ultimately means that the complexity of the real world from which we draw and analyze data can have an impact on how a model works. Kohli explained this by saying, “it doesn’t really matter... whether you are building a fair system if the context in which it is developed and deployed in is fundamentally unfair.”
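To see what ‘a constraint on the behavior of a system’ can look like concretely, here is a toy sketch of one common mathematical fairness criterion, demographic parity. The data is synthetic and the decision threshold hypothetical; in practice the predictions would come from your model:

```python
import numpy as np

rng = np.random.default_rng(42)
group = rng.integers(0, 2, size=10_000)            # protected attribute A
scores = rng.normal(loc=group * 0.3, size=10_000)  # model scores, skewed by A
decisions = (scores > 0.5).astype(int)             # hypothetical threshold

# Demographic parity asks the positive-decision rate to match across groups.
rate_a0 = decisions[group == 0].mean()
rate_a1 = decisions[group == 1].mean()
print(f"P(decision=1 | A=0) = {rate_a0:.3f}")
print(f"P(decision=1 | A=1) = {rate_a1:.3f}")
print(f"demographic parity gap = {abs(rate_a0 - rate_a1):.3f}")
```

Requiring this gap to be near zero rules out a whole class of otherwise acceptable models - and, as Kohli notes, stacking it with other criteria (equalized error rates, say) can make the constraint set unsatisfiable.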
Accountability in machine learning

Accountability is ultimately about trust. It’s about the extent to which you can be sure you know what is ‘true’ about a system: that you know how it works and why it does things in certain ways. In more practical terms, it’s all about invariance and reliability.

To ensure accountability inside machine learning models, we need to follow a layered model. The bottom layer is an accounting or recording layer that keeps track of what a given system is doing and the ways in which it might have been changed. The next layer is a more analytical one: this is where the records from the bottom layer are analyzed and decisions are made about performance - whether anything needs to be changed, and how. The final, top-most layer is about responsibility. It’s where the proverbial buck stops - with those outside of the algorithm, those involved in its construction. “Algorithms are not responsible, somebody is responsible for the algorithm,” explains Kroll.

Transparency

Transparency is a concept heavily tied up with accountability; arguably you have no accountability without transparency. The layered approach discussed above should help with transparency, but it’s also important to remember that transparency is about much more than simply making data and code available. It demands that the decisions made in the development of the system are made available and clear too. Mr. Kroll emphasizes, “to the person at the ground-level for whom the decisions are being taken by some sort of model, these technical disclosures aren’t really useful or understandable.”

Explainability

In his paper Explanation in Artificial Intelligence: Insights from the Social Sciences, Tim Miller describes what explainable artificial intelligence is. According to Miller, explanation takes many forms, such as causal, contrastive, selective, and social. A causal explanation gives the reasons why something happened, for example, while contrastive explanations can provide answers to questions like “Why P rather than not-P?”. But the most important point here is that explanations are selective. An explanation cannot include all reasons why something happened; explanations are always context-specific, a response to a particular need or situation. Think of it this way: if someone asks you why the toaster isn’t working, you could just say that it’s broken. That might be satisfactory in some situations, but you could, of course, offer a more substantial explanation, outlining what was technically wrong with the toaster, how that technical fault came to be there, how the manufacturing process allowed that to happen, how the business would allow that manufacturing process to make that mistake… you could, of course, go on and on.

Data is not the truth

Today, there is a huge range of datasets available to help you develop different machine learning models. These models can be useful, but it’s essential to remember that they are models. A model isn’t the truth - it’s an abstraction, a representation of the world in a very specific way. One way of taking this fact into account is the concept of ‘construct validity’. This sounds complicated, but it simply refers to the extent to which a test - say a machine learning algorithm - actually measures what it claims to measure. The concept is widely used in disciplines like psychology, but in machine learning it refers to the way we validate a model based on its historical predictive accuracy. In a nutshell, it’s important to remember that just as data is an abstraction of the world, models are also an abstraction of the data. There’s no way of changing this, but having an awareness that we’re dealing in abstractions ensures that we do not lapse into the mistake of thinking we are in the realm of ‘truth’.

Building fair(er) systems will ultimately require an interdisciplinary approach, involving domain experts working in a variety of fields. If machine learning and artificial intelligence are to make a valuable and positive impact in fields such as justice, education, and medicine, it’s vital that those working in those fields work closely with those with expertise in algorithms. This won’t fix everything, but it will be a more robust foundation from which to move forward. You can watch the full talk on the Facebook page of NeurIPS.

Researchers unveil a new algorithm that allows analyzing high-dimensional data sets more effectively, at NeurIPS conference
Accountability and algorithmic bias: Why diversity and inclusion matters [NeurIPS Invited Talk]
NeurIPS 2018: A quick look at data visualization for Machine learning by Google PAIR researchers [Tutorial]

NeurIPS 2018: Developments in machine learning through the lens of Counterfactual Inference [Tutorial]

Savia Lobo
15 Dec 2018
7 min read
The 32nd NeurIPS conference kicked off on the 2nd of December and ran until the 8th of December in Montreal, Canada. The conference covered tutorials, invited talks, product releases, demonstrations, presentations, and announcements related to machine learning research. ‘Counterfactual Inference’ was one such tutorial, presented at NeurIPS by Susan Athey, The Economics of Technology Professor at the Stanford Graduate School of Business. The tutorial reviewed the literature that brings together recent developments in machine learning with methods for counterfactual inference, focusing on problems where the goal is to estimate the magnitude of causal effects, as well as to quantify the researcher’s uncertainty about those magnitudes.

She starts by mentioning that two sets of issues make causal inference a must-know concept for AI. There are gaps between what we are doing in research and what firms are actually applying. There are success stories, such as Google Images, but even the top tech companies do not fully adopt all machine learning and AI concepts. If a firm dumps its old, simple regression credit-scoring model and makes use of a black box based on ML, should it worry about what’s going to happen when it relies on that black box?

According to Susan, the reason firms and economists have historically used simple models is that, just by looking at the data, it is difficult to tell whether an approach is right; a simple model imparts properties such as interpretability, which helps in reasoning about the correctness of the approach and helps researchers make improvements to the model. Secondly, stability and robustness are also important for applications: transfer learning lets you estimate a model in one setting and carry the same learning over to another. Such models should also exhibit fairness, as many aspects of discrimination relate to correlation vs. causation. Finally, these properties support human-like AI behavior: the ability to make reasonable, never-before-seen decisions. All of these desired properties can be obtained in a causal model.

The Causal Inference Framework

In this framework, the goal is to learn a model of how the world works - for example, what happens to a body when a drug enters it. The impact of an intervention can be context-specific: if a user learns something in a particular setting and it doesn’t carry over to another setting, that is not a problem with the framework. Causal inference is hard to do, however, and the challenges include:

We do not have the right kind of variation in the data.
Lack of quasi-experimental data for estimation.
Unobserved contexts/confounders, or insufficient data to control for observed confounders.
The analyst’s lack of knowledge about the model.

Prof. Athey illustrates a true AI algorithm with the example of a contextual bandit, under which there might be different treatments. Under a contextual bandit, the system selects among alternative choices; it must have an explicit or implicit model of payoffs from the alternatives, and it learns from past data. In the initial stages of learning there is limited data, and there is a statistician inside the AI which performs counterfactual reasoning; that statistician should use the best-performing techniques (efficiency, bias).
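This is not Athey’s own algorithm, but as a generic sketch of the contextual-bandit loop she refers to - choose, observe a payoff, update, with an inner estimator standing in for the “statistician” - here is a toy epsilon-greedy version with a per-arm least-squares payoff model (all data synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, dim, epsilon = 2, 3, 0.1
A = [np.eye(dim) for _ in range(n_arms)]    # per-arm ridge-regularized X'X
b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm X'y

def true_payoff(arm, x):
    # Synthetic environment: arm 1 pays off when the first context feature is high.
    return (1.0 if arm == 1 else -1.0) * x[0] + rng.normal(scale=0.1)

for t in range(2000):
    x = rng.normal(size=dim)                # observe a context
    if rng.random() < epsilon:              # explore occasionally...
        arm = int(rng.integers(n_arms))
    else:                                   # ...else exploit current estimates
        arm = int(np.argmax([x @ np.linalg.solve(A[k], b[k])
                             for k in range(n_arms)]))
    reward = true_payoff(arm, x)
    A[arm] += np.outer(x, x)                # the inner "statistician" updates
    b[arm] += reward * x                    # its payoff model for that arm
```

The counterfactual problem is visible in the update step: the system only observes the payoff of the arm it chose, so estimating what the other arm would have paid is exactly a causal question.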
Counterfactual Inference Approaches

Approach 1: Program evaluation or treatment effect estimation

The goal of this approach is to estimate the impact of an intervention or of treatment assignment policies. This literature focuses mainly on low-dimensional interventions. Here, the estimands - the things people want to learn - are the average effect (did it work?); for more sophisticated projects, the heterogeneous effect (for whom did it work?); and the optimal policy (a mapping from people’s characteristics to their assignments). The main goal is to set confidence intervals around these effects while guarding against bias and noisy sampling. This literature focuses on designs that enable identification and estimation of these effects without randomized experiments; such designs include regression discontinuity, difference-in-differences, and so on.

Approach 2: Structural estimation, or ‘generative models and counterfactuals’

Here the goal is to estimate the impact on the welfare or profits of participants under alternative counterfactual regimes - regimes that may never have been observed in relevant contexts. This requires a behavioral model of the participants. One can make use of dynamic structural models to learn about the value function from agent choices in different states.

Approach 3: Causal discovery

The goal of this approach is to uncover the causal structure of a system. Here the analyst believes there is an underlying structure in which some variables are causes of others - e.g., a physical stimulus leads to biological responses. Applications can be found in understanding software systems and biological systems.

Recent literature brings causal reasoning, statistical theory, and modern machine learning algorithms together to solve important problems. The difference between supervised learning and causal inference is that supervised learning can be evaluated on a test set in a model-free way; in causal inference, the parameter being estimated is not observed in any test set, so estimation requires theoretical assumptions and domain knowledge.

Estimating ATE (Average Treatment Effects) under unconfoundedness

Here, only observational data is available, and the analyst must have access to data that captures the information used to assign units to treatments, insofar as that information is related to potential outcomes. The speaker’s example is how online ads are targeted using cookies: a user sees car ads because the advertiser knows the user has visited car-review websites. Purchases by users who saw an ad cannot simply be compared with purchases by those who did not, because interest in cars is an unobserved confounder. However, the analyst can see the history of websites the user visited - the advertiser’s main source of information about user interests.

Using supervised ML to estimate ATE under unconfoundedness

The first supervised ML method is propensity score weighting, or KNN on the propensity score; for instance, one can use a LASSO regression model to estimate the propensity score. The second method is regression adjustment, which models the outcomes directly in order to get at the causal effect. The next method is estimating the CATE (conditional average treatment effect) and taking averages, for example using the BART model. Prof. Athey also mentioned doubly robust / double machine learning methods, which use cross-fitted augmented inverse propensity scores.
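As a concrete illustration of the first of these methods, here is a minimal inverse-propensity-weighting sketch on synthetic data, with scikit-learn’s logistic regression standing in for the propensity model (the data-generating process and effect size are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                 # observed confounders
e_true = 1 / (1 + np.exp(-X[:, 0]))         # treatment probability depends on X
W = rng.binomial(1, e_true)                 # treatment indicator
Y = 2.0 * W + X[:, 0] + rng.normal(size=n)  # outcome; true ATE = 2.0

# Step 1: estimate the propensity score e(x) = P(W=1 | X=x).
e_hat = LogisticRegression().fit(X, W).predict_proba(X)[:, 1]

# Step 2: weight outcomes by inverse propensities to undo the selection.
ate_ipw = np.mean(W * Y / e_hat - (1 - W) * Y / (1 - e_hat))
print(f"IPW estimate of the ATE: {ate_ipw:.2f} (truth: 2.00)")
```

A naive difference of means here would be biased upward, because units with a high first confounder are both more likely to be treated and have higher outcomes; the weighting corrects for exactly that, under the unconfoundedness assumption.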
Another method she mentioned was residual balancing, which avoids assuming a sparse model and thus allows applications with complex assignment. If unconfoundedness fails, the alternative assumption is that there exists an instrumental variable Zi that is correlated with the treatment Wi (“relevance”) and that affects the outcome only through its effect on Wi.

Structural Models

Structural models enable counterfactuals for never-seen worlds. Combining machine learning with a structural model brings attention to identification, and to estimation using “good” exogenous variation in the data. Adding a sensible structure also improves the performance required for never-seen counterfactuals and increases efficiency for sparse data (e.g., longitudinal data). The nature of the structure includes:

Learning underlying preferences that generalize to new situations.
Incorporating the nature of the choice problem.
Established setups, in many domains, that perform well in data-poor environments.

With the help of a discrete choice model, users can evaluate the impact of introducing a new product, or of removing a product from the choice set. On combining these discrete choice models with ML, we have two approaches to product interactions:

Use information about product categories, and assume products are substitutes within categories.
Do not use the available category information, and instead estimate substitutes/complements from the data.

Susan concluded by mentioning some of the challenges of causal inference, including data sufficiency and finding sufficient, useful variation in historical data. She also noted that recent advances in computational methods in ML don’t help with this. However, tech firms conducting lots of experiments, running bandits, and interacting with humans at large scale can greatly expand the ability to learn about causal effects!

Head over to Susan Athey’s entire tutorial on Counterfactual Inference on the NeurIPS Facebook page.

Researchers unveil a new algorithm that allows analyzing high-dimensional data sets more effectively, at NeurIPS conference
Accountability and algorithmic bias: Why diversity and inclusion matters [NeurIPS Invited Talk]
NeurIPS 2018: A quick look at data visualization for Machine learning by Google PAIR researchers [Tutorial]

Kelsey Hightower on Serverless and Security on Kubernetes at KubeCon + CloudNativeCon

Prasad Ramesh
14 Dec 2018
4 min read
In a stream hosted earlier this week by The New Stack, Kelsey Hightower, developer advocate at Google Cloud Platform, talked about the serverless and security aspects of Kubernetes. The stream was recorded at KubeCon + CloudNativeCon 2018.

What are you exploring right now with respect to serverless?

There are many managed services these days. Databases, security, and so on are fully managed - i.e., serverless. People have been on this trajectory for a while if you consider DNS, email, and even Salesforce. Now we have serverless because managed services are ‘eating that world as well’ - that world being the server-side world and its related workloads.

How are managed services eating the server-side world?

If someone has to run and build an API, one approach would be to use Kubernetes: manage the cluster, build the container, run it on Kubernetes, and manage all of that. Even if it is a fully managed cluster, you may still have to manage the things around Kubernetes. Another approach is to deal with a higher level of abstraction. Serverless is often coupled with FaaS (Function as a Service), and resources are increasingly abstracted away. Hightower offers a test: “If I walk up to a platform and the delta between me and my code is short, you’re probably closer to the serverless mindset.” This is different from creating a VM, then installing something, configuring something, and then running some code; that is not really serverless.

Serverless in a Kubernetes context

The point of view should be: can we improve the experience on Kubernetes by adopting some things from serverless? You can add a layer that does functions, so developers can stop worrying about containers and focus on the source. The big-picture question is: who autoscales the whole cluster? Kubernetes with just an additional layer can’t really be called serverless, but it is going in that direction. Over time, if you do enough that people don’t have to think about, or even know, that Kubernetes is there, you’re getting closer to being truly serverless.

Security in Kubernetes

Hightower loves the granular controls of serverless technologies.

Comparing the serverless security model to other models

For a long time, the industry has been trying to take a least-privilege approach: limiting the access of applications so that each can perform only the specific actions it requires. If one server is compromised and it does not have access to anything else, the effects are isolated.

The Kubernetes approach can be different. Cloud providers try to make sure that all the credentials needed to do important things are segmented from the VM, cloud functions, App Engine, or Kubernetes. Now imagine Kubernetes is where everything lives, with credentials freely available across it. Instead of one machine being taken down, it becomes easier for the whole cluster to be taken down in one shot - ‘broadening the blast radius’. If you have Kubernetes and you give it the keys to everything in your cluster, then everything is compromised when the Kubernetes API is compromised. Having just one cluster trades off on security.

Another approach to serverless security

A different security model is one where you explicitly grant the credentials that may be needed, and nothing beyond that is allowed. You can still go wrong on serverless, but the system is better defined in ways that limit what can be done. It’s easier to secure when the attack surface is smaller.
For serverless security, the same engineering principles apply; you just have to apply them to these new platforms, which means knowing what those platforms are actually doing. Admins follow the same principles - they simply have a different layer of abstraction to which they may add some additional security.

The more people use a system, the more flaws are continuously found. It takes a community to identify flaws and patch them, so as a community matures, dedicated security researchers show up and patch flaws before they can be exploited.

To see the complete talk, where Hightower shares his views on what he is working on, go to The New Stack YouTube channel.

DigitalOcean launches its Kubernetes-as-a-service at KubeCon+CloudNativeCon to ease running containerized apps
Elastic launches Helm Charts (alpha) for faster deployment of Elasticsearch and Kibana to Kubernetes
NeuVector upgrades Kubernetes container security with the release of Containerd and CRI-O run-time support

Key Takeaways from Sundar Pichai’s Congress hearing over user data, political bias, and Project Dragonfly

Natasha Mathur
14 Dec 2018
12 min read
Google CEO Sundar Pichai testified before the House Judiciary Committee earlier this week. The hearing, titled “Transparency & Accountability: Examining Google and its Data Collection, Use, and Filtering Practices”, was a three-and-a-half-hour question-and-answer session that centered mainly on user data collection at Google, allegations of political bias in its search algorithms, and Google’s controversial plans in China.

“All of these topics, competition, censorship, bias, and others, point to one fundamental question that demands the nation’s attention. Are America’s technology companies serving as instruments of freedom or instruments of control?” said Representative Kevin McCarthy of California, the House Republican leader.

The committee members could have engaged with Pichai on more important topics had they not been busy opposing each other’s opinions over whether Google Search and its other products are biased against conservatives. Also, most of Pichai’s responses were unsatisfactory, as he cleverly dodged questions regarding Project Dragonfly and user data. Here are the key highlights from the testimony.

Allegations of political bias

One common theme throughout the long hearing session was Republicans asking questions about alleged bias against conservatives on Google’s platforms.

Google Search bias

Rep. Lamar Smith asked questions regarding the political bias allegedly “imbibed” in Google’s search algorithms and its culture. Smith cited a study by Robert Epstein, a Harvard-trained psychologist, whose results claim that Google’s search bias likely swung 2.6 million votes to Hillary Clinton in the 2016 election. Pichai replied that Google has investigated some of these studies, including Dr. Epstein’s, and found issues with the methodology and sample size. He also described how Google evaluates its search results for accuracy using a “robust methodology” that it has refined over the past 20 years.

Pichai added that “providing users with high quality, accurate, and trusted information is sacrosanct to us. It’s what our principles are and our business interests and our natural long-term incentives are aligned with that. We need to serve users everywhere and we need to earn their trust in order to do so.”

Google employees’ bias is the reason for biased search algorithms, say Republicans

Smith also presented examples of pro-Trump content and immigration-law coverage being tagged as hate speech in Google search results, posing a threat, he argued, to the democratic form of government. He alleged that people at Google were biased and intentionally transferred their biases into the search algorithms to get the results they want, and that management allows it. Pichai clarified that Google doesn’t manually intervene on any particular search result: “Google doesn’t choose conservative voices over liberal voices. There’s no political bias and Google operates in a neutral way.”

Would Google allow an independent third party to study its search results to determine the degree of political bias?

Pichai responded that Google already has completely independent third parties - not appointed by Google - in place to evaluate its search algorithms. “We’re transparent as to how we evaluate our search. We publish our rater guidelines. We publish it externally and raters evaluate it, we’re trying hard to understand what users want and this is what we think is right.
It’s not possible for an employee or a group of employees to manipulate our search algorithm”.

Political advertising bias

Committee chairman Bob Goodlatte, a Republican from Virginia, also asked Pichai about political advertising bias on Google’s ad platforms, which can offer different rates to different political candidates trying to reach prospective voters. This is largely different from how competing media platforms like TV and radio operate, where the lowest rate is offered to all political candidates. He asked whether Google should charge the same effective ad rates to all political candidates.

Pichai explained that Google’s advertising products are built without any bias, and that rates are competitive and set by a live auction process. Prices are calculated automatically based on the keywords you’re bidding for and on the demand in the auction; rates won’t differ for political reasons, although keywords of particular interest can command different prices. He likened the whole situation to a demand-supply equilibrium, where rates can differ from time to time - there could be a substantial difference based on the time of day, location, how keywords are chosen, and so on - a process Google has been using for over 20 years. Pichai further added that “anything to do with the civic process, we make sure to do it in a non-partisan way and it’s really important for us”.
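The mechanism Pichai describes belongs to the family of keyword auctions. As a toy, hedged illustration of the simplest member of that family - a sealed-bid second-price auction, with made-up bids; real ad auctions also factor in ad quality and run a generalized variant - consider:

```python
def second_price_auction(bids):
    """Winner is the highest bidder; the price paid is the runner-up's bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, price

# Hypothetical bids on one keyword, in dollars per click.
bids = {"candidate_a": 2.40, "candidate_b": 1.90, "retailer_c": 3.10}
winner, price = second_price_auction(bids)
print(f"{winner} wins the keyword and pays ${price:.2f} per click")
```

The price tracks demand for the keyword rather than who the bidder is - which is the demand-supply point Pichai was making: two candidates bidding on differently contested keywords can legitimately see very different rates.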
User data collection and security

Another highlight of the hearing was Google’s practices around user data collection and security. “Google is able to collect an amount of information about its users that would even make the NSA blush. Americans have no idea the sheer volume of information that is collected,” said Goodlatte.

Privacy concerns over location tracking data

The first question from Rep. Goodlatte during the testimony was about whether consumers understand the frequency and amount of location data that Google collects from its Android operating system. Pichai replied that Google offers users controls for limiting location data collection: “We go to great lengths to protect their privacy, we give them transparency, choice, and control.”

Pichai highlighted that Android is a powerful smartphone platform that offers services to over 2 billion people, and that the user data collected via Android depends on the applications a user chooses to use. He pointed out that Google makes it very clear to users what information is collected: there are terms of service and also a “privacy checkup”, and going to “my account” settings in Gmail gives a clear picture of what user data Google holds. He also said that users can take that data to other platforms if they choose to stop using Google.

On the Google+ data breach

Rep. Jerrold Nadler raised the recent Google+ data breach that affected some 52.5 million users, and asked Pichai about the legal obligations the company is under to publicly disclose security issues. Pichai responded that Google “takes privacy seriously” and needs to alert users and the necessary authorities of any kind of data breach or bug within 72 hours. He also mentioned that “building software inevitably has bugs associated as part of the process”. Google puts a lot of effort into finding bugs and their root causes, and making sure they are dealt with; Gmail, he said, has advanced protection to offer users a stronger layer of security.

Google’s commitment to protecting U.S. elections from foreign interference

It was last year that Google discovered Russian operatives had spent tens of thousands of dollars on ads on its YouTube, Gmail, and Google Search products in an effort to meddle in the 2016 US presidential election. “Does Google now know the full extent to which its online platforms were exploited by Russian actors in the election 2 years ago?” asked Nadler.

Pichai responded that Google conducted a thorough investigation into 2016 and found two main ad accounts linked to Russia, which had spent about $4,700 on advertising on Google. “We found a limited activity, improper activity, we learned from that and have increased the protections dramatically we have around our elections offering,” said Pichai. He added that, to protect US elections, Google will significantly review how ads are bought, look into the origin of accounts, and collaborate and share information with law enforcement and other tech companies. “Protecting our elections is foundational to our democracy and you have my full commitment that we will do that,” said Pichai.

Google’s plans with China

Rep. Sheila Jackson Lee was the first to directly ask Pichai about the company’s Project Dragonfly, i.e., its plans to build a censored search engine for China. “We applauded you in 2010 when Google took a very powerful stand, principle and democratic values over profits, and came out of China,” said Jackson Lee. Others who asked Pichai about Google’s China plans were Rep. Tom Marino and Rep. David Cicilline. Google left China in 2010 over concerns about hacking, attacks, censorship, and the Chinese government gaining access to its data.

How is working with the Chinese government to censor search results part of Google’s core values?

Pichai repeatedly said that Google currently has no plans to launch in China. “We don’t have a search product there. Our core mission is to provide users with access to information and getting access to information is an important right (of users) so we try hard to provide that information,” he said. He added that Google has evidence, from every country it has operated in, that “us reaching out and giving users more information has a very positive impact... but right now there are no plans to launch in China”. He also mentioned that if Google ever approaches a decision like that, he will be fully transparent with US policymakers and “engage and consult widely”. He further added that Google only provides Android services in China, for which it has partners and manufacturers all around the world: “We don’t have any special agreements on user data with the Chinese government.”

On being asked by Rep. Marino about a report from The Intercept that said Google created a prototype for a search engine to censor content in China, Pichai replied, “we designed what a search could look like if it were to be launched in a country like China and that’s what we explored”. Rep. Cicilline asked Pichai whether any employees within Google are currently attending product meetings on Dragonfly.
Pichai replied evasively, saying that Google has “undertaken an internal effort, but right now there are no plans to launch a search service in China necessarily”. Cicilline then asked whether Google employees are talking to members of the Chinese government, which Pichai dodged by responding, “Currently we are not in discussions around launching a search product in China.” Lastly, when Pichai was asked if he would rule out “launching a tool for surveillance and censorship in China”, he replied that Google’s mission is providing users with information, and that “we always think it’s in our duty to explore possibilities to give users access to information. I have a commitment, but as I’ve said earlier we’ll be very thoughtful and we’ll engage widely as we make progress”.

On ending forced arbitration for all forms of discrimination

Last month, 20,000 Google employees, along with temps, vendors, and contractors, walked out of their respective Google offices to protest discrimination and sexual harassment in the workplace. As part of the walkout, Google employees laid out five demands urging Google to bring about structural changes within the workplace. One of the demands was ending forced arbitration, meaning that Google should no longer require people to waive their right to sue, and that every co-worker should have the right to bring a representative or supporter of their choice when meeting with HR to file a harassment claim.

Rep. Pramila Jayapal asked Pichai if he could commit to expanding the policy of ending forced arbitration to any violation of an employee’s (or contractor’s) rights, not just sexual harassment. Pichai replied that Google is definitely looking into this further: “It’s an area where I’ve gotten feedback personally from our employees so we’re currently reviewing what we could do and I’m looking forward to consulting, and I’m happy to think about more changes here. I’m happy to have my office follow up to get your thoughts on it and we are definitely committed to looking into this more and making changes.”

Managing misinformation and hate speech

During the hearing, Pichai was also questioned about how Google handles misinformation and hate speech. Rep. Jamie Raskin asked why videos promoting the conspiracy theory known as “Frazzledrip” (which claims Hillary Clinton kills young women and drinks their blood) are still allowed on YouTube. Pichai responded, “We would need to validate whether that specific video violates our policies.”

Rep. Jerry Nadler also asked Pichai about Google’s actions to “combat white supremacy and right-wing extremism.” Pichai said Google has defined policies against hate speech and that if Google finds violations, it takes the content down. “We feel a tremendous sense of responsibility to moderate hate speech, define hate speech clearly inciting violence or hatred towards a group of people. It’s absolutely something we need to take a strict line on. We’ve stated our policies strictly and we’re working hard to make our enforcement better and we’ve gotten a lot better but it’s not enough so yeah we’re committed to doing a lot more here,” said Pichai.

Our Take

Hearings between tech companies and legislators, in their current form, are an utter failure. In addition to making tech reforms, there is an urgent need to reform how policy hearings are conducted. It is high time we upgraded ourselves to the 21st century.
These were the key highlights of the hearing held on 11th December 2018. We recommend you watch the complete hearing for more comprehensive context.

As Pichai defends Google’s “integrity” ahead of today’s Congress hearing, over 60 NGOs ask him to defend human rights by dropping Dragonfly
Google bypassed its own security and privacy teams for Project Dragonfly reveals Intercept
Google employees join hands with Amnesty International urging Google to drop Project Dragonfly

The cruelty of algorithms: Heartbreaking open letter criticizes tech companies for showing baby ads after stillbirth

Bhagyashree R
13 Dec 2018
3 min read
2018 has thrown up a huge range of examples of the unintended consequences of algorithms. From the ACLU’s research in July, which showed how the algorithm in Amazon’s facial recognition software incorrectly matched images of members of Congress with mugshots, to Amazon’s sexist algorithm used in its hiring process, this has been a year in which the damage that algorithms can cause has become apparent. But this week, an open letter by Gillian Brockell, who works at The Washington Post, highlighted the traumatic impact algorithmic personalization can have.

In it, Brockell detailed how personalized ads accompanied her pregnancy, and how much the major platforms that dominate our digital lives appeared to know about it. “...I bet Amazon even told you [the tech companies to which the letter is addressed] my due date… when I created an Amazon registry,” she wrote. But she went on to explain how those very algorithms were incapable of processing the tragic death of her unborn baby, blind to the grief that would unfold in the aftermath: “Did you not see the three days silence, uncommon for a high frequency user like me”.

https://twitter.com/STFUParents/status/1072759953545416706

But Brockell’s grief was compounded by the way those companies continued to engage with her through automated messaging. She explained that although she clicked the “It’s not relevant to me” option those ads offer users, this only led the algorithms to ‘decide’ that she had given birth, offering deals on strollers and nursing bras.

As Brockell notes in her letter, stillbirths aren’t as rare as many think, with 26,000 happening in the U.S. alone every year. This fact only serves to emphasise the empathetic blind spots in the way algorithms are developed: “If you’re smart enough to realize that I’m pregnant, that I’ve given birth, then surely you’re smart enough to realize my baby died.”

Brockell’s open letter garnered a lot of attention on social media, to such an extent that a number of the companies at which she had directed it responded. Speaking to CNBC, a Twitter spokesperson said, “We cannot imagine the pain of those who have experienced this type of loss. We are continuously working on improving our advertising products to ensure they serve appropriate content to the people who use our services.” Meanwhile, Facebook advertising executive Rob Goldman responded, “I am so sorry for your loss and your painful experience with our products.” He also explained how these ads could be blocked: “We have a setting available that can block ads about some topics people may find painful — including parenting. It still needs improvement, but please know that we’re working on it & welcome your feedback.” Experian did not respond to requests for comment.

However, even after taking Goldman’s advice, Brockell revealed she was then shown adoption adverts: https://twitter.com/gbrockell/status/1072992972701138945

“It crossed the line from marketing into Emotional Stalking,” said one Twitter user. While the political impact of algorithms has led to sustained commentary and criticism in 2018, this story reveals the personal impact algorithms can have. It highlights that as artificial intelligence systems become more and more embedded in everyday life, engineers will need acute sensitivity to, and attention to detail about, the potential use cases and consequences of the algorithms they build.

You can read Brockell’s post on Twitter.

Facebook’s artificial intelligence research team, FAIR, turns five. But what are its biggest accomplishments?
FAT Conference 2018 Session 3: Fairness in Computer Vision and NLP
FAT Conference 2018 Session 4: Fair Classification

Deep Learning Indaba presents the state of Natural Language Processing in 2018

Sugandha Lahoti
12 Dec 2018
5 min read
The ’Strengthening African Machine Learning’ conference, organized by Deep Learning Indaba at Stellenbosch, South Africa, is ongoing right now. This six-day conference celebrates and strengthens machine learning in Africa through state-of-the-art teaching, networking, policy debate, and support programmes.

Yesterday, three conference organizers - Sebastian Ruder, Herman Kamper, and Stephan Gouws - asked tech experts their views on the state of Natural Language Processing, and more specifically these four questions:

What do you think are the three biggest open problems in Natural Language Processing at the moment?
What would you say is the most influential work in Natural Language Processing in the last decade, if you had to pick just one?
What, if anything, has led the field in the wrong direction?
What advice would you give a postgraduate student in Natural Language Processing starting their project now?

The tech experts interviewed included the likes of Yoshua Bengio, Hal Daumé III, Barbara Plank, Miguel Ballesteros, Anders Søgaard, Lea Frermann, Michael Roth, Annie Louise, Chris Dyer, Felix Hill, Kevin Knight, and more.

https://twitter.com/seb_ruder/status/1072431709243744256

Biggest open problems in Natural Language Processing at the moment

Although each expert raised a variety of open issues in Natural Language Processing, the following common themes recurred.

No ‘real’ natural language understanding

Many experts argued that natural language understanding is central, and also important for natural language generation. They agreed that most of our current Natural Language Processing models have no “real” understanding; what is needed is to build models that incorporate common sense, and to work out what (biases, structure) should be built into these models explicitly. Dialogue systems and chatbots were mentioned in several responses. Maletšabisa Molapo, a research scientist at IBM Research and one of the experts, answered, “Perhaps this may be achieved by general NLP Models, as per the recent announcement from Salesforce Research, that there is a need for NLP architectures that can perform well across different NLP tasks (machine translation, summarization, question answering, text classification, etc.)”

NLP for low-resource scenarios

Another open problem is using NLP in low-resource scenarios. This covers generalization beyond the training data, learning from small amounts of data, and techniques such as domain transfer, transfer learning, and multi-task learning. It also spans the full spectrum of supervision: semi-supervised, weakly-supervised, “Wiki-ly” supervised, distantly-supervised, lightly-supervised, minimally-supervised, and unsupervised learning. Per Karen Livescu, associate professor at the Toyota Technological Institute at Chicago, “Dealing with low-data settings (low-resource languages, dialects (including social media text "dialects"), domains, etc.). This is not a completely "open" problem in that there are already a lot of promising ideas out there; but we still don’t have a universal solution to this universal problem.”

Reasoning about large or multiple contexts

Experts also pointed to NLP’s problems in dealing with large contexts, whether long text documents or spoken documents; current models lack the incorporation of common sense across such contexts.
According to Isabelle Augenstein, tenure-track assistant professor at the University of Copenhagen, “Our current models are mostly based on recurrent neural networks, which cannot represent longer contexts well. One recent encouraging work in this direction I like is the NarrativeQA dataset for answering questions about books. The stream of work on graph-inspired RNNs is potentially promising, though has only seen modest improvements and has not been widely adopted due to them being much less straight-forward to train than a vanilla RNN.”

Defining problems, building diverse datasets and evaluation procedures

“Perhaps the biggest problem is to properly define the problems themselves. And by properly defining a problem, I mean building datasets and evaluation procedures that are appropriate to measure our progress towards concrete goals. Things would be easier if we could reduce everything to Kaggle style competitions!” - Mikel Artetxe.

Experts believe that current NLP datasets need to be re-evaluated. A new generation of evaluation datasets and tasks is required - ones that show whether NLP techniques generalize across the true variability of human language. More diverse datasets are also needed: “Datasets and models for deep learning innovation for African Languages are needed for many NLP tasks beyond just translation to and from English,” said Molapo.

Advice to a postgraduate student in NLP starting their project

Do not limit yourself to reading NLP papers. Read a lot of machine learning, deep learning, reinforcement learning papers. A PhD is a great time in one’s life to go for a big goal, and even small steps towards that will be valued. — Yoshua Bengio

Learn how to tune your models, learn how to make strong baselines, and learn how to build baselines that test particular hypotheses. Don’t take any single paper too seriously, wait for its conclusions to show up more than once. — George Dahl

I believe scientific pursuit is meant to be full of failures. If every idea works out, it’s either because you’re not ambitious enough, you’re subconsciously cheating yourself, or you’re a genius, the last of which I heard happens only once every century or so. So, don’t despair! — Kyunghyun Cho

Understand psychology and the core problems of semantic cognition. Understand machine learning. Go to NeurIPS. Don’t worry about ACL. Submit something terrible (or even good, if possible) to a workshop as soon as you can. You can’t learn how to do these things without going through the process. — Felix Hill

Make sure to go through the complete list of all expert responses for better insights.

Google open sources BERT, an NLP pre-training technique
Use TensorFlow and NLP to detect duplicate Quora questions [Tutorial]
Intel AI Lab introduces NLP Architect Library