26 Jun 2019

6 min read

A new study reveals how shopping websites use ‘dark patterns’ to deceive you into buying things you may not want

A new study by researchers from Princeton University and the University of Chicago suggests that shopping websites are rife with dark patterns that rely on consumer deception. The researchers conducted a large-scale study, analyzing almost 53K product pages from 11K shopping websites to characterize and quantify the prevalence of dark patterns. They discovered 1,841 instances of dark patterns on shopping websites, which together represent 15 types of dark patterns.

Note: All images in the article are taken from the research paper.

What are dark patterns?

Dark patterns are user interface design choices that coerce, steer, or deceive users into making unintended and potentially harmful decisions that benefit an online service. Shopping websites use them to trick users into signing up for recurring subscriptions and making unwanted purchases, resulting in concrete financial loss. These patterns are not limited to shopping websites; they are also common on other digital platforms, including social media, mobile apps, and video games. At extreme levels, dark patterns can lead to financial loss, trick users into giving up vast amounts of personal data, or induce compulsive and addictive behavior in adults and children.

Researchers used a web crawler to identify text-based dark patterns

The paper uses an automated approach that enables researchers to identify dark patterns at scale on the web. The researchers crawled 11K shopping websites using a web crawler built on top of OpenWPM, a web privacy measurement platform. The crawler simulated a user's browsing experience and identified user interface elements. The researchers then used text clustering to extract recurring user interface designs from the resulting data and inspected the resulting clusters for instances of dark patterns.
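To make the text clustering step more concrete, here is a minimal sketch of how near-identical interface strings could be grouped so that a reviewer inspects clusters rather than individual pages. It is not the researchers' pipeline: the greedy grouping, the Jaccard similarity measure, the 0.6 threshold, and the function names `normalize`, `jaccard`, and `cluster_segments` are all assumptions chosen for illustration.

```python
import re
from typing import List, Set


def normalize(segment: str) -> Set[str]:
    """Lowercase a text segment, mask numbers, and return its set of tokens."""
    masked = re.sub(r"\d+", "<num>", segment.lower())
    return set(re.findall(r"<num>|[a-z']+", masked))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_segments(segments: List[str], threshold: float = 0.6) -> List[List[str]]:
    """Greedy single-pass clustering (hypothetical stand-in for the paper's method).

    Each segment joins the first cluster whose seed segment is similar
    enough; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    seeds: List[Set[str]] = []
    for segment in segments:
        tokens = normalize(segment)
        for i, seed in enumerate(seeds):
            if jaccard(tokens, seed) >= threshold:
                clusters[i].append(segment)
                break
        else:
            clusters.append([segment])
            seeds.append(tokens)
    return clusters


if __name__ == "__main__":
    # Made-up example strings, not data from the study.
    page_text = [
        "Only 3 left in stock",
        "Only 12 left in stock",
        "Add to cart",
        "Hurry! Only 2 left in stock",
        "Add to basket",
    ]
    for group in cluster_segments(page_text):
        print(group)
```

Masking digits before comparing means that low-stock messages differing only in the quantity fall into the same cluster, which is the kind of recurring design a manual reviewer would then label.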
The researchers also developed a novel taxonomy of dark pattern characteristics to understand how dark patterns influence user decision-making. Using this taxonomy, they classified the dark patterns based on whether they lead to an asymmetry of choice, are covert in their effect, are deceptive in nature, hide information from users, or restrict choice. The researchers also mapped the dark patterns in their data set to the cognitive biases they exploit. These biases collectively describe the consumer psychology underpinnings of the dark patterns identified. They also determined that many instances of dark patterns are enabled by third-party entities, which provide shopping websites with scripts and plugins to easily implement these patterns on their websites.

Key stats from the research

There are 1,841 instances of dark patterns on shopping websites, which together represent 15 types of dark patterns and 7 broad categories.

These 1,841 dark patterns were present on 1,267 of the 11K shopping websites (∼11.2%) in their data set.

Shopping websites that were more popular, according to Alexa rankings, were more likely to feature dark patterns.

234 instances of deceptive dark patterns were uncovered across 183 websites.

22 third-party entities were identified that provide shopping websites with the ability to create dark patterns on their sites.

Dark pattern categories

Sneaking

Attempting to misrepresent user actions, or delaying information that users would most likely object to once made available.

Sneak into Basket: Adds additional products to users’ shopping carts without their consent.

Hidden Subscription: Charges users a recurring fee under the pretense of a one-time fee or a free trial.

Hidden Costs: Reveals new, additional, and often unusually high charges to users just before they are about to complete a purchase.

Urgency

Imposing a deadline on a sale or deal, thereby accelerating user decision-making and purchases.
Countdown Timers: A dynamic indicator that counts down until a deadline expires.

Limited-time Messages: A static urgency message without an accompanying deadline.

Misdirection

Using visuals, language, or emotion to direct users toward or away from making a particular choice.

Confirmshaming: Uses language and emotion to steer users away from making a certain choice.

Trick Questions: Uses confusing language to steer users into making certain choices.

Visual Interference: Uses style and visual presentation to steer users into making certain choices over others.

Pressured Selling: Refers to defaults or often high-pressure tactics that steer users into purchasing a more expensive version of a product (upselling) or into purchasing related products (cross-selling).

Social proof

Influencing users' behavior by describing the experiences and behavior of other users.

Activity Notification: A recurring, attention-grabbing message that appears on product pages indicating the activity of other users.

Testimonials of Uncertain Origin: Customer testimonials whose origin, or how they were sourced and created, is not clearly specified.

Scarcity

Signalling that a product is likely to become unavailable, thereby increasing its desirability to users.

Low-stock Messages: Signals to users that only limited quantities of a product are available.

High-demand Messages: Signals to users that a product is in high demand, implying that it is likely to sell out soon.

Obstruction

Making it easy for the user to get into one situation but hard to get out of it. The researchers observed one type of the Obstruction dark pattern: “Hard to Cancel”. The Hard to Cancel dark pattern is restrictive (it limits the choices users can exercise to cancel their services).
In cases where websites do not disclose their cancellation policies upfront, Hard to Cancel also becomes information hiding (it fails to inform users that cancellation is harder than signing up).

Forced Action

Forcing the user to do something tangential in order to complete their task. The researchers observed one type of the Forced Action dark pattern, “Forced Enrollment”, on 6 websites.

Limitations of the research

The researchers have acknowledged that their study has certain limitations.

Only text-based dark patterns are taken into account in this study. More work is needed on inherently visual patterns (e.g., a change of font size or color that emphasizes one part of the text over another in an otherwise seemingly harmless pattern).

The web crawl led to a fraction of Selenium crashes, which prevented the researchers from retrieving product pages or completing data collection on certain websites.

The crawler failed to completely simulate the product purchase flow on some websites.

They only crawled product pages and checkout pages, missing dark patterns present on other common pages such as website homepages, product search pages, and account creation pages.

The list of dark patterns can be downloaded as a CSV file. For more details, we recommend that you read the research paper.

U.S. senators introduce a bipartisan bill that bans social media platforms from using ‘dark patterns’ to trick its users.

How social media enabled and amplified the Christchurch terrorist attack

Can an Open Web Index break Google’s stranglehold over the search engine market?

Fatema Patrawala

25 Apr 2019

7 min read

DataCamp reckons with its #MeToo movement; CEO steps down from his role indefinitely

The data science community is reeling after data science learning startup DataCamp published a blog post acknowledging that an unnamed company executive made "uninvited physical contact" with one of its employees. DataCamp, which operates an online platform where aspiring data scientists can take courses in coding and data analysis, is a startup valued at $184 million. It has also raised over $30 million in funding.

The company disclosed in a blog post published on 4th April that the incident occurred at an "informal employee gathering" at a bar in October 2017. The unnamed DataCamp executive had "danced inappropriately and made uninvited physical contact" with the employee on the dance floor, the post read. The company didn't name the executive involved in the incident in its post, but called the executive's behavior on the dance floor "entirely inappropriate" and "inconsistent" with employee expectations and policies.

When Business Insider reached out to OS Keyes, one of the course instructors familiar with the matter, Keyes said that the executive in question is DataCamp's co-founder and CEO Jonathan Cornelissen. Yesterday, Motherboard also reported that the company did not adequately address sexual misconduct by a senior executive, and that instructors at DataCamp have begun boycotting the service and asking the company to delete their courses following the allegations.

What actually happened and how did DataCamp respond?
On April 4, DataCamp shared a statement on its blog titled “a note to our community.” In it, the startup addressed the accusations against one of the company’s executives: “In October 2017, at an informal employee gathering at a bar after a week-long company offsite, one of DataCamp’s executives danced inappropriately and made uninvited physical contact with another employee while on the dance floor.” DataCamp had the complaint reviewed by a “third party not involved in DataCamp’s day-to-day business,” and said it took several “corrective actions,” including “extensive sensitivity training, personal coaching, and a strong warning that the company will not tolerate any such behavior in the future.”

DataCamp posted its blog only a day after more than 100 DataCamp instructors signed a letter and sent it to DataCamp executives. “We are unable to cooperate with continued silence and lack of transparency on this issue,” the letter said. “The situation has not been acknowledged adequately to the data science community, leading to harmful rumors and uncertainty.” But as instructors read the statement from DataCamp following the letter, many found the actions taken to be insufficient.

https://twitter.com/hugobowne/status/1120733436346605568

https://twitter.com/NickSolomon10/status/1120837738004140038

Motherboard reported on the case in detail, drawing on the account of Julia Silge, a data scientist who co-authored the letter to DataCamp. Silge says that going public with the instructors' demands for accountability was the last resort. She spoke about the incident in detail and says she remembered seeing the victim of the assault start working at DataCamp and then leave abruptly. This raised “red flags,” but she did not reach out to her. Silge then heard about the incident from a mutual friend and began to raise the issue with people inside DataCamp. “There were various responses from the rank and file. 
It seemed like after a few months of that there was not a lot of change, so I escalated a little bit,” she said.

According to Silge, DataCamp finally responded by saying, “I think you have misconceptions about what happened,” and also mentioned that “there was alcohol involved” to explain the behavior of the executive. “We also heard over and over again, ‘This has been thoroughly handled,’” Silge added. But Silge and other instructors who have spoken out say that DataCamp hasn’t properly handled the situation and has tried to sweep it under the rug.

Silge also created a private Slack group so that instructors could communicate and coordinate their efforts to confront the issue. She and the group joined a group video conference with DataCamp, in which every participant except DataCamp was put into “listen-only” mode, meaning they could not speak in the meeting and were effectively silenced. “It felt like 30 minutes of the DataCamp leadership saying what they wanted to say to us,” Silge said. “The content of it was largely them saying how much they valued diversity and inclusion, which is hard to find credible given the particular ways DataCamp has acted over the past.”

Following that meeting, instructors began to boycott DataCamp more openly, with one instructor refusing to make necessary upgrades to her course until DataCamp addressed the situation. Silge and two other instructors eventually drafted and sent the letter, at first to the small group involved in accountability efforts, then to almost every DataCamp instructor. All told, the letter received more than 100 signatures (of about 200 total instructors).

A DataCamp spokesperson said in response, “When we became aware of this matter, we conducted a thorough investigation and took actions we believe were necessary and appropriate. However, recent inquiries have made us aware of mischaracterizations of what occurred and we felt it necessary to make a public statement. 
As a matter of policy, we do not disclose details on matters like this, to protect the privacy of the individuals involved.” “We do not retaliate against employees, contractors or instructors or other members of our community, under any circumstances, for reporting concerns about behavior or conduct,” the company added.

The response from DataCamp was not only inadequate but also technically flawed, according to Noam Ross, one of the contractors. Ross pointed out in his blog post that DataCamp had published its blog post with a “no-index” tag, meaning it would not show up in search engine results such as Google's. Knowingly adding this tag reflects DataCamp’s continued lack of public accountability.

OS Keyes told Business Insider that, at this point, the best course of action for DataCamp is a visible change in leadership. “The investors need to get together and fire the [executive], and follow that by publicly explaining why, apologising, compensating the victim and instituting a much more rigorous set of work expectations,” Keyes said.

#Rstats and other data science communities and DataCamp instructors take action

Ines Montani, one of the contractors, said, “I was pretty disappointed, appalled and frustrated by DataCamp's reaction and non-action, especially as more and more details came out about how they essentially tried to sweep this under the rug for almost two years.”

Due to their contracts, many instructors cannot take down their DataCamp courses. Instead of removing the courses, many DataCamp contractors, including Montani, took to Twitter after DataCamp published the blog post, urging students to boycott the very courses they designed.

https://twitter.com/noamross/status/1116667602741485571

https://twitter.com/daniellequinn88/status/1117860833499832321

https://twitter.com/_tetration_/status/1118987968293875714

Instructors put financial pressure on the company by boycotting their own courses. 
They also wanted the executive responsible for the misbehaviour to be held accountable for his actions, and wanted the company to compensate the victim and those who were fired for complaining. This may ultimately undercut DataCamp’s bottom line. Influential open-source communities, including RStudio, SatRdays, and R-Ladies, have cut all ties with DataCamp to show their disappointment with the lack of serious accountability.

CEO steps down “indefinitely” from his role and accepts his mistakes

Today, Jonathan Cornelissen accepted his mistake and wrote a public apology for his inappropriate behaviour. He writes, “I want to apologize to a former employee, our employees, and our community. I have failed you twice. First in my behavior and second in my failure to speak clearly and unequivocally to you in a timely manner. I am sorry.” He has also stepped down from his position as CEO indefinitely, until there is a complete review of the company’s environment and culture. While this is a step in the right direction, the apology comes very late and is seen as a PR move to appease the backlash from the data science community and other instructors.

https://twitter.com/mrsnoms/status/1121235830381645824

9 Data Science Myths Debunked

30 common data science terms explained

Why is data science important?

Savia Lobo

24 Sep 2019

5 min read

Can a modified MIT ‘Hippocratic License’ to restrict misuse of open source software prompt a wave of ethical innovation in tech?

Open source licenses allow software to be freely distributed, modified, and used. These licenses also let developers set the rules and conditions under which others may use their software. Recently, software developer and open-source advocate Coraline Ada Ehmke caused a stir in the software engineering community with ‘The Hippocratic License.’ Ehmke is also the original author of Contributor Covenant, a “code of conduct” for open source projects that encourages participants to use inclusive language and to refrain from personal attacks and harassment. In a tweet about the code of conduct posted in September last year, she mentioned, “40,000 open source projects, including Linux, Rails, Golang, and everything OSS produced by Google, Microsoft, and Apple have adopted my code of conduct.”

The term ‘Hippocratic’ is derived from the Hippocratic Oath, the most widely known of Greek medical texts. The Hippocratic Oath in literal terms requires a new physician to swear upon a number of healing gods that he will uphold a number of professional ethical standards.

Ehmke explained the license in more detail in a post published on Sunday. In it, she argues that writing software with only the goals of clarity, conciseness, readability, performance, and elegance in mind is limiting and potentially dangerous. “All of these technologies are inherently political,” she writes. “There is no neutral political position in technology. You can’t build systems that can be weaponized against marginalized people and take no responsibility for them.” The concept of the Hippocratic license is relatively simple. 
In a tweet, Ehmke said that it “specifically prohibits the use of open-source software to harm others.”

Open source software and the associated harm

Among the many freedoms that open source software allows, such as free redistribution of the software and its source code, the OSI's definition also specifies that there be no discrimination against who uses the software or where it is put to use. A few days ago, software engineer Seth Vargo pulled his open-source software, Chef-Sugar, offline after finding out that Chef (a popular open source DevOps company using the software) had recently signed a contract selling $95,000 worth of licenses to US Immigration and Customs Enforcement (ICE), which has faced widespread condemnation for separating children from their parents at the U.S. border and other abuses. Vargo took down the Chef Sugar library from both GitHub and RubyGems, the main Ruby package repository, as a sign of protest.

In May this year, Mijente, an advocacy organization, released documents stating that Palantir was responsible for the 2017 ICE operation that targeted and arrested family members of children crossing the border alone. Also, in May 2018, Amazon employees, in a letter to Jeff Bezos, protested against the sale of its facial recognition tech to Palantir, stating that they “refuse to contribute to tools that violate human rights” and citing the mistreatment of refugees and immigrants by ICE. In July, WNYC revealed that Palantir’s mobile app FALCON was being used by ICE to carry out raids on immigrant communities and to enable workplace raids in New York City in 2017.

Founder of OSI responds to Ehmke’s Hippocratic License

Bruce Perens, one of the founders of the Open Source movement in software, responded to Ehmke in a post titled “Sorry, Ms. Ehmke, The “Hippocratic License” Can’t Work”. 
“The software may not be used by individuals, corporations, governments, or other groups for systems or activities that actively and knowingly endanger, harm, or otherwise threaten the physical, mental, economic, or general well-being of underprivileged individuals or groups,” reads the license clause he highlights in his post. “The terms are simply far more than could be enforced in a copyright license,” he adds. “Nobody could enforce Ms. Ehmke’s license without harming someone, or at least threatening to do so. And it would be easy to make a case for that person being underprivileged,” he continued. He concluded by saying that, although he could not agree with the terms of Ehmke’s license, he will “happily support Ms. Ehmke in pursuit of legal reforms meant to achieve the protection of underprivileged people.”

Many have welcomed Ehmke's idea of an open source license with an ethical clause. However, the license is not yet OSI approved, and the chances are slim after Perens’ response. Many users do not agree with the license, and reaching a consensus will be hard.

https://twitter.com/seannalexander/status/1175853429325008896

https://twitter.com/AdamFrisby/status/1175867432411336704

https://twitter.com/rishmishra/status/1175862512509685760

Even though developers host their source code on open source repositories, a license may place a certain level of restriction on who is allowed to use the code. However, as Perens mentions, many of the terms in Ehmke’s license are hard to implement. Irrespective of the outcome of this license’s approval process, Coraline Ehmke has opened up the long overdue topic of FOSS licensing reform in the open source community. It will be interesting to see whether such a license boosts ethical reform by giving developers more authority to instill their values and prevent the misuse of their software. Read the Hippocratic license to know more in detail. 
Other interesting news in Tech

ImageNet Roulette: New viral app trained using ImageNet exposes racial biases in artificial intelligent system

Machine learning ethics: what you need to know and what you can do

Facebook suspends tens of thousands of apps amid an ongoing investigation into how apps use personal data

Sugandha Lahoti

27 Mar 2019

5 min read

Amazon joins NSF in funding research exploring fairness in AI amidst public outcry over big tech #ethicswashing

Hot on the heels of Stanford’s HCAI Institute (which, mind you, received public backlash for its non-representative faculty makeup), Amazon is collaborating with the National Science Foundation (NSF) to develop systems based on fairness in AI. Amazon and NSF will each be investing $10M in artificial intelligence research grants over a three-year period. The official announcement was made by Prem Natarajan, VP of natural understanding in the Alexa AI group, who wrote in a blog post, “With the increasing use of AI in everyday life, fairness in artificial intelligence is a topic of increasing importance across academia, government, and industry. Here at Amazon, the fairness of the machine learning systems we build to support our businesses is critical to establishing and maintaining our customers’ trust.”

Per the blog post, Amazon will be collaborating with NSF to build trustworthy AI systems to address modern challenges. They will explore topics of transparency, explainability, accountability, potential adverse biases and effects, mitigation strategies, validation of fairness, and considerations of inclusivity. Proposals will be accepted from March 26 until May 10 and are expected to result in new open source tools, publicly available data sets, and publications. The two organizations plan to continue the program with calls for additional proposals in 2020 and 2021. There will be 6 to 9 awards of type Standard Grant or Continuing Grant. The award size will range from $750,000 up to a maximum of $1,250,000 for periods of up to 3 years. The anticipated funding amount is $7,600,000.

“We are excited to announce this new collaboration with Amazon to fund research focused on fairness in AI,” said Jim Kurose, NSF's head for Computer and Information Science and Engineering. 
“This program will support research related to the development and implementation of trustworthy AI systems that incorporate transparency, fairness, and accountability into the design from the beginning.”

The insidious nexus of private funding in public research: What does Amazon gain from collaborating with NSF?

Amazon’s foray into fairness systems looks more like a publicity stunt than an effort to eliminate AI bias. For starters, Amazon said that it will not be making the award determinations for this project; NSF will make the awards solely in accordance with its merit review process. However, Amazon said that its researchers may be involved with the projects as advisors, only at the request of an awardee, or of NSF with the awardee's consent. As advisors, Amazon may host student interns who wish to gain further industry experience, which seems a bit dicey. Amazon will also not participate in the review process or receive proposal information. NSF will only share with Amazon the summary-level information necessary to evaluate the program, specifically the number of proposal submissions, the number of submitting organizations, and the numbers rated across various review categories.

There was also the question of who exactly is providing the funding, since section VII.B of the proposal states: "Individual awards selected for joint funding by NSF and Amazon will be funded through separate NSF and Amazon funding instruments."

https://twitter.com/nniiicc/status/1110335108634951680

https://twitter.com/nniiicc/status/1110335004989521920

Nic Weber, the author of the above tweets and Assistant Professor at UW iSchool, also raises another important question: “Why does Amazon get to put its logo on a national solicitation (for a paltry $7.6 million dollars in basic research) when it profits in the multi-billions off of AI that is demonstrably unfair and harmful.” Twitter was awash with tweets from those working in tech questioning Amazon’s collaboration.
https://twitter.com/mer__edith/status/1110560653872373760

https://twitter.com/patrickshafto/status/1110748217887649793

https://twitter.com/smunson/status/1110657292549029888

https://twitter.com/haldaume3/status/1110697325251448833

Amazon has already been under fire for its controversial decisions in the recent past. In June last year, when the US Immigration and Customs Enforcement agency (ICE) began separating migrant children from their parents, Amazon was criticized as one of the tech companies that aided ICE with the software required to do so. Amazon has also faced constant criticism since the news came out that it had sold its facial recognition product Rekognition to a number of law enforcement agencies in the U.S. in the first half of 2018. Amazon also faced backlash after a study by the Massachusetts Institute of Technology in January found Amazon Rekognition incapable of reliably determining the sex of female and darker-skinned faces in certain scenarios. Amazon is yet to fix this AI-bias anomaly, and yet it has now started a new collaboration with NSF that ironically focuses on building bias-free AI systems. Amazon’s Ring (a smart doorbell company) also came under public scrutiny in January, after it gave its employees access to watch live footage from customers' cameras.

In other news, yesterday, Google also formed an external AI advisory council to help advance the responsible development of AI. More details here.

Amazon won’t be opening its HQ2 in New York due to public protests

Amazon admits that facial recognition technology needs to be regulated

Amazon’s Ring gave access to its employees to watch live footage of the customers, The Intercept reports

Vincy Davis

22 Jul 2019

4 min read

Why Intel is betting on BFLOAT16 to be a game changer for deep learning training? Hint: Range trumps Precision.

A group of researchers from Intel Labs and Facebook have published a paper titled “A Study of BFLOAT16 for Deep Learning Training”. The paper presents a comprehensive study indicating the success of the Brain Floating Point (BFLOAT16) half-precision format in deep learning training across image classification, speech recognition, language modeling, generative networks, and industrial recommendation systems. BFLOAT16 has a 7-bit mantissa and an 8-bit exponent; the exponent width is the same as in FP32, so it covers a similar range but with less precision. BFLOAT16 was originally developed by Google and implemented in its third generation Tensor Processing Unit (TPU).

https://twitter.com/JeffDean/status/1134524217762951168

Many state-of-the-art training platforms use IEEE-754 or automatic mixed precision as their preferred numeric format for deep learning training. However, these formats fall short in representing error gradients during back propagation, and so cannot deliver the required performance gains. BFLOAT16 has a dynamic range wide enough to represent error gradients during back propagation. This enables easier migration of deep learning workloads to BFLOAT16 hardware.

Image Source: BFLOAT16

In the above table, all the values are represented as truncated full precision floating point values with 8 bits of mantissa, with a dynamic range comparable to FP32. By adopting the BFLOAT16 numeric format, core compute primitives such as Fused Multiply Add (FMA) can be built using 8-bit multipliers. This leads to a significant reduction in area and power while preserving the full dynamic range of FP32.

How are deep neural networks (DNNs) trained with BFLOAT16?

The figure below shows the mixed precision data flow used to train deep neural networks using the BFLOAT16 numeric format.

Image Source: BFLOAT16

The BFLOAT16 tensors are taken as input to the core compute kernels, represented as General Matrix Multiply (GEMM) operations, which produce FP32 tensors as output. 
The researchers have developed a library called Quantlib, represented as Q in the figure, to implement the emulation in multiple deep learning frameworks. One of the functions of Quantlib is to modify the elements of an input FP32 tensor to echo the behavior of BFLOAT16. Quantlib is also used to convert a copy of the FP32 weights to BFLOAT16 for the forward pass. The non-GEMM computations include batch normalization and activation functions. The bias tensors are always maintained in FP32. The update step uses an FP32 copy of the weights to maintain model accuracy.

How does BFLOAT16 perform compared to FP32?

Convolutional Neural Networks

Convolutional neural networks (CNNs) are primarily used for computer vision applications such as image classification, object detection, and semantic segmentation. AlexNet and ResNet-50 are used as the two representative models for the BFLOAT16 evaluation. With AlexNet, the BFLOAT16 emulation closely follows the actual FP32 run and achieves 57.2% top-1 and 80.1% top-5 accuracy. With ResNet-50, the BFLOAT16 emulation follows the FP32 baseline almost exactly and achieves the same top-1 and top-5 accuracy.

Image Source: BFLOAT16

Similarly, the researchers were able to demonstrate that BFLOAT16 can represent tensor values across many application domains, including Recurrent Neural Networks, Generative Adversarial Networks (GANs), and Industrial Scale Recommendation Systems. The researchers thus established that the dynamic range of BFLOAT16 matches that of FP32 and that conversion to and from FP32 is easy. Maintaining the same range as FP32 is important because it means no additional hyper-parameter tuning is required for convergence. Hyper-parameter tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. The researchers expect to see industry-wide adoption of BFLOAT16 across emerging domains. 
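The idea of modifying FP32 values so that they echo BFLOAT16 can be sketched in a few lines. The following is a minimal illustration using only the Python standard library; it is not Quantlib's code, and the function names and the choice of round-to-nearest-even are assumptions made for this example. It keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an FP32 value and clears the rest.

```python
import struct


def fp32_bits(x: float) -> int:
    """Return the 32-bit IEEE-754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_fp32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def to_bfloat16_bits(x: float) -> int:
    """Round an FP32 value to the 16-bit BFLOAT16 pattern (nearest, ties to even).

    Hypothetical helper for illustration; x must be representable in FP32.
    """
    bits = fp32_bits(x)
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Force a mantissa bit so the value is still a NaN after truncation.
        return (bits >> 16) | 0x0040
    # Add just under half of the dropped range, plus one if the kept LSB is odd.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def emulate_bfloat16(x: float) -> float:
    """Return the FP32 value whose low 16 bits are zero, mimicking BFLOAT16."""
    return bits_to_fp32(to_bfloat16_bits(x) << 16)


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1e-30, 1e38):
        print(value, "->", emulate_bfloat16(value))
```

Because the exponent field is untouched, very large and very small values such as 1e38 and 1e-30 survive the conversion, while only about two to three significant decimal digits of precision remain. That is the range-over-precision trade-off the paper argues for.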
Recent reports suggest that Intel is planning to graft Google’s BFLOAT16 onto its processors as well as onto its initial Nervana Neural Network Processor for training, the NNP-T 1000. Pradeep Dubey, who directs the Parallel Computing Lab at Intel and is one of the authors of this paper, believes that for deep learning, range is more important than precision, which is the inverse of the rationale used for IEEE’s floating point formats. Users are finding it interesting that the BFLOAT16 half-precision format is suitable for deep learning applications.

https://twitter.com/kevlindev/status/1152984689268781056
https://twitter.com/IAmMattGreen/status/1152769690621448192

For more details, head over to the “A Study of BFLOAT16 for Deep Learning Training” paper.

Intel’s new brain inspired neuromorphic AI chip contains 8 million neurons, processes data 1K times faster
Google plans to remove XSS Auditor used for detecting XSS vulnerabilities from its Chrome web browser
IntelliJ IDEA 2019.2 Beta 2 released with new Services tool window and profiling tools


Fatema Patrawala

24 May 2019

6 min read

PostgreSQL 12 Beta 1 released


The PostgreSQL Global Development Group announced yesterday its first beta release of PostgreSQL 12. It is now also available for download. This release contains previews of all features that will be available in the final release of PostgreSQL 12, though some details of the release could also change.

PostgreSQL 12 feature highlights

Indexing Performance, Functionality, and Management

PostgreSQL 12 will improve the overall performance of the standard B-tree indexes with improvements to the space management of these indexes as well. These improvements also provide a reduction of index size for B-tree indexes that are frequently modified, in addition to a performance gain.

Additionally, PostgreSQL 12 adds the ability to rebuild indexes concurrently, which lets you perform a REINDEX operation without blocking any writes to the index. This feature should help with lengthy index rebuilds that could cause downtime when managing a PostgreSQL database in a production environment.

PostgreSQL 12 extends the abilities of several of the specialized indexing mechanisms. The ability to create covering indexes, i.e. the INCLUDE clause that was introduced in PostgreSQL 11, has now been added to GiST indexes. SP-GiST indexes now support the ability to perform K-nearest neighbor (K-NN) queries for data types that support the distance (<->) operation.

The amount of write-ahead log (WAL) overhead generated when creating a GiST, GIN, or SP-GiST index is also significantly reduced in PostgreSQL 12, which provides several benefits to the disk utilization of a PostgreSQL cluster and features such as continuous archiving and streaming replication.
Inlined WITH queries (Common table expressions)

Common table expressions (or WITH queries) can now be automatically inlined in a query if they:

a) are not recursive
b) do not have any side-effects
c) are only referenced once in a later part of a query

This removes an "optimization fence" that has existed since the introduction of the WITH clause in PostgreSQL 8.4.

Partitioning

PostgreSQL 12 improves performance when processing tables with thousands of partitions for operations that only need to use a small number of partitions. This release also provides improvements to the performance of both INSERT and COPY into a partitioned table. ATTACH PARTITION can now be performed without blocking concurrent queries on the partitioned table. Additionally, foreign keys can now reference partitioned tables in PostgreSQL 12.

JSON path queries per SQL/JSON specification

PostgreSQL 12 now allows execution of JSON path queries per the SQL/JSON specification in the SQL:2016 standard. Similar to XPath expressions for XML, JSON path expressions let you evaluate a variety of arithmetic expressions and functions in addition to comparing values within JSON documents. A subset of these expressions can be accelerated with GIN indexes, allowing the execution of highly performant lookups across sets of JSON data.

Collations

PostgreSQL 12 now supports case-insensitive and accent-insensitive comparisons for ICU provided collations, also known as "nondeterministic collations". When used, these collations can provide convenience for comparisons and sorts, but can also lead to a performance penalty as a collation may need to make additional checks on a string.

Most-common Value Extended Statistics

CREATE STATISTICS, introduced in PostgreSQL 10 to help collect more complex statistics over multiple columns to improve query planning, now supports most-common value statistics. This leads to improved query plans for distributions that are non-uniform.
Generated Columns

PostgreSQL 12 allows the creation of generated columns that compute their values with an expression using the contents of other columns. This feature provides stored generated columns, which are computed on inserts and updates and are saved on disk. Virtual generated columns, which are computed only when a column is read as part of a query, are not implemented yet.

Pluggable Table Storage Interface

PostgreSQL 12 introduces the pluggable table storage interface that allows for the creation and use of different methods for table storage. New access methods can be added to a PostgreSQL cluster using the CREATE ACCESS METHOD command and subsequently added to tables with the new USING clause on CREATE TABLE. A table storage interface can be defined by creating a new table access method. In PostgreSQL 12, the storage interface that is used by default is the heap access method, which is currently the only built-in method.

Page Checksums

The pg_verify_checksums command has been renamed to pg_checksums and now supports the ability to enable and disable page checksums across a PostgreSQL cluster that is offline. Previously, page checksums could only be enabled during the initialization of a cluster with initdb.

Authentication & Connection Security

GSSAPI now supports client-side and server-side encryption and can be specified in the pg_hba.conf file using the hostgssenc and hostnogssenc record types. PostgreSQL 12 also allows for discovery of LDAP servers based on DNS SRV records if PostgreSQL was compiled with OpenLDAP.

A few notable behavior changes in PostgreSQL 12

There are several changes introduced in PostgreSQL 12 that can affect the behavior as well as management of your ongoing operations. A few of these are noted below; for other changes, visit the "Migrating to Version 12" section of the release notes.

- The recovery.conf configuration file is now merged into the main postgresql.conf file. PostgreSQL will not start if it detects that recovery.conf is present. To put PostgreSQL into a non-primary mode, you can use the recovery.signal and the standby.signal files. You can read more about archive recovery here: https://www.postgresql.org/docs/devel/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVE-RECOVERY
- Just-in-Time (JIT) compilation is now enabled by default.
- OIDs can no longer be added to user created tables using the WITH OIDS clause. Operations on tables that have columns that were created using WITH OIDS (i.e. columns named "OID") will need to be adjusted.
- Running a SELECT * command on a system table will now also output the OID for the rows in the system table, instead of the old behavior which required the OID column to be specified explicitly.

Testing for Bugs & Compatibility

The stability of each PostgreSQL release greatly depends on the community testing the upcoming version with its workloads and testing tools, in order to find bugs and regressions before the general availability of PostgreSQL 12. As this is a Beta, minor changes to database behaviors, feature details, and APIs are still possible. The PostgreSQL team encourages the community to test the new features of PostgreSQL 12 in their database systems to help eliminate any bugs or other issues that may exist. A list of open issues is publicly available in the PostgreSQL wiki. You can report bugs using the bug report form on the PostgreSQL website.

Beta Schedule

This is the first beta release of version 12. The PostgreSQL Project will release additional betas as required for testing, followed by one or more release candidates, until the final release in late 2019. For further information please see the Beta Testing page.

Many other new features and improvements have been added to PostgreSQL 12. Please see the Release Notes for a complete list of new and changed features.
PostgreSQL 12 progress update
Building a scalable PostgreSQL solution
PostgreSQL security: a quick look at authentication best practices [Tutorial]


Savia Lobo

06 Jun 2019

11 min read

Amazon re:MARS Day 1 kicks off showcasing Amazon’s next-gen AI robots; Spot, the robo-dog and a guest appearance from ‘Iron Man’


Amazon’s inaugural re:MARS event kicked off on Tuesday, June 4 at the Aria in Las Vegas. This 4-day event is inspired by MARS, a yearly invite-only event hosted by Jeff Bezos that brings together innovative minds in Machine learning, Automation, Robotics, and Space to share new ideas across these rapidly advancing domains. re:MARS featured many announcements revealing a range of robots, each engineered for a different purpose. These include helicopter drones for delivery, two robot dogs by Boston Dynamics, autonomous human-like acrobats by Walt Disney Imagineering, and much more. Amazon also revealed Alexa’s new Dialog Modeling for Natural, Cross-Skill Conversations. Let us have a brief look at each of the announcements.

Robert Downey Jr. announces ‘The Footprint Coalition’ project to clean up the environment using Robotics

One of the exciting moments at re:MARS was the visit of Robert Downey Jr., popularly known as “Iron Man”, who announced a new project called The Footprint Coalition to clean up the planet using advanced technologies. “Between robotics and nanotechnology we could probably clean up the planet significantly, if not entirely, within a decade,” he said.

According to Forbes, “Amazon did not immediately respond to questions about whether it was investing financially or technologically in Downey Jr.’s project.” “At this point, the effort is severely light on details, with only a bare-bones website to accompany Downey’s public statement, but the actor said he plans to officially launch the project by April 2020,” Forbes reports.

A recent United Nations report found that humans are having an unprecedented and devastating effect on global biodiversity, and researchers have found microplastics polluting the air, ocean, and soil. The announcement has also drawn public attention because the “company itself is under fire for its policies around the environment and climate change”.
Additionally, Morgan Pope and Tony Dohi of Walt Disney Imagineering demonstrated their work on creating autonomous acrobats.

https://twitter.com/jillianiles/status/1136082571081555968
https://twitter.com/thesullivan/status/1136080570549563393

Amazon will soon deliver orders using drones

On Wednesday, Amazon unveiled a revolutionary new drone that will begin test deliveries of toothpaste and other household goods within months. This drone is “part helicopter and part science-fiction aircraft”, with built-in AI features and sensors that will help it fly robotically without threatening traditional aircraft or people on the ground.

Gur Kimchi, vice president of Amazon Prime Air, said in an interview with Bloomberg, “We have a design that is amazing. It has performance that we think is just incredible. We think the autonomy system makes the aircraft independently safe.” However, he refused to provide details on where the delivery tests will be conducted. The drones have also received a one-year approval from the FAA for testing in limited ways that still won't allow deliveries.

According to a Bloomberg report, “It can take years for traditional aircraft manufacturers to get U.S. Federal Aviation Administration approval for new designs and the agency is still developing regulations to allow drone flights over populated areas and to address national security concerns. The new drone presents even more challenges for regulators because there aren’t standards yet for its robotic features”.

Competitors to Amazon’s unnamed drone include Alphabet Inc.’s Wing, which in April became the first drone company to win FAA approval to operate as a small airline. Also, United Parcel Service Inc. and drone startup Matternet Inc. began using drones to move medical samples between hospitals in Raleigh, North Carolina, in March.

Amazon’s drone is about six feet across with six propellers that lift it vertically off the ground.
It is surrounded by a six-sided shroud that protects people from the propellers and also serves as a high-efficiency wing, so that it can fly horizontally like a plane. Once it gets off the ground, the craft tilts and flies sideways, with the helicopter blades acting more like airplane propellers.

Kimchi said, “Amazon’s business model for the device is to make deliveries within 7.5 miles (12 kilometers) from a company warehouse and to reach customers within 30 minutes. It can carry packages weighing as much as five pounds. More than 80% of packages sold by the retail behemoth are within that weight limit.”

According to the company, one of the things the drone has mastered is detecting utility wires and clotheslines. These have been notoriously difficult to identify reliably and pose a hazard for a device attempting to make deliveries in urban and suburban areas. To know more about these high-tech drones in detail, head over to Amazon’s official blogpost.

Boston Dynamics’ first commercial robot, Spot

Boston Dynamics revealed its first commercial product, a quadrupedal robot named Spot. Boston Dynamics’ CEO Marc Raibert told The Verge, “Spot is currently being tested in a number of ‘proof-of-concept’ environments, including package delivery and surveying work.” He also said that although there’s no firm launch date for the commercial version of Spot, it should be available within months, certainly before the end of the year. “We’re just doing some final tweaks to the design. We’ve been testing them relentlessly”, Raibert said.

Spot robots are capable of navigating environments autonomously, but only when their surroundings have been mapped in advance. They can withstand kicks and shoves and keep their balance on tricky terrain, but they don’t decide for themselves where to walk. The robots are simple to control; using a D-pad, users can steer the robot just like an RC car or mechanical toy.
A quick tap on the video feed streamed live from the robot’s front-facing camera lets the user select a destination for it to walk to, and another tap lets the user assume control of a robot arm mounted on top of the chassis. With 3D cameras mounted on top, a Spot robot can map environments like construction sites, identifying hazards and work progress. The robot arm gives it greater flexibility and helps it open doors and manipulate objects.

https://twitter.com/jjvincent/status/1136096290016595968

The commercial version will be “much less expensive than prototypes [and] we think they’ll be less expensive than other peoples’ quadrupeds”, Raibert said. Here’s a demo video of the Spot robot at the re:MARS event.

https://youtu.be/xy_XrAxS3ro

Alexa gets new dialog modeling for improved natural, cross-skill conversations

Amazon unveiled new features in Alexa that will help the conversational agent answer more complex questions and carry out more complex tasks. Rohit Prasad, Alexa vice president and head scientist, said, “We envision a world where customers will converse more naturally with Alexa: seamlessly transitioning between skills, asking questions, making choices, and speaking the same way they would with a friend, family member, or co-worker. Our objective is to shift the cognitive burden from the customer to Alexa.”

This new update to Alexa is a set of AI modules that work together to generate responses to customers’ questions and requests. With every round of dialog, the system produces a vector (a fixed-length string of numbers) that represents the context and the semantic content of the conversation. “With this new approach, Alexa will predict a customer’s latent goal from the direction of the dialog and proactively enable the conversation flow across topics and skills,” Prasad says.
“This is a big leap for conversational AI.”

At re:MARS, Prasad also announced the developer preview of Alexa Conversations, a new deep learning-based approach for skill developers to create more-natural voice experiences with less effort, fewer lines of code, and less training data than before. The preview allows skill developers to create natural, flexible dialogs within a single skill; upcoming releases will allow developers to incorporate multiple skills into a single conversation.

With Alexa Conversations, developers provide: (1) application programming interfaces, or APIs, that provide access to their skills’ functionality; (2) a list of entities that the APIs can take as inputs, such as restaurant names or movie times; (3) a handful of sample dialogs annotated to identify entities and actions and mapped to API calls. Alexa Conversations’ AI technology handles the rest.

“It’s way easier to build a complex voice experience with Alexa Conversations due to its underlying deep-learning-based dialog modeling,” Prasad said. To know more about this announcement in detail, head over to Alexa’s official blogpost.

Amazon Robotics unveiled two new robots at its fulfillment centers

Brad Porter, vice president of robotics at Amazon, announced two new robots, one code-named Pegasus and the other Xanthus. Pegasus, which is built to sort packages, is a 3-foot-wide robot equipped with a conveyor belt on top to drop the right box in the right location. “We sort billions of packages a year. The challenge in package sortation is, how do you do it quickly and accurately? In a world of Prime one-day [delivery], accuracy is super-important. If you drop a package off a conveyor, lose track of it for a few hours, or worse, you mis-sort it to the wrong destination, or even worse, if you drop it and damage the package and the inventory inside, we can’t make that customer promise anymore”, Porter said.
Porter said Pegasus robots have already driven a total of 2 million miles, and have reduced the number of wrongly sorted packages by 50 percent.

Porter said Xanthus represents the latest incarnation of Amazon’s drive robot. Amazon uses tens of thousands of the current-generation robot, known as Hercules, in its fulfillment centers. Amazon unveiled the Xanthus Sort Bot and the Xanthus Tote Mover. “The Xanthus family of drives brings innovative design, enabling engineers to develop a portfolio of operational solutions, all of the same hardware base through the addition of new functional attachments. We believe that adding robotics and new technologies to our operations network will continue to improve the associate and customer experience,” Porter says. To know more about these new robots watch the video below:

https://youtu.be/4MH7LSLK8Dk

StyleSnap: AI-powered shopping

Amazon announced StyleSnap, its latest move to promote AI-powered shopping. StyleSnap helps users pick out clothes and accessories. When they are unable to describe what they want, all they need to do is upload a photo or screenshot of what they are looking for.

https://twitter.com/amazonnews/status/1136340356964999168

Amazon said, "You are not a poet. You struggle to find the right words to explain the shape of a neckline, or the spacing of a polka dot pattern, and when you attempt your text-based search, the results are far from the trend you were after."

To use StyleSnap, just open the Amazon app, click the camera icon in the upper right-hand corner, select the StyleSnap option, and then upload an image of the outfit. StyleSnap then recommends similar outfits available to purchase on Amazon, and users can filter by brand, pricing, and reviews. Amazon's AI system can identify colors and edges, and then patterns like floral and denim. Using this information, its algorithm can then accurately pick a matching style.
To know more about StyleSnap in detail, head over to Amazon’s official blog post.

Amazon Go trains cashierless store algorithms using synthetic data

At re:MARS, Amazon shared more details about Amazon Go, the company’s brand for its cashierless stores. The company said Amazon Go uses synthetic data to intentionally introduce errors to its computer vision system. Challenges that had to be addressed before opening the queue-free stores include building vision systems that account for sunlight streaming into a store, tight latency budgets, and small amounts of data for certain tasks. Synthetic data is being used in a number of ways to power few-shot learning, improve AI systems that control robots, train AI agents to walk, or beat humans in games of Quake III.

Dilip Kumar, VP of Amazon Go, said, “As our application improved in accuracy (and we have a very highly accurate application today) we had this interesting problem that there were very few negative examples, or errors, which we could use to train our machine learning models.” He further added, “So we created synthetic datasets for one of our challenging conditions, which allowed us to be able to boost the diversity of the data that we needed. But at the same time, we have to be careful that we weren’t introducing artifacts that were only visible in the synthetic data sets, [and] that the data translates well to real-world situations, a tricky balance.” To know more about this news in detail, check out this video:

https://youtu.be/jthXoS51hHA

The Amazon re:MARS event is still ongoing and will have many more updates. To catch live updates from Vegas visit Amazon’s blog.

World’s first touch-transmitting telerobotic hand debuts at Amazon re:MARS tech showcase
Amazon introduces S3 batch operations to process millions of S3 objects
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is now generally available


Melisha Dsouza

08 Oct 2018

3 min read

Microsoft open sources Infer.NET, its popular model-based machine learning framework


Last week, Microsoft open sourced Infer.NET, the cross-platform framework used for model-based machine learning. This popular machine learning engine, used in Office, Xbox and Azure, will be available on GitHub under the permissive MIT license for free use in commercial applications.

Features of Infer.NET

The team at Microsoft Research in Cambridge initially envisioned Infer.NET as a research tool and released it for academic use in 2008. The framework has served as the basis for hundreds of published papers across a variety of fields, including information retrieval and healthcare. The team then started using the framework as a machine learning engine within a wide range of Microsoft products.

A model-based approach to machine learning

Infer.NET allows users to incorporate domain knowledge into their model. The framework can be used to build bespoke machine learning algorithms directly from that model. In short, the framework constructs a learning algorithm for users based on the model they have provided.

Facilitates interpretability

Infer.NET also facilitates interpretability. If users have designed the model themselves and the learning algorithm follows that model, they can understand why the system behaves in a particular way or makes certain predictions.

Probabilistic Approach

In Infer.NET, models are described using a probabilistic program. This is used to describe real-world processes in a language that machines understand. Infer.NET compiles the probabilistic program into high-performance code for implementing something cryptically called deterministic approximate Bayesian inference. This approach allows a notable amount of scalability. For instance, it can be used in a system that automatically extracts knowledge from billions of web pages, comprising petabytes of data.

Additional Features

The framework also supports the ability of the system to learn as new data arrives. The team is also working towards developing and growing it further.
Infer.NET will become a part of ML.NET (the machine learning framework for .NET developers). They have already set up the repository under the .NET Foundation and moved the package and namespaces to Microsoft.ML.Probabilistic. Being cross platform, Infer.NET supports .NET Framework 4.6.1, .NET Core 2.0, and Mono 5.0. Windows users get to use Visual Studio 2017, while macOS and Linux folks have command-line options, which could be incorporated into the code wrangler of their choice.

Download the framework to learn more about Infer.NET. You can also check the documentation for a detailed User Guide. To know more about this news, head over to Microsoft’s official blog.

Microsoft announces new Surface devices to enhance user productivity, with style and elegance
Neural Network Intelligence: Microsoft’s open source automated machine learning toolkit
Microsoft’s new neural text-to-speech service lets machines speak like people


Fatema Patrawala

28 Nov 2019

5 min read

Julia Computing research team runs machine learning model on encrypted data without decrypting it


Last week, the team at Julia Computing published research based on cutting edge cryptographic techniques. The research used these techniques to practically perform computation on data without ever decrypting it. For example, the user would send encrypted data (e.g. images) to the cloud API, which would run the machine learning model and then return the encrypted answer. The user data is never decrypted; in particular, the cloud provider has no access to the original image and cannot decrypt the prediction it computed. The team demonstrated this by building a machine learning service for handwriting recognition of encrypted images (from the MNIST dataset).

The ability to compute on encrypted data is generally referred to as “secure computation” and is a fairly large area of research, with many different cryptographic approaches and techniques for a plethora of different application scenarios. For their research, the Julia Computing team focused on a technique known as “homomorphic encryption”.

What is homomorphic encryption

Homomorphic encryption is a form of encryption that allows computation on ciphertexts, generating an encrypted result which, when decrypted, matches the result of the operations as if they had been performed on the plaintext. This technique can be used for privacy-preserving outsourced storage and computation. It allows data to be encrypted and out-sourced to commercial cloud environments for processing, all while encrypted. In highly regulated industries, such as health care, homomorphic encryption can be used to enable new services by removing privacy barriers inhibiting data sharing.
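To make the definition above concrete, here is a minimal sketch of a homomorphic property using textbook RSA, which is multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product. This is not the scheme used in the research and is not secure (tiny primes, no padding); the function names and parameters are chosen purely for illustration.

```python
# Toy textbook RSA with tiny primes: insecure, for illustration only.
P, Q = 61, 53
N = P * Q                    # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                       # public exponent
D = pow(E, -1, PHI)          # private exponent (modular inverse, Python 3.8+)


def encrypt(message: int) -> int:
    """Encrypt an integer 0 <= message < N with the public key."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """Decrypt a ciphertext with the private key."""
    return pow(ciphertext, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """Multiply two ciphertexts; no private key is needed for this step."""
    return (c1 * c2) % N


if __name__ == "__main__":
    product_ct = multiply_encrypted(encrypt(7), encrypt(6))
    print(decrypt(product_ct))  # decrypts to 7 * 6 (mod N)
```

The party calling `multiply_encrypted` never sees 7, 6, or their product, which mirrors the cloud-provider role described above. Schemes used for machine learning support richer operations (such as both addition and multiplication on vectors), which this toy does not.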
In this research, the Julia Computing team used a homomorphic encryption system which involves the following operations:

pub_key, eval_key, priv_key = keygen()
encrypted = encrypt(pub_key, plaintext)
decrypted = decrypt(priv_key, encrypted)
encrypted′ = eval(eval_key, f, encrypted)

The first three are fairly straightforward and familiar to anyone who has used asymmetric cryptography before. The last one is the important one: it evaluates some function f on the encrypted value and returns another encrypted value corresponding to the result of evaluating f on the underlying plaintext. It is this property that gives homomorphic computation its name.

The Julia Computing team then discusses CKKS (Cheon-Kim-Kim-Song), a homomorphic encryption scheme that allows homomorphic evaluation of the following primitive operations:

- Element-wise addition of length n vectors of complex numbers
- Element-wise multiplication of length n complex vectors
- Rotation (in the circshift sense) of elements in the vector
- Complex conjugation of vector elements

They also note that computations using CKKS are noisy, and so they tested performing these operations in Julia.

Which convolutional neural network did the Julia Computing team use

As a starting point, the Julia Computing team used the convolutional neural network example given in the Flux model zoo. They kept the training loop and data preparation, and tweaked the ML model slightly. It is essentially the same model as the one used in the paper “Secure Outsourced Matrix Computation and Application to Neural Networks”, which uses the same (CKKS) cryptographic scheme. That paper also encrypts the model, which the Julia Computing team skipped for simplicity. The team also included bias vectors after every layer (which Flux does by default). This resulted in a higher test set accuracy for the Julia Computing team's model (98.6% vs 98.1%). An unusual feature of this model is its x.^2 activation functions.
More common choices here would have been tanh or relu or something more advanced. While those functions (relu in particular) are cheap to evaluate on plaintext values, they would be quite expensive to evaluate on encrypted values. Also, the team would have ended up evaluating a polynomial approximation had they adopted these common choices. Fortunately, x.^2 worked fine for their purpose.

How was the homomorphic operation carried out

The team performed the homomorphic operations for convolutions and matrix multiply assuming a batch size of 64. They precomputed each 7x7 convolution window extraction from the original images, which gave them 64 7x7 matrices per input image. They then collected the same position in each window into one vector, giving a 64-element vector for each image, or a 64x64 matrix for a batch of 64 images (i.e. a total of 49 64x64 matrices), and encrypted these matrices. In this way, the convolution became a scalar multiplication of the whole matrix with the appropriate mask element, and by summing all 49 results afterwards, the team got the result of the convolution.

The team then moved on to matrix multiply, rotating elements in the vector to effect a re-ordering of the multiplication indices. They considered a row-major ordering of matrix elements in the vector. Shifting the vector by a multiple of the row size then has the effect of rotating the columns, which is a sufficient primitive for implementing matrix multiply. The team was able to put everything together, and it worked. You can take a look at the official blog post for the step-by-step implementation with code.

They also carried out the whole process in Julia, since it allows powerful abstractions and they could encapsulate the whole convolution extraction process as a custom array type.
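The window-extraction idea described above can be checked on plaintext data. The sketch below is an assumption-laden illustration, not the team's Julia code: it uses a single 28x28 image, a 7x7 kernel, and a stride of 3 (the stride is an assumption that happens to yield 8x8 = 64 windows), and all function names are hypothetical. It shows that summing 49 "scalar times position-vector" products gives the same result as a direct convolution.

```python
import random

IMG, WIN, STRIDE = 28, 7, 3           # STRIDE is an assumption; it yields 64 windows
OUT = (IMG - WIN) // STRIDE + 1       # windows per axis (8)


def position_vectors(image):
    """Return 49 vectors; vector k holds element k of every window (64 entries each)."""
    return [
        [image[i * STRIDE + di][j * STRIDE + dj]
         for i in range(OUT) for j in range(OUT)]
        for di in range(WIN) for dj in range(WIN)
    ]


def conv_via_scalar_products(vectors, kernel):
    """Convolution as a sum of 49 (kernel element * position vector) products."""
    result = [0] * (OUT * OUT)
    for k, vector in enumerate(vectors):
        weight = kernel[k // WIN][k % WIN]
        for n, value in enumerate(vector):
            result[n] += weight * value
    return result


def conv_direct(image, kernel):
    """Reference: slide the kernel over the image window by window."""
    return [
        sum(kernel[di][dj] * image[i * STRIDE + di][j * STRIDE + dj]
            for di in range(WIN) for dj in range(WIN))
        for i in range(OUT) for j in range(OUT)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[rng.randint(0, 255) for _ in range(IMG)] for _ in range(IMG)]
    kernel = [[rng.randint(-3, 3) for _ in range(WIN)] for _ in range(WIN)]
    vectors = position_vectors(image)
    print(conv_via_scalar_products(vectors, kernel) == conv_direct(image, kernel))
```

In the encrypted setting, each position vector would be a ciphertext and the loop body would use only scalar multiplication and element-wise addition, which is why this layout suits a scheme with the primitive operations listed earlier.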
The Julia Computing team states, “Achieving the dream of automatically executing arbitrary computations securely is a tall order for any system, but Julia’s metaprogramming capabilities and friendly syntax make it well suited as a development platform.”

Julia co-creator, Jeff Bezanson, on what’s wrong with Julialang and how to tackle issues like modularity and extension
Julia v1.3 released with new multithreading features, and much more!
The Julia team shares its finalized release process with the community
Julia announces the preview of multi-threaded task parallelism in alpha release v1.3.0
How to make machine learning based recommendations using Julia [Tutorial]


Savia Lobo

28 May 2019

8 min read

Speech2Face: A neural network that “imagines” faces from hearing voices. Is it too soon to worry about ethnic profiling?


Last week, researchers from MIT CSAIL and Google AI published a study on reconstructing a facial image of a person from a short audio recording of that person speaking, in a paper titled “Speech2Face: Learning the Face Behind a Voice”. The researchers designed and trained a neural network using millions of natural Internet/YouTube videos of people speaking. During training, the model learns voice-face correlations that allow it to produce images capturing various physical attributes of the speakers, such as age, gender, and ethnicity. The entire training was done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly. The researchers further evaluated and numerically quantified how closely the Speech2Face reconstructions, obtained directly from audio, resemble the true face images of the speakers. For this, they tested their model both qualitatively and quantitatively on the AVSpeech dataset and the VoxCeleb dataset.

The Speech2Face model

The researchers utilized the VGG-Face model, a face recognition model pre-trained on a large-scale face dataset, and extracted a 4096-D face feature from the penultimate layer (fc7) of the network. These face features were shown to contain enough information to reconstruct the corresponding face images while being robust to many variations in the input images. The Speech2Face pipeline consists of two main components: 1) a voice encoder, which takes a complex spectrogram of speech as input and predicts a low-dimensional face feature that would correspond to the associated face; and 2) a face decoder, which takes the face feature as input and produces an image of the face in a canonical form (frontal-facing and with neutral expression).
During training, the face decoder is fixed, and only the voice encoder, which predicts the face feature, is trained.

How were the facial features evaluated?

To quantify how well different facial attributes are captured in Speech2Face reconstructions, the researchers tested different aspects of the model.

Demographic attributes

The researchers used Face++, a leading commercial service for computing facial attributes. They evaluated and compared age, gender, and ethnicity by running the Face++ classifiers on the original images and on the Speech2Face reconstructions. The Face++ classifiers return either “male” or “female” for gender, a continuous number for age, and one of four values, “Asian”, “black”, “India”, or “white”, for ethnicity.

Craniofacial attributes

The researchers evaluated craniofacial measurements commonly used in the literature for capturing ratios and distances in the face. They computed the correlation between the F2F (face-to-face) reconstructions and the corresponding S2F (speech-to-face) reconstructions. Face landmarks were computed using the DEST library. There is a statistically significant (i.e., p < 0.001) positive correlation for several measurements. In particular, the highest correlation is measured for the nasal index (0.38) and nose width (0.35), features indicative of nose structures that may affect a speaker’s voice.

Feature similarity

The researchers further tested how well a person can be recognized from the face features predicted from speech. They first directly measured the cosine distance between the predicted features and the true ones obtained from the original face image of the speaker. The table above shows the average error over 5,000 test images, for predictions using 3s and 6s audio segments. Longer audio clips show consistent improvement in all error metrics, which further supports the qualitative improvement observed in the image below.
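As a rough illustration of the feature-similarity measurement, the sketch below computes a cosine distance between a predicted feature and candidate face features, and ranks the candidates by that distance. It assumes cosine distance means one minus cosine similarity, uses made-up three-dimensional vectors in place of the 4096-D features, and the function and speaker names are hypothetical rather than taken from the paper.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity (assumed definition); 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def rank_by_distance(predicted, database):
    """Return database keys ordered from closest to farthest feature."""
    return sorted(database, key=lambda name: cosine_distance(predicted, database[name]))

# Toy stand-ins for face features; real features would be 4096-dimensional.
predicted = [0.9, 0.1, 0.0]
database = {
    "speaker_a": [1.0, 0.0, 0.0],
    "speaker_b": [0.0, 1.0, 0.0],
    "speaker_c": [0.0, 0.0, 1.0],
}
ranking = rank_by_distance(predicted, database)
```

A smaller distance means the predicted feature points in nearly the same direction as the candidate's feature, so the first entry of `ranking` is the best match under this measure.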
They further evaluated how accurately they could retrieve the true speaker from a database of face images. To do so, they took the speech of a person to predict the feature using the Speech2Face model and query it by computing its distances to the face features of all face images in the database. Ethical considerations with Speech2Face model Researchers said that the training data used is a collection of educational videos from YouTube and that it does not represent equally the entire world population. Hence, the model may be affected by the uneven distribution of data. They have also highlighted that “ if a certain language does not appear in the training data, our reconstructions will not capture well the facial attributes that may be correlated with that language”. “In our experimental section, we mention inferred demographic categories such as “White” and “Asian”. These are categories defined and used by a commercial face attribute classifier and were only used for evaluation in this paper. Our model is not supplied with and does not make use of this information at any stage”, the paper mentions. They also warn that any further investigation or practical use of this technology would be carefully tested to ensure that the training data is representative of the intended user population. “If that is not the case, more representative data should be broadly collected”, the researchers state. Limitations of the Speech2Face model In order to test the stability of the Speech2Face reconstruction, the researchers used faces from different speech segments of the same person, taken from different parts within the same video, and from a different video. The reconstructed face images were consistent within and between the videos. They further probed the model with an Asian male example speaking the same sentence in English and Chinese to qualitatively test the effect of language and accent. 
While having the same reconstructed face in both cases would be ideal, the model inferred different faces based on the spoken language. In other examples, the model was able to successfully factor out the language, reconstructing a face with Asian features even though the girl was speaking in English with no apparent accent. “In general, we observed mixed behaviors and a more thorough examination is needed to determine to which extent the model relies on language. More generally, the ability to capture the latent attributes from speech, such as age, gender, and ethnicity, depends on several factors such as accent, spoken language, or voice pitch. Clearly, in some cases, these vocal attributes would not match the person’s appearance”, the researchers state in the paper. Speech2Cartoon: Converting generated image into cartoon faces The face images reconstructed from speech may also be used for generating personalized cartoons of speakers from their voices. The researchers have used Gboard, the keyboard app available on Android phones, which is also capable of analyzing a selfie image to produce a cartoon-like version of the face. Such cartoon re-rendering of the face may be useful as a visual representation of a person during a phone or a video conferencing call when the person’s identity is unknown or the person prefers not to share his/her picture. The reconstructed faces may also be used directly, to assign faces to machine-generated voices used in home devices and virtual assistants. https://twitter.com/NirantK/status/1132880233017761792 A user on HackerNews commented, “This paper is a neat idea, and the results are interesting, but not in the way I'd expected. I had hoped it would the domain of how much person-specific information this can deduce from a voice, e.g. lip aperture, overbite, size of the vocal tract, openness of the nares. This is interesting from a speech perception standpoint. 
Instead, it's interesting more in the domain of how much social information it can deduce from a voice. This appears to be a relatively efficient classifier for gender, race, and age, taking voice as input.” “I'm sure this isn't the first time it's been done, but it's pretty neat to see it in action, and it's a worthwhile reminder: If a neural net is this good at inferring social, racial, and gender information from audio, humans are even better. And the idea of speech as a social construct becomes even more relevant”, he further added.

This study is interesting because it takes AI to a level where a face can be predicted from audio recordings alone, without the need for DNA. However, there can be repercussions, especially when it comes to security. Such technology could be misused to impersonate someone else and cause trouble. It will be interesting to see how this research develops in the near future. To know more about the Speech2Face model, head over to the research paper.

OpenAI introduces MuseNet: A deep neural network for generating musical compositions An unsupervised deep neural network cracks 250 million protein sequences to reveal biological structures and functions OpenAI researchers have developed Sparse Transformers, a neural network which can predict what comes next in a sequence


Aarthi Kumaraswamy

02 Oct 2018

4 min read

PyTorch 1.0 preview release is production ready with torch.jit, c10d distributed library, C++ API


Back in May, the PyTorch team shared their roadmap for the PyTorch 1.0 release and highlighted that this highly anticipated version would not only continue to provide stability and simplicity of use, but would also be production ready while offering a hassle-free migration experience. Today, Facebook announced the release of PyTorch 1.0 RC1. The official announcement states, “PyTorch 1.0 accelerates the workflow involved in taking breakthrough research in artificial intelligence to production deployment. With deeper cloud service support from Amazon, Google, and Microsoft, and tighter integration with technology providers ARM, Intel, IBM, NVIDIA, and Qualcomm, developers can more easily take advantage of PyTorch’s ecosystem of compatible software, hardware, and developer tools. The more software and hardware that is compatible with PyTorch 1.0, the easier it will be for AI developers to quickly build, train, and deploy state-of-the-art deep learning models.”

PyTorch is an open-source Python-based deep learning framework which provides powerful GPU acceleration. PyTorch is known for advanced indexing and functions, imperative style, integration support and API simplicity. These are among the key reasons why developers prefer PyTorch for research and hackability. On the downside, it has struggled with adoption in production environments. The PyTorch team acknowledged this in their roadmap and have worked on improving this aspect significantly in PyTorch 1.0, not just by improving the library but also by enriching its ecosystem through partnerships with key software and hardware vendors. “One of its biggest downsides has been production-support.
What we mean by production-support is the countless things one has to do to models to run them efficiently at massive scale: exporting to C++-only runtimes for use in larger projects; optimizing mobile systems on iPhone, Android, Qualcomm and other systems; using more efficient data layouts and performing kernel fusion to do faster inference (saving 10% of speed or memory at scale is a big win); quantized inference (such as 8-bit inference)”, stated the PyTorch team in their roadmap post. Below are some key highlights of this major milestone for PyTorch.

JIT

The JIT is a set of compiler tools for bridging the gap between research in PyTorch and production. It includes a language called Torch Script (a subset of Python), and two ways (Tracing mode and Script mode) in which existing code can be made compatible with the JIT. Torch Script code can be aggressively optimized, and it can be serialized for later use in the new C++ API, which doesn't depend on Python at all.

torch.distributed new "C10D" library

The torch.distributed package and torch.nn.parallel.DistributedDataParallel module are backed by the new "C10D" library. The main highlights of the new library are:

C10D is performance driven and operates entirely asynchronously for all backends: Gloo, NCCL, and MPI.
Significant Distributed Data Parallel performance improvements, especially for slower networks such as ethernet-based hosts.
Async support for all distributed collective operations in the torch.distributed package.
send and recv support in the Gloo backend.

C++ Frontend [API Unstable]

The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high performance, low latency and bare metal C++ applications. It provides equivalents to torch.nn, torch.optim, torch.data and other components of the Python frontend. The C++ frontend is marked as "API Unstable" as part of PyTorch 1.0.
This means it is ready to be used for building research applications, but still has some open construction sites that will stabilize over the next month or two. In other words, it is not ready for use in production yet.

N-dimensional empty tensors, a collection of new operators inspired by NumPy and SciPy, and new distributions such as Weibull, negative binomial, and multivariate log gamma have been introduced. There have also been a lot of breaking changes, bug fixes, and other improvements made in PyTorch 1.0. For more details, read the official announcement and the official release notes for PyTorch.

What is PyTorch and how does it work? Build your first neural network with PyTorch [Tutorial] Is Facebook-backed PyTorch better than Google’s TensorFlow? Can a production-ready Pytorch 1.0 give TensorFlow a tough time?


Bhagyashree R

13 Jun 2019

5 min read

Austrian Supreme Court rejects Facebook’s bid to stop a GDPR-violation lawsuit against it by privacy activist, Max Schrems


On Tuesday, the Austrian Supreme Court rejected Facebook’s appeal to block a lawsuit against it for not conforming to Europe’s General Data Protection Regulation (GDPR). This decision will also have an effect on other EU member states that give “special status to industry sectors.” https://twitter.com/maxschrems/status/1138703007594496000?s=19

The lawsuit was filed by Austrian lawyer and data privacy activist Max Schrems. In the lawsuit, he accuses Facebook of using illegal privacy policies, as it forces users to consent to the processing of their data in return for using the service. GDPR does not accept forced consent as a valid legal basis for processing user data. Schrems said in a statement, “Facebook has even blocked accounts of users who have not given consent. In the end users only had the choice to delete the account or hit the ‘agree’ button–that’s not a free choice; it more reminds of a North Korean election process. Many users do not know yet that this annoying way of pushing people to consent is actually forbidden under GDPR in most cases.”

Facebook has been trying to block this lawsuit by questioning whether GDPR-based cases fall under the jurisdiction of courts. According to Facebook’s appeal, these lawsuits should be handled by data protection authorities, in this case the Irish Data Protection Commissioner (DPC). Dismissing Facebook’s argument, this landmark decision says that any complaints made under Article 79 of GDPR can be reviewed both by judges and data protection authorities.

This verdict comes as a relief for Schrems, who has had to wait almost five years to even get this lawsuit to trial because of Facebook's continuous attempts to block it. “I am very pleased that we were able to clarify this fundamental issue. We are hoping for a speedy procedure now that the case has been pending for a good 5 years,” Schrems said in a press release.
He further added, “If we win even part of the case, Facebook would have to adapt its business model considerably. We are very confident that we will succeed on the substance too now. Of course, they wanted to prevent such a case by all means and blocked it for five years.”

Previously, the Vienna Regional Court had ruled in Facebook’s favor, declaring that it did not have jurisdiction and that Facebook could only be sued in Ireland, where its European headquarters are. Schrems believes that this verdict was given because there is “a tendency that civil judges are not keen to have (complex) GDPR cases on their table.” Now, both the Appellate Court and the Austrian Supreme Court have agreed that everyone can file a lawsuit for GDPR violations.

Schrems’ original idea was to bring a “class action” style suit against Facebook by allowing any Facebook user to join the case. The court did not allow that, however, and Schrems was limited to bringing only a model case to the court.

This is Schrems’ second victory this year in the fight against Facebook. Last month, the Irish Supreme Court dismissed Facebook’s attempt to stop the referral of a privacy case regarding the transfer of EU citizens’ data to the United States. The hearing of this case is now scheduled to happen at the European Court of Justice (ECJ) in July.

Schrems’ eight-year-long battle against Facebook

Schrems’ fight against Facebook started long before most of us realized the severity of tech companies harvesting our personal data. Back in 2011, Schrems’ professor at Santa Clara University invited Facebook’s privacy lawyer Ed Palmieri to speak to his class. Schrems was surprised to see the lawyer's lack of awareness regarding data protection laws in Europe. He then decided to write his thesis paper about Facebook’s misunderstanding of EU privacy laws. As a part of the research, he requested his personal data from Facebook and found it had his entire user history.
He went on to make 22 complaints to the Irish Data Protection Commission, in which he accused Facebook of breaking European data protection laws. His efforts finally showed results when, in 2015, the European Court of Justice struck down the EU–US Safe Harbor Principles. As a part of his fight for global privacy rights, Schrems also co-founded the European non-profit noyb (None of Your Business), which aims to “make privacy real”. The organization aims to introduce ways to enforce privacy more effectively. It holds companies that fail to follow Europe's privacy laws accountable and also takes media initiatives to support GDPR.

Things have not been going well for Facebook lately. Along with losing these cases in the EU, a revelation yesterday by the WSJ uncovered several emails that indicate Mark Zuckerberg’s knowledge of potentially problematic privacy practices at the company. You can read the entire press release on NOYB’s official website.

Facebook releases Pythia, a deep learning framework for vision and language multimodal research Zuckberg just became the target of the world’s first high profile white hat deepfake op. Can Facebook come out unscathed? US regulators plan to probe Google on anti-trust issues; Facebook, Amazon & Apple also under legal scrutiny


Natasha Mathur

23 Oct 2018

5 min read

EPIC’s Public Voice Coalition announces Universal Guidelines for Artificial Intelligence (UGAI) at ICDPPC 2018


The Public Voice Coalition, an organization that promotes public participation in decisions regarding the future of the Internet, came out with guidelines for AI, namely the Universal Guidelines on Artificial Intelligence (UGAI), today. The UGAI were announced at the ongoing 40th International Data Protection and Privacy Commissioners Conference (ICDPPC) in Brussels, Belgium. The ICDPPC is a worldwide forum where independent regulators from around the world come together to explore high-level recommendations regarding privacy, freedom, and protection of data. These recommendations are addressed to governments and international organizations. Speakers at the 40th ICDPPC include Tim Berners-Lee (inventor of the World Wide Web), Tim Cook (CEO of Apple Inc.), Giovanni Buttarelli (European Data Protection Supervisor), and Jagdish Singh Khehar (44th Chief Justice of India), among others.

The UGAI combines elements of human rights doctrine, data protection law, and ethical guidelines. “We propose these Universal Guidelines to inform and improve the design and use of AI. The Guidelines are intended to maximize the benefits of AI, to minimize the risk, and to ensure the protection of human rights. These guidelines should be incorporated into ethical standards, adopted in national law and international agreements, and built into the design of systems”, reads the announcement page. The UGAI comprises twelve principles for AI governance that haven’t previously been covered in similar policy frameworks. Let’s have a look at these principles.

Transparency principle

The Transparency principle emphasizes an individual’s right to know the basis of a particular AI decision concerning them. This means all individuals involved in a particular AI project should have access to the factors, the logic, and the techniques that produced the outcome.
Right to human determination

The Right to human determination focuses on the fact that individuals, and not machines, should be responsible when it comes to automated decision-making. For instance, during the operation of an autonomous vehicle, it is impractical to include a human decision before the machine makes an automated decision. However, if an automated system fails, then this principle should be applied and a human assessment of the outcome should be made to ensure accountability.

Identification Obligation

This principle establishes the foundation of AI accountability by making clear the identity of an AI system and of the institution responsible for it. This matters because an AI system usually knows a lot about an individual, while the individual might not even be aware of who operates the AI system.

Fairness Obligation

The Fairness Obligation emphasizes that assessing the objective outcomes of an AI system is not sufficient to evaluate it. Institutions must ensure that AI systems do not reflect unfair bias or make discriminatory decisions.

Assessment and accountability Obligation

This principle focuses on assessing an AI system based on factors such as its benefits, purpose, objectives, and the risks involved before and during its deployment. An AI system should be deployed only after this evaluation is complete. If the assessment reveals substantial risks concerning Public Safety and Cybersecurity, then the AI system should not be deployed. This, in turn, ensures accountability.

Accuracy, Reliability, and Validity Obligations

This principle sets out the key responsibilities related to the outcome of automated decisions by an AI system. Institutions must ensure the accuracy, reliability, and validity of decisions made by their AI system.

Data Quality Principle

This principle emphasizes the need for institutions to establish data provenance.
It also includes assuring the quality and relevance of the data that is fed into the AI algorithms. Public Safety Obligation This principle ensures that institutions assess the public safety risks arising from AI systems that control different devices in the physical world. These institutions must implement the necessary safety controls within such AI systems. Cybersecurity Obligation This principle is a follow up to the Public Safety Obligation and ensures that institutions developing and deploying these AI systems take cybersecurity threats into account. Prohibition on Secret Profiling This principle states that no institution shall establish a secret profiling system. This is to ensure the possibility of independent accountability. Prohibition on Unitary Scoring This principle states that no national government shall maintain a general-purpose score on its citizens or residents. “A unitary score reflects not only a unitary profile but also a predetermined outcome across multiple domains of human activity,” reads the guideline page. Termination Obligation Termination Obligation states that an institution has an affirmative obligation to terminate the AI system built if human control of that system is no longer possible. For more information, check out the official UGAI documentation. The ethical dilemmas developers working on Artificial Intelligence products must consider Sex robots, artificial intelligence, and ethics: How desire shapes and is shaped by algorithms Introducing Deon, a tool for data scientists to add an ethics checklist


Savia Lobo

29 Jul 2018

4 min read

DeepCube: A new deep reinforcement learning approach solves the Rubik’s cube with no human help


Humans have long excelled at games, indoor and outdoor alike. In recent years, however, we have increasingly come across machines that play and win popular board games such as Go and Chess against humans using machine learning algorithms. If you think machines are only good at black-and-white board games, you are wrong. The latest attempt at having a machine solve a complex puzzle, the Rubik’s cube, is DeepCube. The Rubik’s cube is a challenging puzzle that has captivated people since childhood, and solving it is a brag-worthy accomplishment for most adults. A group of UC Irvine researchers have now developed a new algorithm (used by DeepCube) known as Autodidactic Iteration, which can solve a Rubik’s cube with no human assistance.

The Erno Rubik’s cube conundrum

The Rubik’s cube, a popular three-dimensional puzzle, was developed by Erno Rubik in 1974. Rubik worked for a month to figure out the first algorithm to solve the cube. Researchers at UC Irvine state that “Since then, the Rubik’s Cube has gained worldwide popularity and many human-oriented algorithms for solving it have been discovered. These algorithms are simple to memorize and teach humans how to solve the cube in a structured, step-by-step manner.” After the cube became popular among mathematicians and computer scientists, questions around how to solve the cube with the fewest possible turns became mainstream. In 2014, it was proved that any configuration of the cube can be solved in at most 26 quarter turns. More recently, computer scientists have tried to find ways for machines to solve the Rubik’s cube. As a first step, they tried the same approach that had succeeded in the games Go and Chess. However, this approach did not work well for the Rubik’s cube.

The approach: Rubik vs Chess and Go

Algorithms used in Go and Chess are fed with the rules of the game and then play against themselves.
The deep learning machine here is rewarded based on its performance at every step it takes. The reward process is important because it helps the machine distinguish between a good and a bad move. Over time, the machine learns how to play well. In the case of the Rubik’s cube, on the other hand, rewards are hard to determine. This is because the cube is scrambled by random turns, and it is hard to judge whether a new configuration is any closer to a solution. The number of random turns can be unlimited, and hence earning an end-state reward is very rare. Both Chess and Go have a large search space, but each move can be evaluated and rewarded accordingly. This isn’t the case for the Rubik’s cube! UC Irvine researchers have found a way for machines to create their own set of rewards in the Autodidactic Iteration method for DeepCube.

Autodidactic Iteration: Solving the Rubik’s Cube without human Knowledge

DeepCube’s Autodidactic Iteration (ADI) is a form of deep learning known as deep reinforcement learning (DRL). It combines classic reinforcement learning, deep learning, and Monte Carlo Tree Search (MCTS). When DeepCube gets an unsolved cube, it decides whether a specific move is an improvement on the existing configuration. To do this, it must be able to evaluate the move. The Autodidactic Iteration algorithm starts with the finished cube and works backwards to find a configuration that is similar to the proposed move. Although this process is imperfect, deep learning helps the system figure out which moves are generally better than others. Researchers trained a network using ADI for 2,000,000 iterations. They further reported, “The network witnessed approximately 8 billion cubes, including repeats, and it trained for a period of 44 hours. Our training machine was a 32-core Intel Xeon E5-2620 server with three NVIDIA Titan XP GPUs.” After training, the network uses a standard search tree to hunt for suggested moves for each configuration.
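The idea of working backwards from the solved state can be sketched on a much smaller problem. The toy below is an illustration under stated assumptions, not the researchers' method: it swaps the Rubik’s cube for a hypothetical four-tile puzzle whose moves exchange neighbouring tiles, replaces the neural network with a lookup table, replaces MCTS with a greedy choice, and assumes a reward of +1 for reaching the solved state and -1 otherwise.

```python
# Toy stand-in for the cube: four tiles, a move swaps two neighbouring tiles.
SOLVED = (0, 1, 2, 3)
MOVES = (0, 1, 2)  # move i swaps the tiles at positions i and i + 1

def apply_move(state, move):
    tiles = list(state)
    tiles[move], tiles[move + 1] = tiles[move + 1], tiles[move]
    return tuple(tiles)

def reward(state):
    """Assumed reward: +1 for the solved state, -1 for everything else."""
    return 1 if state == SOLVED else -1

def scrambled_states(depth):
    """Work backwards: collect every state reachable from SOLVED in <= depth moves."""
    seen = {SOLVED}
    frontier = {SOLVED}
    for _ in range(depth):
        frontier = {apply_move(s, m) for s in frontier for m in MOVES}
        seen |= frontier
    return seen

def train(states, sweeps=30):
    """Tabular value estimate: each state's target is its best child's reward plus value."""
    value = {s: 0 for s in states}
    for _ in range(sweeps):
        for s in states:
            if s != SOLVED:
                children = [apply_move(s, m) for m in MOVES]
                value[s] = max(reward(c) + value.get(c, 0) for c in children)
    return value

def solve(state, value, limit=20):
    """Greedy solver: repeatedly take the move with the best reward plus child value."""
    path = []
    while state != SOLVED and len(path) < limit:
        best = max(MOVES, key=lambda m: reward(apply_move(state, m))
                   + value.get(apply_move(state, m), 0))
        state = apply_move(state, best)
        path.append(best)
    return path

value = train(scrambled_states(6))
path = solve((3, 2, 1, 0), value)
```

Because every training state is generated from the solved one, each state's value can be bootstrapped from states that are already closer to the goal, which is how the sparse end-state reward becomes usable. In a real cube the table would be far too large, which is where a learned network comes in.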
The researchers in their paper said, “Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves — less than or equal to solvers that employ human domain knowledge.” Researchers also wrote, “DeepCube is able to teach itself how to reason in order to solve a complex environment with only one reward state using pure reinforcement learning.” Furthermore, this approach will have a potential to provide approximate solutions to a broad class of combinatorial optimization problems. To explore Deep Reinforcement Learning check out our latest releases, Hands-On Reinforcement Learning with Python and Deep Reinforcement Learning Hands-On. How greedy algorithms work Creating a reference generator for a job portal using Breadth First Search (BFS) algorithm Anatomy of an automated machine learning algorithm (AutoML)


Sugandha Lahoti

06 May 2019

4 min read

JupyterHub 1.0 releases with named servers, support for TLS encryption and more


JupyterHub 1.0 was released last week as the first major update since 2015. JupyterHub allows multiple users to use Jupyter notebook. JupyterHub 1.0 comes with UI support for managing named servers, and TLS encryption and authentication support, among others.

What’s new in JupyterHub 1.0?

UI for named servers

Named servers allow each JupyterHub user to have access to more than one server. JupyterHub 1.0 introduces a full UI for managing these servers: users can now create, start, stop, and delete their servers from the hub home page.

Source: Jupyter blog

TLS encryption and authentication

JupyterHub 1.0 supports TLS encryption and authentication of all internal communication. Spawners must implement the .move_certs method to make certificates available to the notebook server if it is not local to the Hub. Currently, local spawners and DockerSpawner support internal SSL.

Checking and refreshing authentication

JupyterHub 1.0 introduces three new configuration options to refresh or expire authentication information. c.Authenticator.auth_refresh_age allows authentication to expire after a number of seconds. c.Authenticator.refresh_pre_spawn forces a refresh of authentication prior to spawning a server, effectively requiring a user to have up-to-date authentication when they start their server. Authenticator.refresh_auth defines what it means to refresh authentication and can be customized by Authenticator implementations.

Other changes

A new API is added in JupyterHub 1.0 for registering user activity. Activity is now tracked by pushing it to the Hub from user servers instead of polling the proxy API. Dynamic options_form callables may now return an empty string, which will result in no options form being rendered. Spawner.user_options is persisted to the database to be re-used, so that a server spawned once via the form can be re-spawned via the API with the same options.
The c.PAMAuthenticator.pam_normalize_username option is added for round-tripping usernames through PAM to retrieve the normalized form. The c.JupyterHub.named_server_limit_per_user configuration is added to limit the number of named servers each user can have. The default is 0, for no limit. API requests to HubAuthenticated services (e.g. single-user servers) may pass a token in the Authorization header, matching authentication with the Hub API itself. The Authenticator.is_admin(handler, authentication) method and the Authenticator.admin_groups configuration are added for automatically determining whether a member of a group should be considered an admin.

These are just a select few updates. For the full list of new features and improvements in JupyterHub 1.0, visit the changelog.

You can upgrade jupyterhub with conda or pip:

conda install -c conda-forge jupyterhub==1.0.*
pip install --upgrade jupyterhub==1.0.*

Users were quite excited about the release. Here are some comments from a Hacker News thread.

“This is really cool and I’m impressed by the jupyter team. My favorite part is that it’s such a good product that beats the commercial products because it’s hard to figure out, I think, commercial models that support this wide range of collaborators (people who view once a month to people who author every day).”

“Congratulations! JupyterHub is a great project with high-quality code and docs. Looking forward to trying the named servers feature as I run a JupyterHub instance that spawns servers inside containers based on a single image which inevitably tends to grow as I add libraries. Being able to manage multiple servers should allow me to split the image into smaller specialized images.”

Introducing Jupytext: Jupyter notebooks as Markdown documents, Julia, Python or R scripts

How everyone at Netflix uses Jupyter notebooks from data scientists, machine learning engineers, to data analysts.

10 reasons why data scientists love Jupyter notebooks
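To make the token-based API access mentioned in the changes above more concrete, here is a small sketch that builds a request carrying a token in the Authorization header, using the `token <value>` scheme that the Hub API accepts. The helper name, URL, and token value are hypothetical placeholders, and the request is only constructed, not sent.

```python
from urllib.request import Request


def build_authenticated_request(url: str, token: str) -> Request:
    """Build (but do not send) a GET request that carries an API token.

    The token goes in the Authorization header using the "token <value>"
    scheme. Both the URL and the token are supplied by the caller.
    """
    return Request(url, headers={"Authorization": f"token {token}"})


if __name__ == "__main__":
    # Hypothetical single-user server URL and placeholder token.
    req = build_authenticated_request(
        "http://127.0.0.1:8000/user/alice/api/status",
        "REPLACE_WITH_REAL_TOKEN",
    )
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

To actually send the request you could pass it to urllib.request.urlopen, assuming a running Hub and a valid token.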


