How-To Tutorials

Equifax data breach could have been “entirely preventable”, says House Oversight and Government Reform Committee staff report

Savia Lobo
11 Dec 2018
5 min read
Update: On July 22, 2019, Equifax announced a global settlement including up to $425 million to help people affected by the data breach.

Two days ago, the House Oversight and Government Reform Committee released a staff report on Equifax's data breach, which affected 143 million U.S. consumers and was disclosed on September 7, 2017, concluding that it could have been "entirely preventable". The Committee opened an investigation into the breach on September 14, 2017. After the 14-month investigation, the staff report highlights the circumstances of the cyber attack, which compromised authenticating details, such as dates of birth and social security numbers, of more than half of American consumers.

In August 2017, three weeks before Equifax publicly announced the breach, Richard Smith, the former CEO of Equifax, boasted that the company was managing "almost 1,200 times" the amount of data held in the Library of Congress every day. However, Equifax failed to implement an adequate security program to protect this sensitive data. As a result, Equifax allowed one of the largest data breaches in U.S. history.

The loopholes that led to a massive data breach

Equifax had serious gaps between IT policy development and execution

According to the Committee, Equifax failed to implement clear lines of authority within its internal IT management structure. This led to an execution gap between IT policy development and operation, which restricted the company's ability to implement security initiatives in a comprehensive and timely manner.

On March 7, 2017, a critical vulnerability in the Apache Struts software was publicly disclosed. Equifax used Apache Struts to run certain applications on legacy operating systems. The following day, the Department of Homeland Security alerted Equifax to this critical vulnerability. Equifax's Global Threat and Vulnerability Management (GTVM) team emailed this alert to over 400 people on March 9, instructing anyone who had Apache Struts running on their system to apply the necessary patch within 48 hours. The Equifax GTVM team also held a meeting on March 16 about this vulnerability.

Equifax, however, did not fully patch its systems. Equifax's Automated Consumer Interview System (ACIS), a custom-built internet-facing consumer dispute portal developed in the 1970s, was running a version of Apache Struts containing the vulnerability. Equifax did not patch the Apache Struts software located within ACIS, leaving its systems and data exposed.

Equifax had complex and outdated IT systems

Equifax's aggressive growth strategy led to the acquisition of multiple companies, information technology (IT) systems, and data. The acquisition strategy may have been successful for the company's bottom line and stock price, but this growth also brought increasing complexity to Equifax's IT systems and expanded data security risk. Both the complexity and the antiquated nature of Equifax's custom-built legacy systems made IT security especially challenging.

The company failed to implement responsible security measures

Per the Committee, Equifax knew of the potential security risks posed by expired SSL certificates. An internal vulnerability assessment tracker entry dated January 20, 2017, stated "SSLV devices are missing certificates, limiting visibility to web-based attacks on [intrusion prevention system]". Despite this, the company had allowed over 300 security certificates to expire, including 79 certificates for monitoring business-critical domains.

Had Equifax implemented a certificate management process with defined roles and responsibilities, the SSL certificate on the device monitoring the ACIS platform would have been active when the intrusion began on May 13, 2017. The company would have been able to see the suspicious traffic to and from the ACIS platform much earlier, potentially mitigating or preventing the data breach.

On August 30, 2018, the GAO (U.S. Government Accountability Office) published a report detailing Equifax's information security remediation activities to date. According to the GAO, "a misconfigured monitoring device allowed encrypted web traffic to go uninspected through the Equifax network. To prevent this from happening again, GAO reported Equifax developed new policies and implemented new tools to ensure network traffic is monitored continuously."

In its 2018 Annual Proxy Statement to investors, Equifax reported on how its Board of Directors was enhancing Board oversight in an effort to strengthen Equifax's cybersecurity posture. Equifax's new CEO, Mark Begor, told news outlets, "We didn't have the right defenses in place, but we are investing in the business to protect this from ever happening again." To know more about this news in detail, read the complete Equifax Data Breach report.

Affected users can now file a claim

On July 24, 2019, Equifax announced a settlement of up to $425 million to help people affected by its data breach. This global settlement was reached with the Federal Trade Commission, the Consumer Financial Protection Bureau, and 50 U.S. states and territories. Users whose personal information was exposed in the Equifax data breach can now file a claim on the Equifax breach settlement website. Those who are unsure whether their data was exposed can find out using the eligibility tool. To know about the benefits a user would receive from this claim, read the FTC's official blog post.

A new data breach on Facebook due to malicious browser extensions allowed almost 81,000 users' private data up for sale, reports BBC News

Uber fined by British ICO and Dutch DPA for nearly $1.2m over a data breach from 2016

Marriott's Starwood guest database faces a massive data breach affecting 500 million user data

Australia's ACCC publishes a preliminary report recommending Google, Facebook be regulated and monitored for discriminatory and anti-competitive behavior

Sugandha Lahoti
10 Dec 2018
5 min read
The Australian Competition and Consumer Commission (ACCC) has today published a 378-page preliminary report to make the Australian government and the public aware of the impact of social media and digital platforms on targeted advertising and user data collection. The report also highlights the ACCC's concerns regarding the "market power held by these key platforms, including their impact on Australian businesses and, in particular, on the ability of media businesses to monetize their content."

This report was published following an investigation that began late last year, when Treasurer Scott Morrison MP asked the ACCC to hold an inquiry into how online search engines, social media, and digital platforms impact media and advertising services markets. The inquiry demanded answers on the range and reliability of news available via Google and Facebook. The ACCC also expressed concerns about the large amount and variety of data which Google and Facebook collect on Australian consumers, which users are not actively willing to provide.

Why did the ACCC choose Google and Facebook?

Google and Facebook are the two largest digital platforms in Australia and are the country's most visited websites. They also have similar business models, relying on consumer attention and data to sell advertising opportunities, and both hold substantial market power. Per the report, each month approximately 19 million Australians use Google Search, 17 million access Facebook, 17 million watch YouTube (which is owned by Google), and 11 million access Instagram (which is owned by Facebook). This widespread and frequent use of Google and Facebook means that these platforms occupy a key position for businesses looking to reach Australian consumers, including advertisers and news media businesses.

Recommendations made by the ACCC

The report contains 11 preliminary recommendations to these digital platforms and eight areas for further analysis. Per the report:

#1 The ACCC wants to amend the merger law to make it clearer that the following are relevant factors: the likelihood that an acquisition would result in the removal of a potential competitor, and the amount and nature of data which the acquirer would likely gain access to as a result of the acquisition.

#2 The ACCC wants Facebook and Google to provide advance notice of the acquisition of any business with activities in Australia and to provide sufficient time to enable a thorough review of the likely competitive effects of the proposed acquisition.

#3 The ACCC wants suppliers of operating systems for mobile devices, computers, and tablets to provide consumers with options for internet browsers and search engines (rather than providing a default).

#4 The ACCC wants a regulatory authority to monitor, investigate, and report on whether digital platforms are engaging in discriminatory conduct by favoring their own business interests above those of advertisers or potentially competing businesses.

#5 The regulatory authority should also monitor, investigate, and report on the ranking of news and journalistic content by digital platforms and the provision of referral services to news media businesses.

#6 The ACCC wants the government to conduct a separate, independent review to design a regulatory framework for the conduct of all news and journalistic content entities in Australia. This framework should focus on underlying principles, the extent of regulation, content rules, and enforcement.

#7 Per the ACCC, the ACMA (Australian Communications and Media Authority) should adopt a mandatory standard regarding take-down procedures for copyright-infringing content.

#8 The ACCC proposes amendments to the Privacy Act. These include:
- Strengthen notification requirements
- Introduce an independent third-party certification scheme
- Strengthen consent requirements
- Enable the erasure of personal information
- Increase the penalties for breach of the Privacy Act
- Introduce direct rights of action for individuals
- Expand resourcing for the OAIC (Office of the Australian Information Commissioner) to support further enforcement activities

#9 The ACCC wants the OAIC to develop a code of practice under Part IIIB of the Privacy Act to provide Australians with greater transparency and control over how their personal information is collected, used, and disclosed by digital platforms.

#10 Per the ACCC, the Australian government should adopt the Australian Law Reform Commission's recommendation to introduce a statutory cause of action for serious invasions of privacy.

#11 Per the ACCC, unfair contract terms should be illegal (not just voidable) under the Australian Consumer Law.

"The inquiry has also uncovered some concerns that certain digital platforms have breached competition or consumer laws, and the ACCC is currently investigating five such allegations to determine if enforcement action is warranted," ACCC Chair Rod Sims said.

The ACCC is also seeking feedback on its preliminary recommendations and the eight proposed areas for further analysis and assessment. Feedback can be shared by email to platforminquiry@accc.gov.au by 15 February 2019.

AI Now Institute releases Current State of AI 2018 Report

Australia passes a rushed anti-encryption bill "to make Australians safe"; experts find "dangerous loopholes" that compromise online privacy and safety

Australia's Facial recognition and identity system can have "chilling effect on freedoms of political discussion, the right to protest and the right to dissent": The Guardian report

npm JavaScript predictions for 2019: React, GraphQL, and TypeScript are three technologies to learn

Bhagyashree R
10 Dec 2018
3 min read
Based on Laurie Voss's talk at Node+JS Interactive 2018 on Friday, npm has shared some insights and predictions about JavaScript for 2019. These predictions are aimed at helping developers make better technical choices in 2019. Here are the four predictions npm has made:

"You will abandon one of your current tools."

In JavaScript, frameworks and tools don't last; they generally enjoy a phase of peak popularity of 3-5 years, followed by a slow decline as developers maintain legacy applications but move to newer frameworks for new work. Mr. Voss said in his talk, "Nothing lasts forever!... Any framework that we see today will have its hay days and then it will have an after-life where it will slowly slowly degrade." For developers, this essentially means that it is better to keep learning new frameworks instead of holding on to their current tools too tightly.

"Despite a slowdown in growth, React will be the dominant framework in 2019."

Though React's growth slowed down in 2018 compared to 2017, it still continues to dominate the web scene: 60% of npm survey respondents said they are using React. In 2019, npm expects more people to use React for building web applications, and as its user base grows there will be more tutorials, advice, and bug fixes.

"You'll need to learn GraphQL."

The GraphQL client library is showing tremendous popularity and, as per npm, is going to be a "technical force to reckon with in 2019." GraphQL was first publicly released in 2015 and it is still early days for putting it into production, but going by its growing popularity, developers are recommended to learn its concepts in 2019. npm also predicts that developers will see themselves using GraphQL in new projects later in the year and in 2020.

"Somebody on your team will bring in TypeScript."

npm's survey uncovered that 46% of the respondents were using Microsoft's TypeScript, a typed superset of JavaScript that compiles to plain JavaScript. One of the reasons for this major adoption could be the extra safety TypeScript provides through type checking. Adopting TypeScript in 2019 could prove really useful, especially if you're a member of a larger team.

Read the detailed report and predictions on npm's website.

4 key findings from The State of JavaScript 2018 developer survey

TypeScript 3.2 released with configuration inheritance and more

7 reasons to choose GraphQL APIs over REST for building your APIs

Accountability and algorithmic bias: Why diversity and inclusion matters [NeurIPS Invited Talk]

Sugandha Lahoti
08 Dec 2018
4 min read
One of the most awaited machine learning conferences, NeurIPS 2018, is happening throughout this week in Montreal, Canada. It features a series of tutorials, invited talks, product releases, demonstrations, presentations, and announcements related to machine learning research. For the first time, NeurIPS invited a diversity and inclusion (D&I) speaker, Laura Gomez, to talk about the lack of diversity in the tech industry, which leads to biased algorithms, faulty products, and unethical tech.

Laura Gomez is the CEO of Atipica, which helps tech companies find and hire diverse candidates. Being a Latina woman herself, she had to face oppression when seeking capital and funding for her startup while trying to establish herself in Silicon Valley. This experience led to her realization that there is a strong need to talk about why diversity and inclusion matter. Her efforts were not in vain, and she recently raised $2M in seed funding led by True Ventures. "At Atipica, we think of Inclusive AI in terms of data science, algorithms, and their ethical implications. This way you can rest assure our models are not replicating the biases of humans that hinder diversity while getting patent-pending aggregate demographic insights of your talent pool," reads the website.

She talked about her journey as a Latina woman in the tech industry. She reminisced about how she was the only one like her who got an internship with Hewlett Packard, and the fact that she hated it. Nevertheless, she decided to stay, determined not to let the industry turn her into a victim. She believes she made the right choice going forward with tech; now, years later, diversity is dominating the conversation in the industry. After HP, she also worked at Twitter and YouTube, helping them translate and localize their applications for a global audience. She is also a founding advisor of Project Include, a non-profit organization run by women that uses data and advocacy to accelerate diversity and inclusion solutions in the tech industry.

She opened her talk by agreeing with a quote from Safiya Noble, who wrote Algorithms of Oppression: "Artificial Intelligence will become a major human rights issue in the twenty-first century." She believes we need to talk about difficult questions, such as where AI is heading and where we should hold ourselves and each other accountable. She urges people to evaluate their role in AI, bias, and inclusion, to find the empathy and value in difficult conversations, and to go beyond their immediate surroundings to consider the broader consequences. It is important to build accountable AI in a way that allows humanity to triumph.

She touched upon discriminatory moves by tech giants like Amazon and Google. Amazon recently killed off its AI recruitment tool because it couldn't stop discriminating against women. She also criticized Facebook's Myanmar operation, where Facebook data scientists were building algorithms for hate speech but didn't understand the importance of localization and language, or actually internationalize their own algorithms to be inclusive of all countries. She also talked about algorithmic bias in library discovery systems, as well as how even 'black robots' are being impacted by racism. She also condemned the work of Palmer Luckey, who is helping U.S. immigration agents at the border wall identify Latin American refugees.

Finally, she urged people to take three major steps to progress towards being inclusive:
- Be an ally
- Think of inclusion as an approach, not a feature
- Work towards an ethical AI

Head over to the NeurIPS Facebook page for the entire talk and other sessions happening at the conference this week.

NeurIPS 2018: Deep learning experts discuss how to build adversarially robust machine learning models

NeurIPS 2018 paper: DeepMind researchers explore autoregressive discrete autoencoders (ADAs) to model music in raw audio at scale

NeurIPS 2018: A quick look at data visualization for Machine learning by Google PAIR researchers [Tutorial]

AI Now Institute releases Current State of AI 2018 Report

Natasha Mathur
07 Dec 2018
7 min read
The AI Now Institute at New York University released its third annual report on the current state of AI yesterday. The 2018 AI Now Report focuses on themes such as industry AI scandals and rising inequality. It also assesses the gaps between AI ethics and meaningful accountability, and looks at the role of organizing and regulation in AI. Let's have a look at the key recommendations from the AI Now 2018 report.

Key takeaways

Need for a sector-specific approach to AI governance and regulation

This year's report reflects on the need for stronger AI regulation by expanding the powers of sector-specific agencies (such as the United States Federal Aviation Administration and the National Highway Traffic Safety Administration) to audit and monitor these technologies by domain. Development of AI systems is rising, and there aren't adequate governance, oversight, or accountability regimes to make sure that these systems abide by the ethics of AI. The report states that general AI standards and certification models can't meet the expertise requirements of different sectors such as health, education, and welfare, which is a key requirement for enhanced regulation. "We need a sector-specific approach that does not prioritize the technology but focuses on its application within a given domain," reads the report.

Need for tighter regulation of facial recognition AI systems

Concerns are growing over facial recognition technology as it enables privacy infringement, mass surveillance, racial discrimination, and other harms. As per the report, stringent regulation is needed that demands stronger oversight, public transparency, and clear limitations. Moreover, providing public notice should not be the only criterion for companies to apply these technologies; there needs to be a "high threshold" for consent, keeping in mind the risks and dangers of mass surveillance technologies.

The report highlights how "affect recognition", a subclass of facial recognition that claims to be capable of detecting personality, inner feelings, mental health, and more based on images or video of faces, needs special attention, as it is unregulated. It states that these claims do not have sufficient evidence behind them and are being abused in unethical and irresponsible ways. "Linking affect recognition to hiring, access to insurance, education, and policing creates deeply concerning risks, at both an individual and societal level," reads the report. It seems like progress is being made on this front: just yesterday, Microsoft recommended that tech companies publish documents explaining the technology's capabilities, limitations, and consequences in case their facial recognition systems get used in public.

New approaches needed for governance in AI

The report points out that internal governance structures at technology companies are not able to implement accountability effectively for AI systems. "Government regulation is an important component, but leading companies in the AI industry also need internal accountability structures that go beyond ethics guidelines," reads the report. This includes rank-and-file employee representation on the board of directors, external ethics advisory boards, and independent monitoring and transparency efforts.

Need to waive trade secrecy and other legal claims

The report states that vendors and developers creating AI and automated decision systems for use in government should agree to waive any trade secrecy or other legal claims that would restrict the public from fully auditing and understanding their software. As per the report, corporate secrecy laws are a barrier, as they make it hard to analyze bias, contest decisions, or remedy errors. Companies wanting to use these technologies in the public sector should demand that vendors waive these claims before coming to an agreement.

Companies should protect workers who raise ethical concerns

It has become common for employees to organize and push back against technology in order to promote accountability and ethical decision making. It is the responsibility of tech companies to protect their workers' ability to organize, whistleblow, and promote ethical choices regarding their projects. "This should include clear policies accommodating and protecting conscientious objectors, ensuring workers the right to know what they are working on, and the ability to abstain from such work without retaliation or retribution," reads the report.

Need for more truth in advertising of AI products

The report highlights that the hype around AI has led to a gap between marketing promises and actual product performance, causing risks to both individuals and commercial customers. As per the report, AI vendors should be held to high standards when making promises, especially when there isn't enough information on the consequences and the scientific evidence behind these promises.

Need to address exclusion and discrimination within the workplace

The report states that technology companies and the AI field focus on a "pipeline model" that aims to train and hire more employees. However, it is important for tech companies to assess deeper issues, such as harassment on the basis of gender, race, and other factors within workplaces. They should also examine the relationship between exclusionary cultures and the products they build, so as to build tools that do not perpetuate bias and discrimination.

Detailed account of the "full stack supply chain"

As per the report, there is a need to better understand the parts of an AI system and the full supply chain on which it relies for better accountability. "This means it is important to account for the origins and use of training data, test data, models, the application program interfaces (APIs), and other components over a product lifecycle," reads the report. This process is called accounting for the "full stack supply chain" of AI systems, which is necessary for a more responsible form of auditing. The full stack supply chain takes into consideration the true environmental and labor costs of AI systems, including energy use, labor for content moderation and training data creation, and reliance on workers for the maintenance of AI systems.

More funding and support for litigation and labor organizing on AI issues

The report states that there is a need for increased support for legal redress and civic participation. This includes offering support to public advocates representing people who have been cut off from social services because of algorithmic decision making, and to civil society organizations and labor organizers who support groups facing the dangers of job loss and exploitation.

Need for university AI programs to expand beyond the computer science discipline

The report states that there is a need for university AI programs and syllabi to expand their disciplinary orientation, which means including social and humanistic disciplines within university AI programs. For AI efforts to truly make social impacts, it is necessary for faculty and students within computer science departments to be trained to research the social world. A lot of people have already started to implement this; for instance, Mitchell Baker, chairwoman and co-founder of Mozilla, talked about the need for the tech industry to expand beyond technical skills by bringing in the humanities. "Expanding the disciplinary orientation of AI research will ensure deeper attention to social contexts, and more focus on potential hazards when these systems are applied to human populations," reads the report.

For more coverage, check out the official AI Now 2018 report.

Unity introduces guiding Principles for ethical AI to promote responsible use of AI

Teaching AI ethics – Trick or Treat?

Sex robots, artificial intelligence, and ethics: How desire shapes and is shaped by algorithms

Implementing color and shape-based object detection and tracking with OpenCV and CUDA [Tutorial]

Bhagyashree R
07 Dec 2018
12 min read
Object detection and tracking is an active research topic in the field of computer vision that makes efforts to detect, recognize, and track objects through a series of frames. It has been found that object detection and tracking in a video sequence is a challenging and very time-consuming task. Object detection is the first step in building a larger computer vision system. Object tracking is defined as the task of detecting objects in every frame of the video and establishing the correspondence between the detected objects from one frame to the next.

This article is an excerpt from a book written by Bhaumik Vaidya titled Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA. The book provides a detailed overview of integrating OpenCV with CUDA for practical applications. You will learn GPU programming with CUDA, explore OpenCV acceleration with GPUs and CUDA, and learn how to deploy OpenCV applications on NVIDIA Jetson TX1. To follow along with the article, you can find the code in the book's GitHub repository, and check out the accompanying video to see the code in action.

In this article, we will see how to develop complex computer vision applications with OpenCV and CUDA, using object detection and tracking as the example. We will start with detecting an object based on its color, and then describe methods for detecting an object with a particular shape.

Object detection and tracking based on color

An object has many global features, like color and shape, which describe the object as a whole. These features can be utilized to detect an object and track it in a sequence of frames. In this section, we will use color as a feature to detect an object with a particular color. This method is useful when the object to be detected has a specific color that is different from the color of the background. If the object and background have the same color, then this method of detection will fail. We will try to detect any object with a blue color from a webcam stream using OpenCV and CUDA.

Blue object detection and tracking

The first question that should come to your mind is which color space should be used for segmenting blue color. A Red Green Blue (RGB) color space does not separate color information from intensity information. Color spaces that separate color information from intensity, like Hue Saturation Value (HSV) and YCrCb (where Y′ is the luma component and Cb and Cr are the blue-difference and red-difference chroma components), are ideal for this kind of task. Every color has a specific range in the hue channel that can be utilized for the detection of that color. You can find the code for this section on GitHub.

The pipeline needs some boilerplate code for starting the webcam, capturing frames, and uploading frames to device memory for GPU operations. To detect the blue color, we need to find a range for blue in the HSV color space; if the range is accurate, then the detection will be accurate. The blue color has minimum and maximum values in each of the three channels: hue, saturation, and value. This range is used to threshold the image in each channel to create a mask for the blue color. If this mask is then ANDed with the original frame, only the blue object remains in the resultant image.
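The following is a minimal sketch of this color-based pipeline, assuming OpenCV built with the CUDA modules and a webcam at index 0; the blue range used here (roughly H 110-130, S 50-255, V 50-255), the window name, and the variable names are illustrative choices, not the book's exact listing:

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/cudaarithm.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <vector>

int main() {
    cv::VideoCapture cap(0);                     // start the webcam
    if (!cap.isOpened()) return -1;

    // Illustrative lower/upper bounds for blue in the H, S and V channels
    const double lower[3] = {110, 50, 50};
    const double upper[3] = {130, 255, 255};

    cv::Mat frame, h_mask;
    while (cap.read(frame)) {
        // Upload the frame to device memory and convert it to HSV on the GPU
        cv::cuda::GpuMat d_frame(frame), d_hsv;
        cv::cuda::cvtColor(d_frame, d_hsv, cv::COLOR_BGR2HSV);

        // Split into H, S and V channels
        std::vector<cv::cuda::GpuMat> ch;
        cv::cuda::split(d_hsv, ch);

        // Threshold each channel against its lower and upper bound
        std::vector<cv::cuda::GpuMat> in_range(3);
        for (int i = 0; i < 3; ++i) {
            cv::cuda::GpuMat above, below;
            cv::cuda::threshold(ch[i], above, lower[i], 255, cv::THRESH_BINARY);
            cv::cuda::threshold(ch[i], below, upper[i], 255, cv::THRESH_BINARY_INV);
            cv::cuda::bitwise_and(above, below, in_range[i]);
        }

        // AND the three per-channel masks into a single mask for blue
        cv::cuda::GpuMat d_tmp, d_mask;
        cv::cuda::bitwise_and(in_range[0], in_range[1], d_tmp);
        cv::cuda::bitwise_and(d_tmp, in_range[2], d_mask);

        d_mask.download(h_mask);                 // copy the mask back to the host
        cv::imshow("Blue object mask", h_mask);
        if (cv::waitKey(1) == 'q') break;
    }
    return 0;
}
```

Keeping the frame on the GPU between the color conversion, the per-channel thresholding, and the AND operations avoids repeated host-device transfers, which is where most of the speedup over a CPU-only version comes from.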
The frame from the webcam is converted to the HSV color space. The blue color has a different range in each of the three channels, so each channel has to be thresholded individually. The channels are split using the split method and thresholded using the threshold function. The minimum and maximum range values for each channel are used as the lower and upper thresholds: channel values inside this range are converted to white, and the others to black. The three thresholded channels are then logically ANDed to get the final mask for the blue color. This mask can be used to detect and track a blue object in a video.

Comparing two output frames, one without the blue object and one with it: when a frame does not contain any blue object, the mask is almost black; when the blue object comes into the frame, that region turns white. This method will only work when the background does not contain the color of the object.

Object detection and tracking based on shape

The shape of an object can also be utilized as a global feature to detect an object with a distinct shape. This shape can be a straight line, a polygon, a circle, or any other irregular shape. Object boundaries, edges, and contours can be utilized to detect an object with a particular shape. In this section, we will use the Canny edge detection algorithm and the Hough transform to detect two regular shapes: a line and a circle.

Canny edge detection

In this section, we will implement the Canny edge detection algorithm using OpenCV and CUDA. This algorithm combines Gaussian filtering, gradient finding, non-maximum suppression, and hysteresis thresholding. High-pass filters are very sensitive to noise; in Canny edge detection, Gaussian smoothing is done before detecting edges, which makes it less sensitive to noise. It also has a non-maximum suppression stage after detecting edges to remove unnecessary edges from the result. You can find the code for this section on GitHub.

Canny edge detection is a computationally intensive task, which makes it hard to use in real-time applications; the CUDA version of the algorithm can be used to accelerate it. OpenCV and CUDA provide the createCannyEdgeDetector class for Canny edge detection. An object of this class is created, and several arguments can be passed while creating it. The first and second arguments are the low and high thresholds for hysteresis thresholding: if the intensity gradient at a point is greater than the high threshold, the point is categorized as an edge point; if the gradient is less than the low threshold, the point is not an edge point; and if the gradient lies between the two thresholds, whether the point is an edge or not is decided based on connectivity. The third argument is the aperture size for the edge detector. The final argument is a Boolean that indicates whether to use L2_norm or L1_norm for the gradient magnitude calculation. L2_norm is computationally expensive but more accurate; a value of true indicates the use of L2_norm.
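A minimal sketch of this GPU Canny detector follows; the input file name and the 50/100 thresholds are illustrative placeholders rather than the book's values:

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/cudaimgproc.hpp>

int main() {
    // Read the input as grayscale and upload it to device memory
    cv::Mat h_img = cv::imread("image.png", cv::IMREAD_GRAYSCALE);
    if (h_img.empty()) return -1;
    cv::cuda::GpuMat d_img(h_img), d_edges;

    // Arguments: low threshold, high threshold, aperture size, L2 gradient flag
    cv::Ptr<cv::cuda::CannyEdgeDetector> canny =
        cv::cuda::createCannyEdgeDetector(50.0, 100.0, 3, true);
    canny->detect(d_img, d_edges);

    // Download the edge map and display it
    cv::Mat h_edges;
    d_edges.download(h_edges);
    cv::imshow("Canny edges (GPU)", h_edges);
    cv::waitKey(0);
    return 0;
}
```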
Straight-line detection using the Hough transform

The detection of straight lines is important in many computer vision applications, like lane detection, and it can also be used to detect lines that are part of other regular shapes. The Hough transform is a popular feature extraction technique used in computer vision to detect straight lines. You can find the code for this section on GitHub.

We will not go into detail about how the Hough transform detects lines, but we will see how it can be implemented in OpenCV and CUDA. OpenCV provides the createHoughSegmentDetector class for implementing the Hough transform. It needs an edge map of an image as input, so edges are first detected from the image using a Canny edge detector. The output of the Canny edge detector is uploaded to device memory for GPU computation; the edges can also be computed on the GPU.

An object of createHoughSegmentDetector is created with several arguments. The first argument indicates the resolution of the parameter r used in the Hough transform, which is normally taken as 1 pixel. The second argument is the resolution of the parameter theta, which is taken as 1 degree, or pi/180 radians. The third argument is the minimum number of points needed to form a line, which is taken as 50 pixels. The final argument is the maximum gap between two points to be considered part of the same line, which is taken as 5 pixels.

The detect method of the created object is used to detect straight lines. It needs two arguments: the image on which the edges are to be detected, and the array in which the detected line points will be stored. The array contains the starting and ending (x, y) points of the detected lines. This array is iterated over using a for loop to draw individual lines on the image using the line function from OpenCV, and the final image is displayed using the imshow function.

The Hough transform is a mathematically intensive step. Just to show the advantage of CUDA, we will implement the same algorithm for the CPU and compare its performance with the CUDA implementation. The HoughLinesP function is used for detecting lines on the CPU using the probabilistic Hough transform. The first two arguments are the source image and the array to store the output line points. The third and fourth arguments are the resolutions for r and theta. The fifth argument is the threshold that indicates the minimum number of intersection points for a line. The sixth argument indicates the minimum number of points needed to form a line, and the last argument indicates the maximum gap between points to be considered on the same line. The array returned by the function is iterated over using a for loop to display the detected lines on the original image.

Comparing the performance of the GPU and CPU code for the Hough transform: it takes around 4 ms to process a single image on the CPU and 1.5 ms on the GPU, which is equivalent to 248 FPS on the CPU and 632 FPS on the GPU, almost a 2.5 times improvement on the GPU.
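A minimal sketch combining the GPU and CPU paths described above, assuming a grayscale input image on disk; the file name, the Canny thresholds, and the HoughLinesP threshold are illustrative, while the detector parameters (1 pixel, pi/180, 50, 5) follow the values mentioned in the text:

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <vector>

int main() {
    cv::Mat h_img = cv::imread("road.png", cv::IMREAD_GRAYSCALE);
    if (h_img.empty()) return -1;

    // The Hough transform works on an edge map, so run Canny first
    cv::Mat h_edges;
    cv::Canny(h_img, h_edges, 100, 200, 3);

    // --- GPU path: createHoughSegmentDetector ---
    cv::cuda::GpuMat d_edges(h_edges), d_lines;
    cv::Ptr<cv::cuda::HoughSegmentDetector> hough =
        cv::cuda::createHoughSegmentDetector(1.0f, (float)(CV_PI / 180.0), 50, 5);
    hough->detect(d_edges, d_lines);

    // Download the detected segments: each entry is (x1, y1, x2, y2)
    std::vector<cv::Vec4i> lines_gpu;
    if (!d_lines.empty()) {
        lines_gpu.resize(d_lines.cols);
        cv::Mat h_lines(1, d_lines.cols, CV_32SC4, &lines_gpu[0]);
        d_lines.download(h_lines);
    }

    // --- CPU path: probabilistic Hough transform for comparison ---
    std::vector<cv::Vec4i> lines_cpu;
    cv::HoughLinesP(h_edges, lines_cpu, 1, CV_PI / 180, 50, 50, 5);

    // Draw the GPU result on a color copy of the original image
    cv::Mat result;
    cv::cvtColor(h_img, result, cv::COLOR_GRAY2BGR);
    for (const cv::Vec4i& l : lines_gpu)
        cv::line(result, cv::Point(l[0], l[1]), cv::Point(l[2], l[3]),
                 cv::Scalar(0, 0, 255), 2);

    cv::imshow("Hough line segments (GPU)", result);
    cv::waitKey(0);
    return 0;
}
```

The GPU detector returns its segments in a single-row GpuMat of CV_32SC4 values, which is why the result is downloaded into a vector of cv::Vec4i before the lines are drawn.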
Circle detection

The Hough transform can also be used for circle detection. It is useful in many applications where objects are circular, like ball detection and tracking or coin detection. OpenCV and CUDA provide the createHoughCirclesDetector class for detecting circular objects; you can find the code for this section on GitHub.

An object of this class is created with several arguments. The first argument is dp, which signifies the inverse ratio of the accumulator resolution to the image resolution; it is mostly taken as 1. The second argument is the minimum distance between the centers of detected circles. The third argument is the Canny threshold and the fourth argument is the accumulator threshold. The fifth and sixth arguments are the minimum and maximum radii of the circles to be detected.

The minimum distance between circle centers is taken as 100 pixels here. You can play around with this value: if it is decreased, many circles are detected falsely on the original image, while if it is increased, some true circles may be missed. The last two arguments, the minimum and maximum radii, can be taken as 0 if you don't know the exact dimensions. In the sketch below, the minimum radius is taken as 1 and the maximum as the largest dimension of the image, so that all circles in the image are detected.

The Hough transform is very sensitive to Gaussian and salt-and-pepper noise, so it is sometimes better to preprocess the image with Gaussian and median filters before applying it; this will give more accurate results.
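A minimal sketch of the circle detector described above, assuming a grayscale coin image on disk; the file name and the Canny/accumulator thresholds are illustrative, while the minimum distance and the radius bounds follow the values discussed in the text:

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <algorithm>
#include <vector>

int main() {
    cv::Mat h_img = cv::imread("coins.png", cv::IMREAD_GRAYSCALE);
    if (h_img.empty()) return -1;
    cv::cuda::GpuMat d_img(h_img), d_circles;

    // Arguments: dp, minimum distance between centres, Canny threshold,
    // accumulator (votes) threshold, minimum radius, maximum radius
    cv::Ptr<cv::cuda::HoughCirclesDetector> hough =
        cv::cuda::createHoughCirclesDetector(1.0f, 100.0f, 100, 50, 1,
                                             std::max(h_img.rows, h_img.cols));
    hough->detect(d_img, d_circles);

    // Each detected circle is stored as (centre_x, centre_y, radius)
    std::vector<cv::Vec3f> circles;
    if (!d_circles.empty()) {
        circles.resize(d_circles.cols);
        cv::Mat h_circles(1, d_circles.cols, CV_32FC3, &circles[0]);
        d_circles.download(h_circles);
    }

    // Draw the detected circles on a color copy of the input
    cv::Mat result;
    cv::cvtColor(h_img, result, cv::COLOR_GRAY2BGR);
    for (const cv::Vec3f& c : circles)
        cv::circle(result, cv::Point(cvRound(c[0]), cvRound(c[1])),
                   cvRound(c[2]), cv::Scalar(0, 255, 0), 2);

    cv::imshow("Detected circles (GPU)", result);
    cv::waitKey(0);
    return 0;
}
```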
To summarize, we have used the Hough line and circle transforms to detect objects with regular shapes. Contours and convexity can also be used for shape detection; the functions for these are available in OpenCV, but they do not have CUDA implementations, so you will have to develop your own versions. Color-based object detection is easier to implement, but it requires that the object have a distinct color from the background. For shape-based object detection, the Canny edge detection technique has been described to detect edges, and the Hough transform has been described for straight-line and circle detection. This has many applications, such as lane detection, ball tracking, and so on. Color and shape are global features, which are easier to compute and require less memory, but they are more susceptible to noise.

If you found this post useful, do check out the book Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA. This book helps you discover how CUDA allows OpenCV to handle complex and rapidly growing image data processing in computer and machine vision by accessing the power of the GPU.

Using machine learning for phishing domain detection [Tutorial]

Implementing face detection using the Haar Cascades and AdaBoost algorithm

OpenCV 4.0 releases with experimental Vulcan, G-API module and QR-code detector among others

How NeurIPS 2018 is taking on its diversity and inclusion challenges

Sugandha Lahoti
06 Dec 2018
3 min read
This year the Neural Information Processing Systems conference is asking serious questions about how to improve diversity, equity, and inclusion at NeurIPS. "Our goal is to make the conference as welcoming as possible to all," said the new diversity and inclusion chairs introduced this year.

https://twitter.com/InclusionInML/status/1069987079285809152

The Diversity and Inclusion chair roles are held by Hal Daume III, a professor at the University of Maryland and a researcher in machine learning and fairness at Microsoft Research, and Katherine Heller, an assistant professor at Duke University and research scientist at Google Brain. They opened the talk by acknowledging the respective privilege they have as a white man and woman, and the fact that they don't reflect the diversity of experience in the conference room, much less the world.

They talked about three major goals with respect to inclusion at NeurIPS:
- Learn about the challenges that their colleagues have faced.
- Support those doing the hard work of amplifying the voices of those who have been historically excluded.
- Begin structural changes that will positively impact the community over the coming years.

They urged attendees to start building an environment where everyone can do their best work. They want people to:
- see other perspectives
- remember the feeling of being an outsider
- listen, do research, and learn
- make an effort and speak up

Concrete actions taken by the NeurIPS diversity and inclusion chairs

This year they have assembled an advisory board and run a demographics and inclusion survey. They have also conducted events such as WIML (Women in Machine Learning), Black in AI, LatinX in AI, and Queer in AI. They have established childcare subsidies and other activities in collaboration with Google and DeepMind to support all families attending NeurIPS, offering a stipend of up to $100 USD per day.

They have revised the Code of Conduct to provide an experience for all participants that is free from harassment, bullying, discrimination, and retaliation. They have added inclusion tips on Twitter, offering advice related to D&I efforts. The conference also offers pronoun stickers (they/them only), first-time attendee stickers, and information for participant needs. They have also made significant infrastructure improvements for visa handling: they held discussions with people handling visas on location, sent out early invitation letters for visas, and are choosing future locations with visa processing in mind.

In the future, they are also looking to establish a legal team for details around the Code of Conduct. Further, they are looking to improve institutional structural changes that support the community and to improve coordination around affinity groups and workshops.

For the first time, NeurIPS also invited a diversity and inclusion (D&I) speaker, Laura Gomez, to talk about the lack of diversity in the tech industry, which leads to biased algorithms, faulty products, and unethical tech.

Head over to the NeurIPS website for interesting tutorials, invited talks, product releases, demonstrations, presentations, and announcements.

NeurIPS 2018: Deep learning experts discuss how to build adversarially robust machine learning models

NeurIPS 2018 paper: DeepMind researchers explore autoregressive discrete autoencoders (ADAs) to model music in raw audio at scale

NeurIPS 2018: A quick look at data visualization for Machine learning by Google PAIR researchers [Tutorial]

NeurIPS 2018: A quick look at data visualization for Machine learning by Google PAIR researchers [Tutorial]

Natasha Mathur
05 Dec 2018
9 min read
The 32nd annual NeurIPS (Neural Information Processing Systems) Conference 2018, formerly known as NIPS, is currently being hosted in Montreal, Canada. The conference, the biggest machine learning event of the year, started on 2nd December and ends on 8th December. It features a series of tutorials, invited talks, product releases, demonstrations, presentations, and announcements related to machine learning research.

One such tutorial presented at NeurIPS earlier this week was "Visualization for machine learning" by Fernanda Viegas and Martin Wattenberg. Viegas and Wattenberg are co-leads of Google's PAIR (People + AI Research) initiative, which is part of Google Brain. Their work in machine learning focuses on transparency and interpretability, to improve human-AI interaction and to democratize AI technology. Here are some key highlights from the tutorial, which talks about how visualization works, explores common visualization techniques, and looks at uses of visualization in machine learning.

Viegas opened the talk by explaining the concept of data visualization. Data visualization refers to the process of representing and transforming data into visual encodings and context. It is used for data exploration, for gaining scientific insight, and for better communication of data results.

How does data visualization work?

Data visualization works by "finding visual encodings". In other words, you take data and transform it into visual encodings, and those encodings then perform several different functions. Firstly, they help guide the viewer's attention through the data. Viegas explains that if our brains are given "the right kind of visual stimuli", our visual system works dramatically faster: there are certain things that human visual systems are acutely aware of, such as differences in shapes, alignments, colors, and sizes. Secondly, the encodings communicate the data effectively to the viewer, and thirdly, they allow the viewer to calculate with the data. Once these functions are in place, you can interactively explore the data on a computer.

Wattenberg explains how different encodings have different properties. For instance, the "position" and "length" properties are as good as text for communicating exact values within data, while "area" and "color" are good for drawing the viewer's attention. He gives the example of ColorBrewer, a color advice tool by the cartographer Cynthia Brewer that lets you try out different color palettes and scales; it's a handy tool when playing with colors for data visualization. Apart from that, a trick to keep in mind when choosing colors for data visualization is to go for a palette or scale where no color looks more prominent than the others, since a prominent color can be perceived as a category that is more important than the rest, says Viegas.

Common visualization techniques

Data density

Viegas explains that when you have a lot of data, there is a technique called small multiples, meaning that you "use your chart over and over again for each moment that is important". A visualization example presented by Viegas is a New York Times infographic of drought in the US over the decades. She explains how, in that visualization, each row is a decade's worth of drought in the US. Another thing to notice is that the background color is very faint, so the map of the US recedes into the background; the map is not the most important thing, and the drought information is what needs to pop out, hence a sharp, saturated highlighting color is used for the drought.

Data faceting

Another visualization technique discussed by Viegas is data faceting, which is basically adding two different visualizations together to understand and analyze the data better. The example she discusses shows the tax rates for different companies around the US and how much the tax rate varies among them. Each circle is a company, and the circles are sized differently. The color shows a distribution that goes from the lowest tax rate on the left to the highest on the right. "Just by looking at the distribution, you can tell that the tax rates are going up the further to the right they are. They have also calculated the tax rate for the entire distribution, so they are packing a ton of info in this graph," says Viegas. Another tab, "view by industry", shows a second visualization that presents the distribution for each industry, along with their tax rates and some commentary on each industry, from utilities to insurance.

Visualization uses in ML

If you look at the visualization pipeline of machine learning, you can identify the areas and stages where visualization is particularly needed and helpful. "It's thinking about through acquiring data, as you implement a model, training and when you deploy it for monitoring," says Wattenberg. Visualization is mainly used in machine learning for training data, monitoring performance, improving interpretability, understanding high-dimensional data, and for education and communication. Let's now have a look at some of these.

Visualizing training data

To explain why visualizing training data can be useful, Viegas takes the example of visualizing CIFAR-10, a dataset comprising a collection of images commonly used to train machine learning and computer vision algorithms. Viegas points out that there are a lot of tools for looking at your data. One such tool is Facets, an open source visualization tool for machine learning training data. In the example shown, pictures in CIFAR-10 are organized into categories such as airplane, automobile, bird, and so on. Not only does this provide a clear distinction between categories, Facets can also help with analyzing mistakes in your data. Facets provides a sense of the shape of each feature of the data using Facets Overview, and you can explore a set of individual observations using Facets Dive. These visualizations help with analyzing mistakes in your data and automatically provide an understanding of the distribution of values across different features of a dataset.

Visualizing performance monitoring

Viegas quickly went over how visualization is seen almost daily in performance monitoring in machine learning, in the form of monitoring dashboards. Performance monitoring visualization includes different graphs and line charts, since while monitoring performance you are constantly trying to make sure that your system is working right and doing what it is supposed to do.

Visualizing interpretability

Interpretability in machine learning means the degree to which a human can consistently predict the model's result. Viegas discusses interpretability visualization in machine learning by breaking it down into visualization for CNNs and for RNNs.

CNNs (convolutional neural networks)

She compares the interpretability of image classification to a petri dish: image classifiers are effective in practice, but what they do and how they do it is mysterious, and their failures add to the mystery. Another thing about image classifiers is that, since they are visual, it can be hard to understand what exactly they do, such as which features the networks really use and what roles are played by different layers. One example presented by Viegas is saliency maps, which show each pixel's unique quality. Saliency maps simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. "The idea with saliency maps is to consider the sensitivity of the class to each pixel. These can be sometimes deceiving, visually noisy, and sometimes easy to project on them what you're seeing," adds Viegas. Another example presented by Viegas that has been very helpful for visualizing CNNs is DrawNet by Antonio Torralba. This visualization is particularly great because it is good at informing people who are not from a machine learning background about how neural networks actually work.

RNNs (recurrent neural networks)

Viegas presented another visualization example for RNNs: Karpathy's work on visualizing text sequences, which tries to understand whether, if you look at the activations of different cells, you can interpret them. The color scale is very friendly, and the color is layered right on top of the data; it is a good example of how to make the right tradeoff when selecting colors to represent quantitative data, explains Wattenberg. Viegas further pointed out that it is always better to go back to the raw data (in this case, text) and show that to the user, since it will make the visualization more effective.

Visualizing high-dimensional data

Wattenberg explains how visualizing high-dimensional data is very tough, almost "impossible", but there are some approaches that help. These approaches fall into two groups: linear and non-linear. Linear approaches include principal component analysis (PCA) and visualization of labeled data using linear transformations. Non-linear approaches include multidimensional scaling, Sammon mapping, t-SNE, UMAP, and so on. Wattenberg gives an example of PCA in the Embedding Projector, using MNIST as the dataset. MNIST is a large database of handwritten digits commonly used for training different image processing systems. PCA does a good job of visualizing MNIST; however, a non-linear method is more effective, since the clusters of digits get separated quite well. Wattenberg argues that there is a lot of trickiness involved, and t-SNE is often used to analyze and visualize such data. t-SNE is a fairly complex non-linear technique that uses an adaptive sense of distance and translates well between the geometry of high- and low-dimensional space. t-SNE is effective at visualizing high-dimensional data, but there is another method, UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction), that is faster than t-SNE, handles large datasets efficiently, and captures the global structure better.

After learning how visualization is used in ML and which tools and methods work, data scientists can start experimenting with and refining existing visualization methods, or even inventing entirely new visual techniques. Now that you have a head start, dive right into this fascinatingly informative tutorial on the NeurIPS page!

NeurIPS 2018: Deep learning experts discuss how to build adversarially robust machine learning models

NeurIPS 2018 paper: DeepMind researchers explore autoregressive discrete autoencoders (ADAs) to model music in raw audio at scale

NIPS finally sheds its 'sexist' name for NeurIPS

NGI0 Consortium to award grants worth 5.6 million euro to open internet projects that promote inclusivity, privacy, interoperability, and data protection

Bhagyashree R
03 Dec 2018
3 min read
NLnet Foundation, on Saturday, announced that it is now accepting proposals for projects that will deliver "potential break-through contributions to the open internet". These projects will be judged on their technical merits, strategic relevance to the Next Generation Internet, and overall value for money. For this, the foundation has created separate themes, such as NGI Zero PET and NGI Zero Discovery, under which you can list your projects. The foundation will be investing 5.6 million euros, from 2018 to 2021, in small to medium-size R&D grants aimed at improving search and discovery as well as privacy and trust enhancing technologies. It is seeking project proposals of between 5,000 and 50,000 euros, with the chance to scale them up if there is proven potential. The deadline for submitting proposals is February 1st, 2019, 12:00 CET. NLnet Foundation supports the open internet and the privacy and security of internet users. The foundation helps independent organizations and people that contribute to an open information society by providing microgrants, advice, and access to a global network.

Next Generation Internet (NGI): Creating an open, trustworthy, and reliable internet for all

The European Commission launched the NGI initiative in 2016, aiming to make the internet an interoperable platform ecosystem. This future internet will respect human and societal values such as openness, inclusivity, transparency, privacy, cooperation, and protection of data. NGI wants to make the internet more human-centric while also driving the adoption of advanced concepts and methodologies in domains such as artificial intelligence, the Internet of Things, interactive technologies, and more. To achieve these goals, NLnet has launched projects like NGI Zero Discovery and NGI Zero Privacy and Trust Enhancing Technologies (PET). NGI Zero Discovery aims to provide individual researchers and developers with an agile, effective, and low-threshold funding mechanism. It will help researchers and developers bring in new ideas that contribute to the establishment of the Next Generation Internet, and the resulting projects will be made available as free/libre/open source software. NGI Zero PET is the sister project of NGI Zero Discovery; its objective is to equip people with new technologies that will provide them with better privacy. NLnet said on its website that these investments are meant to help researchers and developers create an open internet: "Trust is one of the key drivers for the Next Generation Internet, and an adequate level of privacy is a non-negotiable requirement for that. We want to assist independent researchers and developers to create powerful new technology and to help them put it in the hands of future generations as building blocks for a fair and democratic society and an open economy that benefits all." To read more, check out the NLnet Foundation's official website. The State of Mozilla 2017 report focuses on internet health and user privacy Tim Berners-Lee plans to decentralize the web with ‘Solid’, an open-source project for "personal empowerment through data" Has the EU just ended the internet as we know it?
Marriott’s Starwood guest database faces a massive data breach affecting 500 million user data

Savia Lobo
03 Dec 2018
5 min read
Last week, the popular hospitality company Marriott International revealed details of a massive data breach that exposed the personal and financial information of its customers. According to Marriott, the breach had been going on for the past four years and collected information about customers who made reservations with its Starwood subsidiary. The information subject to the breach covered approximately 500 million guests. For approximately 327 million of these guests, the breached information includes a combination of name, mailing address, phone number, email address, passport number, Starwood Preferred Guest ("SPG") account information, date of birth, gender, arrival and departure information, reservation date, and communication preferences.

The four-year-long breach that hit Marriott's customer data

On September 8, 2018, Marriott received an alert from an internal security tool reporting that attempts had been made to access the Starwood guest reservation database in the United States. Marriott then carried out an investigation, which revealed that its Starwood network had been accessed by attackers since 2014. According to Marriott's news center, "On November 19, 2018, the investigation determined that there was unauthorized access to the database, which contained guest information relating to reservations at Starwood properties* on or before September 10, 2018." For some of the 500 million users, the information includes payment card details such as numbers and expiration dates. However, "the payment card numbers were encrypted using Advanced Encryption Standard encryption (AES-128). There are two components needed to decrypt the payment card numbers, and at this point, Marriott has not been able to rule out the possibility that both were taken. For the remaining guests, the information was limited to name and sometimes other data such as mailing address, email address, or other information", stated the Marriott news release. Arne Sorenson, Marriott's President and Chief Executive Officer, said, "We will continue to support the efforts of law enforcement and to work with leading security experts to improve. Finally, we are devoting the resources necessary to phase out Starwood systems and accelerate the ongoing security enhancements to our network". Marriott has also reported the incident to law enforcement and is notifying regulatory authorities.

This is not the first time Starwood data was breached

Marriott did not say exactly when the breach began four years ago in 2014. However, its subsidiary Starwood revealed that, a few days after being acquired by Marriott, more than 50 of Starwood's properties had been breached in November 2015. According to Starwood's disclosure at the time, that earlier breach stretched back at least one year, i.e., to November 2014. According to Krebs on Security, "Back in 2015, Starwood said the intrusion involved malicious software installed on cash registers at some of its resort restaurants, gift shops and other payment systems that were not part of its guest reservations or membership systems." In Dec. 2016, KrebsOnSecurity stated, "banks were detecting a pattern of fraudulent transactions on credit cards that had one thing in common: They'd all been used during a short window of time at InterContinental Hotels Group (IHG) properties, including Holiday Inns and other popular chains across the United States." Marriott said that its own network has not been affected by this four-year data breach and that the investigation only identified unauthorized access to the separate Starwood network. "Marriott is providing its affected guests in the United States, Canada, and the United Kingdom a free year's worth of service from WebWatcher, one of several companies that advertise the ability to monitor the cybercrime underground for signs that the customer's personal information is being traded or sold", said Krebs on Security.

What should compromised users do?

Companies affected by a breach, or looking to defend against one, pay threat hunters to look out for new intrusions. They can also test their own networks and employees for weaknesses, and arrange drills to assess their breach response preparedness. Individuals who re-use the same password should consider using a password manager, which remembers strong passwords/passphrases and essentially lets you use a single strong master password/passphrase across all websites. Krebs on Security's "assume you're compromised" philosophy "involves freezing your credit files with the major credit bureaus and regularly ordering free copies of your credit file from annualcreditreport.com to make sure nobody is monkeying with your credit (except you)." Rob Rosenberger, co-founder of Vmyths, tweeted advice urging everyone who has booked a room at these properties since 2014 to change their mother's maiden name and social security number soon. https://twitter.com/vmyths/status/1069273409652224000 To know more about the Marriott breach in detail, visit Marriott's official website. Uber fined by British ICO and Dutch DPA for nearly $1.2m over a data breach from 2016 Dell reveals details on its recent security breach Twitter on the GDPR radar for refusing to provide a user his data due to ‘disproportionate effort’ involved
Microsoft becomes the world's most valuable public company, moves ahead of Apple

Sugandha Lahoti
03 Dec 2018
3 min read
Last week, Microsoft moved ahead of Apple as the world's most valuable publicly traded U.S. company. On Friday, the company closed with a market value of $851 billion, with Apple a few steps behind at $847 billion.

The move from Windows to Cloud

Microsoft's success can be attributed to its able leadership under CEO Satya Nadella and his focus on moving away from the flagship Windows operating system towards cloud-computing services with long-term business contracts. The organization's biggest growth has happened in its Azure cloud platform. Cloud computing now accounts for more than a quarter of Microsoft's revenue, rivaling Amazon, which is also a leading provider. Microsoft is also building new products and features for Azure. Last month, it announced container support for Azure Cognitive Services to build intelligent applications. In October, it invested in Grab to jointly conquer the Southeast Asian on-demand services market with Azure's Intelligent Cloud. In September, at Ignite 2018, the company announced major changes and improvements to its cloud offering. It also introduced Azure Functions 2.0 with better workload support for serverless, general availability of Microsoft's immutable storage for Azure Storage Blobs, and Azure DevOps. In August, Microsoft added Azure support for NVIDIA GPU Cloud (NGC) and a new governance DApp for Azure. Wedbush analyst Dan Ives commented: "Azure is still in its early days, meaning there's plenty of room for growth, especially considering the company's large customer base for Office and other products. While the tech carnage seen over the last month has been brutal, shares of (Microsoft) continue to hold up like the Rock of Gibraltar."

Focus on business and values

Microsoft has also prioritized business-oriented services such as Office and other workplace software, as well as newer additions such as LinkedIn and Skype. In 2016, Microsoft bought LinkedIn, the social network for professionals, for $26.2 billion. This year, Microsoft paid $7.5 billion for GitHub, an open software platform used by 28 million programmers. Another reason Microsoft is flourishing is its focus on upholding its founding values without compromising on issues like internet censorship and surveillance. Daniel Morgan, senior portfolio manager for Synovus Trust, says, "Microsoft is outperforming its tech rivals in part because it doesn't face as much regulatory scrutiny as advertising-hungry Google and Facebook, which have attracted controversy over their data-harvesting practices. Unlike Netflix, it's not on a hunt for a diminishing number of international subscribers. And while Amazon also has a strong cloud business, it's still more dependent on online retail." In a recent episode of Pivot with Kara Swisher and Scott Galloway, the two speakers also talked about why Microsoft is more valuable than Apple. Scott said that Microsoft's success comes from Nadella's decision to diversify Microsoft's business into enough verticals, which is why the company hasn't been as affected by tech stocks' recent decline. He argues that Satya Nadella deserves the title of "tech CEO of the year". Microsoft wins $480 million US Army contract for HoloLens. Microsoft amplifies focus on conversational AI: Acquires XOXCO; shares guide to developing responsible bots. Microsoft announces official support for Windows 10 to build 64-bit ARM apps
Google bypassed its own security and privacy teams for Project Dragonfly reveals Intercept

Sugandha Lahoti
30 Nov 2018
5 min read
Google's Project Dragonfly has faced significant criticism and scrutiny from both the public and Google employees. In a major report yesterday, the Intercept revealed how internal conversations around Google's censored search engine for China shut out Google's legal, privacy, and security teams. According to named and anonymous senior Googlers who worked on the project and spoke to The Intercept's Ryan Gallagher, company executives appeared intent on watering down the privacy review, and Google bosses also worked to suppress employee criticism of the censored search engine. Project Dragonfly is the secretive search engine that Google is allegedly developing to comply with Chinese censorship rules. It was kept secret from the company at large during the 18 months it was in development, until an insider leak led to its existence being revealed by The Intercept. Since then, it has been on the receiving end of constant backlash from human rights organizations and investigative reporters. Earlier this week, it faced criticism from the human rights organization Amnesty International, followed by Google employees signing a petition protesting the infamous project.

The secretive way Google operated Dragonfly

The majority of the details come from Yonatan Zunger, a security engineer on the Dragonfly team. He was asked to produce the privacy review for the project in early 2017. However, he faced opposition from Scott Beaumont, Google's top executive for China and Korea. According to Zunger, Beaumont "wanted [the privacy review of Dragonfly] to be pro forma and thought it should defer entirely to his views of what the product ought to be. He did not feel that the security, privacy, and legal teams should be able to question his product decisions, and maintained an openly adversarial relationship with them — quite outside the Google norm." Beaumont also micromanaged the project and ensured that discussions about Dragonfly, and access to documents about it, were under his tight control. If members of the Dragonfly team broke the strict confidentiality rules, their contracts at Google could be terminated.

Privacy report by Zunger

In the midst of all these constraints, Zunger and his team were still able to produce a privacy report. The report described problematic scenarios that could arise if the search engine were launched in China. It noted that, in China, it would be difficult for Google to legally push back against government requests, refuse to build systems specifically for surveillance, or even notify people of how their data may be used. Zunger's meetings with the company's senior leadership to discuss the privacy report were repeatedly postponed. Zunger said, "When the meeting did finally take place, in late June 2017, I and my team were not notified, so we missed it and did not attend. This was a deliberate attempt to exclude us."

Dragonfly: Not just an experiment

The Intercept's report even demolished Sundar Pichai's recent public statement on Dragonfly, where he described it as "just an experiment," adding that it remained unclear whether the company "would or could" eventually launch it in China. Google employees were surprised to be told to prepare the search engine for launch between January and April 2019, or sooner. "What Pichai said [about Dragonfly being an experiment] was ultimately horse shit," said one Google source with knowledge of the project. "This was run with 100 percent intention of launch from day one. He was just trying to walk back a delicate political situation." It is also alleged that Beaumont had intended from day one that the project should only become known once it had been launched. "He wanted to make sure there would be no opportunity for any internal or external resistance to Dragonfly," one Google source told the Intercept. This makes us wonder to what extent Google really is concerned about upholding its founding values, and how far it will go in advocating internet freedom, openness, and democracy. It now looks a lot like a company that simply prioritizes growth and expansion into new markets, even if that means compromising on issues like internet censorship and surveillance. Perhaps we shouldn't be surprised. Google CEO Sundar Pichai is expected to testify in Congress on Dec. 5 to discuss transparency and bias; members of Congress will likely also ask about Google's plans in China. Public opinion on the Intercept's report is largely supportive. https://twitter.com/DennGordon/status/1068228199149125634 https://twitter.com/mpjme/status/1068268991238541312 https://twitter.com/cynthiamw/status/1068240969990983680 Google employee and inclusion activist Liz Fong-Jones tweeted that she would match $100,000 in pledged donations to a fund to support employees who refuse to work in protest. https://twitter.com/lizthegrey/status/1068212346236096513 She has also shown full support for Zunger: https://twitter.com/lizthegrey/status/1068209548320747521 Google employees join hands with Amnesty International urging Google to drop Project Dragonfly OK Google, why are you ok with mut(at)ing your ethos for Project DragonFly? Amnesty International takes on Google over Chinese censored search engine, Project Dragonfly.
Getting started with Web Scraping using Python [Tutorial]

Melisha Dsouza
29 Nov 2018
15 min read
Small manual tasks like scanning through information sources in search of small bits of relevant information are, in fact, automatable. Instead of performing tasks that get repeated over and over, we can use computers to do these kinds of menial tasks and focus our own efforts on what humans are good for: high-level analysis and decision making based on the result. This tutorial shows how to use the Python language to automate common business tasks that can be greatly sped up if a computer is doing them. The code files for this article are available on GitHub. This tutorial is an excerpt from a book written by Jaime Buelta titled Python Automation Cookbook. The internet and the WWW (World Wide Web) are the most prominent sources of information today. In this article, we will learn to perform operations programmatically to automatically retrieve and process information. The Python requests module makes it very easy to perform these operations. We'll cover the following recipes: downloading web pages, parsing HTML, crawling the web, accessing password-protected pages, and speeding up web scraping.

Downloading web pages

The basic ability to download a web page involves making an HTTP GET request against a URL. This is the basic operation of any web browser. We'll see in this recipe how to make a simple request to obtain a web page. Install the requests module: $ echo "requests==2.18.3" >> requirements.txt $ source .venv/bin/activate (.venv) $ pip install -r requirements.txt Download the example page because it is a straightforward HTML page that is easy to read in text mode.

How to download web pages

Import the requests module: >>> import requests Make a request to the URL, which will take a second or two: >>> url = 'http://www.columbia.edu/~fdc/sample.html' >>> response = requests.get(url) Check the returned object status code: >>> response.status_code 200 Check the content of the result: >>> response.text '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n<html>\n<head>\n ... FULL BODY ... <!-- close the <html> begun above -->\n' Check the outgoing and returned headers: >>> response.request.headers {'User-Agent': 'python-requests/2.18.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'} >>> response.headers {'Date': 'Fri, 25 May 2018 21:51:47 GMT', 'Server': 'Apache', 'Last-Modified': 'Thu, 22 Apr 2004 15:52:25 GMT', 'Accept-Ranges': 'bytes', 'Vary': 'Accept-Encoding,User-Agent', 'Content-Encoding': 'gzip', 'Content-Length': '8664', 'Keep-Alive': 'timeout=15, max=85', 'Connection': 'Keep-Alive', 'Content-Type': 'text/html', 'Set-Cookie': 'BIGipServer~CUIT~www.columbia.edu-80-pool=1764244352.20480.0000; expires=Sat, 26-May-2018 03:51:47 GMT; path=/; Httponly'} The operation of requests is very simple; perform the operation, GET in this case, over the URL. This returns a result object that can be analyzed. The main elements are the status_code and the body content, which can be presented as text. The full request can be checked in the request field: >>> response.request <PreparedRequest [GET]> >>> response.request.url 'http://www.columbia.edu/~fdc/sample.html' You can check out the full request's documentation for more information.

Parsing HTML

We'll use the excellent Beautiful Soup module to parse the HTML text into a memory object that can be analyzed. We need to use the beautifulsoup4 package, which is the latest available version and works with Python 3.
Add the package to your requirements.txt and install the dependencies in the virtual environment: $ echo "beautifulsoup4==4.6.0" >> requirements.txt $ pip install -r requirements.txt

How to perform HTML parsing

Import BeautifulSoup and requests: >>> import requests >>> from bs4 import BeautifulSoup Set up the URL of the page to download and retrieve it: >>> URL = 'http://www.columbia.edu/~fdc/sample.html' >>> response = requests.get(URL) >>> response <Response [200]> Parse the downloaded page: >>> page = BeautifulSoup(response.text, 'html.parser') Obtain the title of the page. See that it is the same as what's displayed in the browser: >>> page.title <title>Sample Web Page</title> >>> page.title.string 'Sample Web Page' Find all the h3 elements in the page, to determine the existing sections: >>> page.find_all('h3') [<h3><a name="contents">CONTENTS</a></h3>, <h3><a name="basics">1. Creating a Web Page</a></h3>, <h3><a name="syntax">2. HTML Syntax</a></h3>, <h3><a name="chars">3. Special Characters</a></h3>, <h3><a name="convert">4. Converting Plain Text to HTML</a></h3>, <h3><a name="effects">5. Effects</a></h3>, <h3><a name="lists">6. Lists</a></h3>, <h3><a name="links">7. Links</a></h3>, <h3><a name="tables">8. Tables</a></h3>, <h3><a name="install">9. Installing Your Web Page on the Internet</a></h3>, <h3><a name="more">10. Where to go from here</a></h3>] Extract the text of the links section. Stop when you reach the next <h3> tag: >>> link_section = page.find('a', attrs={'name': 'links'}) >>> section = [] >>> for element in link_section.next_elements: ... if element.name == 'h3': ... break ... section.append(element.string or '') ... >>> result = ''.join(section) >>> result '7. Links\n\nLinks can be internal within a Web page (like to\nthe Table of ContentsTable of Contents at the top), or they\ncan be to external web pages or pictures on the same website, or they\ncan be to websites, pages, or pictures anywhere else in the world.\n\n\n\nHere is a link to the Kermit\nProject home pageKermit\nProject home page.\n\n\n\nHere is a link to Section 5Section 5 of this document.\n\n\n\nHere is a link to\nSection 4.0Section 4.0\nof the C-Kermit\nfor Unix Installation InstructionsC-Kermit\nfor Unix Installation Instructions.\n\n\n\nHere is a link to a picture:\nCLICK HERECLICK HERE to see it.\n\n\n' Notice that there are no HTML tags; it's all raw text. The first step is to download the page. Then, the raw text can be parsed, as in step 3. The resulting page object contains the parsed information. BeautifulSoup allows us to search for HTML elements. It can search for the first one with .find() or return a list with .find_all(). In step 5, it searched for a specific tag <a> that had a particular attribute, name='links'. After that, it kept iterating on .next_elements until it found the next h3 tag, which marks the end of the section. The text of each element is extracted and finally composed into a single text. Note the or that avoids storing None, returned when an element has no text.

Crawling the web

Given the nature of hyperlinked pages, starting from a known place and following links to other pages is a very important tool in the arsenal when scraping the web. To do so, we crawl a page looking for a small phrase and print any paragraph that contains it. We will search only pages that belong to the same site, i.e. only URLs starting with www.somesite.com; we won't follow links to external sites. We'll use a prepared example, available in the GitHub repo; a minimal sketch of the same-site check just described appears below, before we walk through the recipe.
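As a quick illustration of that same-site rule, here is a minimal, hedged sketch (not the book's crawling_web_step1.py code) that fetches a page, resolves every link, and keeps only the ones whose domain (netloc) matches the starting URL. It assumes the prepared example server from the recipe is running on http://localhost:8000, and it resolves relative links with urljoin, a small variation on the script described next, which only keeps fully qualified URLs.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

start_url = 'http://localhost:8000/'       # the prepared example site
start_netloc = urlparse(start_url).netloc

page = BeautifulSoup(requests.get(start_url).text, 'html.parser')

internal_links = []
for anchor in page.find_all('a'):
    href = anchor.get('href')
    if not href:
        continue                           # skip anchors without an href
    full_url = urljoin(start_url, href)    # resolve relative links
    if urlparse(full_url).netloc == start_netloc:
        internal_links.append(full_url)    # same site: keep it

print(internal_links)                      # candidates for the next crawl step

The full recipe performs the same kind of netloc comparison when it validates links, as described in the following steps.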
Download the whole site and run the included script: $ python simple_delay_server.py This serves the site at the URL http://localhost:8000. You can check it in a browser. It's a simple blog with three entries. Most of it is uninteresting, but we added a couple of paragraphs that contain the keyword python.

How to crawl the web

The full script, crawling_web_step1.py, is available in GitHub. The most relevant bits are displayed here:

...
def process_link(source_link, text):
    logging.info(f'Extracting links from {source_link}')
    parsed_source = urlparse(source_link)
    result = requests.get(source_link)
    # Error handling. See GitHub for details
    ...
    page = BeautifulSoup(result.text, 'html.parser')
    search_text(source_link, page, text)
    return get_links(parsed_source, page)

def get_links(parsed_source, page):
    '''Retrieve the links on the page'''
    links = []
    for element in page.find_all('a'):
        link = element.get('href')
        # Validate is a valid link. See GitHub for details
        ...
        links.append(link)
    return links

Search for references to python, to return a list with the URLs that contain it and the paragraph. Notice there are a couple of errors because of broken links: $ python crawling_web_step1.py https://localhost:8000/ -p python Link http://localhost:8000/: --> A smaller article , that contains a reference to Python Link http://localhost:8000/files/5eabef23f63024c20389c34b94dee593-1.html: --> A smaller article , that contains a reference to Python Link http://localhost:8000/files/33714fc865e02aeda2dabb9a42a787b2-0.html: --> This is the actual bit with a python reference that we are interested in. Link http://localhost:8000/files/archive-september-2018.html: --> A smaller article , that contains a reference to Python Link http://localhost:8000/index.html: --> A smaller article , that contains a reference to Python Another good search term is crocodile. Try it out: $ python crawling_web_step1.py http://localhost:8000/ -p crocodile Let's see each of the components of the script: A loop that goes through all the found links, in the main function. Downloading and parsing the link, in the process_link function: it downloads the file and checks that the status is correct, to skip errors such as broken links. It also checks that the type (as described in Content-Type) is an HTML page, to skip PDFs and other formats. And finally, it parses the raw HTML into a BeautifulSoup object. It also parses the source link using urlparse, so later, in step 4, it can skip all the references to external sources. urlparse divides a URL into its constituent elements: >>> from urllib.parse import urlparse >>> >>> urlparse('http://localhost:8000/files/b93bec5d9681df87e6e8d5703ed7cd81-2.html') ParseResult(scheme='http', netloc='localhost:8000', path='/files/b93bec5d9681df87e6e8d5703ed7cd81-2.html', params='', query='', fragment='') It finds the text to search, in the search_text function: it searches the parsed object for the specified text. Note that the search is done as a regex and only in the text. It prints the resulting matches, including source_link, referencing the URL where the match was found:

for element in page.find_all(text=re.compile(text)):
    print(f'Link {source_link}: --> {element}')

The get_links function retrieves all links on a page: it searches the parsed page for all <a> elements and retrieves their href attributes, but only keeps elements that have such href attributes and that are a fully qualified URL (starting with http). This removes links that are not a URL, such as a '#' link, or that are internal to the page.
An extra check is done to verify that they have the same source as the original link; only then are they registered as valid links. The netloc attribute allows us to detect that the link comes from the same domain as the parsed URL generated in step 2. Finally, the links are returned, where they'll be added to the loop described in step 1.

Accessing password-protected pages

Sometimes a web page is not open to the public but protected in some way. The most basic mechanism is basic HTTP authentication, which is integrated into virtually every web server and uses a user/password scheme. We can test this kind of authentication in https://httpbin.org. It has a path, /basic-auth/{user}/{password}, which forces authentication, with the user and password stated. This is very handy for understanding how authentication works.

How to access password-protected pages

Import requests: >>> import requests Make a GET request to the URL with the right credentials. Notice that the URL defines the required credentials to be user and psswd: >>> requests.get('https://httpbin.org/basic-auth/user/psswd', auth=('user', 'psswd')) <Response [200]> Use the wrong credentials to return a 401 status code (Unauthorized): >>> requests.get('https://httpbin.org/basic-auth/user/psswd', auth=('user', 'wrong')) <Response [401]> The credentials can also be passed directly in the URL, separated by a colon and an @ symbol before the server, like this: >>> requests.get('https://user:psswd@httpbin.org/basic-auth/user/psswd') <Response [200]> >>> requests.get('https://user:wrong@httpbin.org/basic-auth/user/psswd') <Response [401]>

Speeding up web scraping

Most of the time spent downloading information from web pages is usually spent waiting. A request goes from our computer to whatever server will process it, and until the response is composed and comes back to our computer, we cannot do much about it. During the execution of the recipes in the book, you'll notice there's a wait involved in requests calls, normally of around one or two seconds. But computers can do other stuff while waiting, including making more requests at the same time. In this recipe, we will see how to download a list of pages in parallel and wait until they are all ready. We will use an intentionally slow server to show the point. We'll get the code to crawl and search for keywords, making use of the futures capabilities of Python 3 to download multiple pages at the same time. A future is an object that represents the promise of a value. This means that you immediately receive an object while the code is being executed in the background; only when its .result() is specifically requested does the code block until the value is ready. To generate a future, you need a background engine, called an executor. Once it is created, submit a function and its parameters to it to retrieve a future. The retrieval of the result can be delayed as long as necessary, allowing the generation of several futures in a row and waiting until all are finished, executing them in parallel, instead of creating one, waiting until it finishes, creating another, and so on. There are several ways to create an executor; in this recipe, we'll use ThreadPoolExecutor, which uses threads. We'll use a prepared example, available in the GitHub repo. Download the whole site and run the included script: $ python simple_delay_server.py -d 2 This serves the site at the URL http://localhost:8000. You can check it in a browser. It's a simple blog with three entries.
Most of it is uninteresting, but we added a couple of paragraphs that contain the keyword python. The parameter -d 2 makes the server intentionally slow, simulating a bad connection. How to speed up web scraping Write the following script, speed_up_step1.py. The full code is available in GitHub. Notice the differences in the main function. Also, there's an extra parameter added (number of concurrent workers), and the function process_link now returns the source link. Run the crawling_web_step1.py script to get a time baseline. Notice the output has been removed here for clarity: $ time python crawling_web_step1.py http://localhost:8000/ ... REMOVED OUTPUT real 0m12.221s user 0m0.160s sys 0m0.034s Run the new script with one worker, which is slower than the original one: $ time python speed_up_step1.py -w 1 ... REMOVED OUTPUT real 0m16.403s user 0m0.181s sys 0m0.068s Increase the number of workers: $ time python speed_up_step1.py -w 2 ... REMOVED OUTPUT real 0m10.353s user 0m0.199s sys 0m0.068s Adding more workers decreases the time: $ time python speed_up_step1.py -w 5 ... REMOVED OUTPUT real 0m6.234s user 0m0.171s sys 0m0.040s The main engine to create the concurrent requests is the main function. Notice that the rest of the code is basically untouched (other than returning the source link in the process_link function). This is the relevant part of the code that handles the concurrent engine: with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor: while to_check: futures = [executor.submit(process_link, url, to_search) for url in to_check] to_check = [] for data in concurrent.futures.as_completed(futures): link, new_links = data.result() checked_links.add(link) for link in new_links: if link not in checked_links and link not in to_check: to_check.append(link) max_checks -= 1 if not max_checks: return   The with context creates a pool of workers, specifying its number. Inside, a list of futures containing all the URLs to retrieve is created. The .as_completed() function returns the futures that are finished, and then there's some work dealing with obtaining newly found links and checking whether they need to be added to be retrieved or not. This process is similar to the one presented in the Crawling the web recipe. The process starts again until enough links have been retrieved or there are no links to retrieve. In this post, we learned to use the power of Python to automate web scraping tasks. To understand how to automate monotonous tasks with Python 3.7, check out our book: Python Automation Cookbook Google releases Magenta studio beta, an open source python machine learning library for music artists How to perform sentiment analysis using Python [Tutorial] Do you write Python Code or Pythonic Code?
Is Anti-trust regulation coming to Facebook following fake news inquiry made by a global panel in the House of Commons, UK?

Prasad Ramesh
28 Nov 2018
11 min read
The DCMS committee's fake news inquiry hearing on Facebook's platform was held yesterday at the House of Commons, UK. This was the first time that parliamentarians from so many countries (nine) had gathered in one place. Representatives came from Argentina, Canada, France, Singapore, Ireland, Belgium, Brazil, and Latvia, along with a select few from the DCMS committee. Richard Allan, Facebook's VP for policy solutions, is also a member of the House of Lords, as Lord Allan of Hallam. The committee was chaired by Damian Collins MP, head of the UK parliament's Digital, Culture, Media and Sport (DCMS) select committee.

About Zuckerberg not attending the meeting himself

Facebook had refused to send Zuckerberg to the hearing despite repeated requests from the DCMS committee, even after the committee offered the flexibility of attending remotely via FaceTime. The parliamentarians were clearly displeased with Mark Zuckerberg's empty chair at the meeting. They remarked that, as CEO, he should be accountable to a meeting that involved his company and representatives of millions of Facebook users from different countries. There were plenty of remarks directed at the Facebook founder's absence from the hearing. Statement from Mark Zuckerberg to the US Senate hearing earlier this year: "We didn't take a broad enough view of our responsibility, it was my mistake and I am sorry". Allan was asked if he thought that was a genuine statement; he said yes. Then Nathaniel Erskine-Smith from Canada remarked: "Just not sorry enough to appear himself before nine parliaments." Canada wasn't done; another remark from Erskine-Smith: "Sense of corporate social responsibility, particularly in light of the immense power and profit of Facebook, has been empty as the chair beside you." In Canada, only 270 people had used the Cambridge Analytica-related app called Your Digital Life, yet 620,000 had their information shared with the developer. Who gave Mr. Zuckerberg the advice to ignore this committee? Charles Angus, Vice-Chair from the House of Commons, Canada, remarked that Zuckerberg had decided to "blow off this meeting". Richard Allan accepted full responsibility for decisions on public appearances for Facebook. How does it look that Zuckerberg is not here and you're apologizing for his absence? "Not great" was his answer. Don't you see that Facebook has lost public trust due to misinformation tactics? Allan agreed with this point. Charles Angus said Facebook has lost the international committee's trust that it can police itself. Damian Collins said, "It should be up to the parliaments to decide what regulatory measures need to be set in place and not Facebook." Were you sent because you could answer our questions or to defend Facebook's position? Allan said that he was sent to answer questions, that he had been at the company since 2009, and that he had experienced events first hand. He said that he volunteered to come; Mike Schroepfer, Facebook's CTO, was sent to an earlier hearing, but the committee was not happy with his answers.

The Cambridge Analytica incident

Questions were asked about when Facebook became aware of this incident. Allan said that it was when the story came out in the press. When did Mark Zuckerberg know about the GSR Cambridge Analytica incident? After some evasion, the answer was March 2018, the time when it was covered by the press. The same question was asked of Mike Schroepfer six months ago, and he said he didn't know. A follow-up question was whether Facebook was aware of, and had banned, any other apps that breached privacy. Allan said that there were many, but on probing he could not name even one. He promised to send the committee a written response to that question. After the US Senate hearing in April, Zuckerberg was supposed to provide a list of such banned apps; the committee still hasn't received any such list. Ian Lucas MP (Wrexham, Labour) said: "You knew app developers were sharing information and the only time you took actions was when you were found out." What were Facebook's decisions on data and privacy controls that led to the Cambridge Analytica scandal? Allan explained that there were two versions of the way developers had access to user data: before the 2015 policy changes, access to friends' data was allowed; after the changes, this access was removed. Non-user data sits on Facebook servers, but the company says it does not use it to create shadow profiles. Additionally, third-party apps are expected to have their own privacy policies, which can differ from Facebook's own privacy policy. Allan said that if an app is found whose privacy measures may lead to privacy issues, Facebook takes action, but he could not provide an example of this having been done. Will Facebook apply GDPR standards across all countries, as Zuckerberg stated? They believe that the tools and systems they built are GDPR compliant.

Russian activity on Facebook

Questions were asked about the recently seized documents, but Allan deflected them by saying they contain unverified and partial information. Why didn't Facebook disclose that it knew Russian ads were run on its platform? The case made for the question was that no one from Facebook disclosed the information about Russian activity on its platform; it wasn't disclosed until the US Senate Intelligence Committee made a formal request. Allan said that their current policy is to investigate and publish any such activity. From the cache of documents obtained, a point was made about a 2014 email by a Facebook engineer about Russian IP addresses using a Pinterest API key to pull over 3 billion data points through the ordered friends API. Allan said that the details from those seized emails/documents are unverified, partial, and can be misleading. Allan stuck to his guns, saying: "we will come back to you".

Facebook's privacy controls

Facebook user settings were overridden by a checkbox that was not in plain sight, which allowed your friends' apps to access your data. When did Facebook change the API that overrode its own central privacy page? In November 2009, Facebook's central privacy page stated that it allowed you to "control who can see your profile and personal information". In November 2011, the US Federal Trade Commission filed a complaint against Facebook for allowing external app developers to access personal information. Allan responded by saying that, in the privacy settings, there was a checkbox to disallow access to your data by applications installed by friends. What about non-Facebook user data? Facebook uses it to link connections when that person becomes a Facebook user, and says it does not make any money out of it. What's your beef with Six4Three? Their app, Pikini, depended on friends' data. When Facebook changed the API to version 2, as mentioned above, Six4Three sued Facebook because their app would no longer work.

Discussions on whether Facebook can be held accountable for its actions

Allan agrees that there should be global regulation to hold the company accountable for its actions. Are you serious about regulation on a global level? There are now tens of thousands of Facebook employees working towards securing user privacy, where previously there were too few. Allan agrees that global-level regulation should exist, that the company should be accountable for its actions, and that sanctions should be imposed for any undesirable actions by the company, perhaps communicated through a global organization like the United Nations (UN). How is Facebook policing fake accounts and their networks? It is an ongoing battle. Most fake account creation is done not with political intent but with commercial intent: selling followers, pushing spam, and so on. More clicks mean more money for them. Many of these accounts are taken down within minutes of creation. "We have artificial intelligence systems that try and understand what a fake account looks like and shut them down as soon as they come up". This applies to mass creation of accounts; in political cases, only one or two accounts are created and they act as genuine users. Facebook is still removing fake accounts related to Russia, and Allan says they are trying to get better all the time. Low-quality information on Facebook has decreased by 50%, according to academic research. Fake users that use VPNs are difficult to address. To run political ads, an account needs to be a regularly used account, a driving license or passport is needed, and the payment information is stored with Facebook in addition to any other information Facebook may have. Allan says that, in this case, it would be unwise for a fake account to attempt this, since the information can be used to prosecute the fake account user even if the documents used were fake. In the context of fake ads or information, Allan agreed that the judicial authority of the specific country is best entrusted with taking down sensitive information. He gave an example: if someone claims that a politician is corrupt and he is not, taking it down is correct, but if he is corrupt and it is taken down, then genuine information is lost.

A case of non-regulation was pointed out

A hate speech Facebook comment in Sri Lanka was pointed out by Edwin Tong of Singapore. The comment was in Sinhalese, and Facebook did not remove it even after reports that it was hate speech. Allan said that it was a mistake and that they are investing heavily in artificial intelligence with a set of hate speech keywords that can weed out such comments. They are working through the different languages on this. How will Facebook provide transparency on the measures taken against fake news? There is a big push around academic study in this area. They are working with academics to understand the fake news problem better, but they also want to ensure that Facebook doesn't end up sharing user data in ways people would find inappropriate. How is Facebook monitoring new sign-ups and posts during elections? There should not be anonymous users; the next time they log in, there is a checkpoint that says more information is required. Would Facebook consider working actively with local election authorities to remove or flag posts that would influence voter choice? They think this is essential. It is the judicial system that can best decide whether such posts are true or false and make the call to remove them. Making everyone feel that an election was free and fair is not something Facebook can do on its own. What is Facebook doing to prevent misuse of its algorithms to influence elections? First, changing the algorithms and the way information is surfaced generally: they now classify low-quality content better. Secondly, there is a category of borderline content which is not banned but is close to being banned; it tends to be rewarded by the algorithm, and work is being done to reduce its reach instead. Third, third-party fact checkers mark posts as false or true. This is about tilting the scales towards higher-quality, less sensational content and away from lower-quality, more sensational content in the algorithm. What measures are there for fake news on WhatsApp? There are services that provide fake WhatsApp numbers. Report them to the company and they will take them down, says Allan. They are aware of this problem and its use, and it needs to be part of the election protection effort.

Closing discussion

After the lengthy round of grilling in the fake news inquiry, Angus reiterated that they expect Facebook to be accountable for its actions. Would you be interested in asking your friend Mr Zuckerberg if we should have a discussion about anti-trust? You and Mr. Zuckerberg are the symptoms. Perhaps the best regulation is anti-trust: to break Facebook up from WhatsApp and Instagram and allow competition. Allan answered that it depends on the problem to be solved. Angus jolted back that "The problem is Facebook", which is what we need to address. It is unprecedented economic control of every form of social discourse and communication. Angus asked Facebook to show corporate accountability. Perhaps, given its unwillingness to be accountable to the international body, anti-trust would be something to help get credible democratic responses from a corporation. These were some of the highlights of the questions and answers at the committee meeting held on 27th November 2018 at the House of Commons. We recommend you watch the complete proceedings for a more comprehensive context here. In our view, Mr Allan tried answering many of the questions during the three-hour session of this fake news inquiry better than Sandberg or Zuckerberg did in their hearings, but the answers were less than satisfactory where important topics were involved regarding Facebook's data and privacy controls. It does appear that Facebook will continue to delay, deny, and deflect as much as it can. Privacy experts urge the Senate Commerce Committee for a strong federal privacy bill "that sets a floor, not a ceiling" Consumer protection organizations submit a new data protection framework to the Senate Commerce Committee Facebook, Twitter open up at Senate Intelligence hearing, committee does ‘homework’ this time
Malicious code in npm ‘event-stream' package targets a bitcoin wallet and causes 8 million downloads in two months

Savia Lobo
28 Nov 2018
3 min read
Last week, Ayrton Sparling, a Computer Science major at CSUF, California, disclosed that the popular npm package event-stream contains a malicious package named flatmap-stream. He disclosed the problem via a GitHub issue on the EventStream repository. The event-stream npm package was originally created and maintained by Dominic Tarr; however, this popular package had not been updated for a long time. According to Thomas Hunter's post on Medium, "Ownership of event-stream, was transferred by the original author to a malicious user, right9ctrl. The malicious user was able to gain the trust of the original author by making a series of meaningful contributions to the package." The malicious owner then added a malicious library named flatmap-stream to the event-stream package as a dependency. As a result, every user who installed event-stream (using the malicious 3.3.6 version) downloaded and invoked the malicious library, which accumulated nearly 8 million downloads since it was included in September 2018. The malicious package represents a highly targeted attack and affects an open source app called bitpay/copay. Copay is a secure bitcoin wallet platform for both desktop and mobile devices. "We know the malicious package specifically targets that application because the obfuscated code reads the description field from a project's package.json file, then uses that description to decode an AES256 encrypted payload", said Thomas in his post. Following this disclosure, many users on Twitter and GitHub have supported Dominic. In a statement on the event-stream issue, Dominic stated, "I've shared publish rights with other people before. Of course, If I had realized they had a malicious intent I wouldn't have, but at the time it looked like someone who was actually trying to help me". https://twitter.com/dominictarr/status/1067186943304159233 In support of Dominic, André Staltz, an open source hacker, tweeted, https://twitter.com/andrestaltz/status/1067157915398746114 Users affected by this malicious code are advised to eliminate the package from their applications by reverting to version 3.3.4 of event-stream. If an application deals with Bitcoin, its activity over the last 3 months should be inspected to see whether any mined or transferred bitcoins did not make it into the wallet. If the application does not deal with bitcoin but is especially sensitive, an inspection of its activity over the last 3 months for anything suspicious is still recommended, notably data sent over the network to unintended destinations. To know more about this in detail, visit EventStream's repository. A new data breach on Facebook due to malicious browser extensions allowed almost 81,000 users’ private data up for sale, reports BBC News Wireshark for analyzing issues and malicious emails in POP, IMAP, and SMTP [Tutorial] Machine learning based Email-sec-360°surpasses 60 antivirus engines in detecting malicious emails