
How-To Tutorials


#TechWontBuildIt: Entropic maintainer calls for a ban on Palantir employees contributing to the project and asks other open source communities to take a stand on ethical grounds

Sugandha Lahoti
19 Jul 2019
6 min read
The tech industry is plagued by moral and ethical issues as top players become increasingly explicit about prioritizing profits over people or planet. Recent times are rife with cases of tech companies actively selling facial recognition technology to law enforcement agencies, helping ICE separate immigrant families, taking large contracts with the Department of Defense, accelerating the extraction of fossil fuels, and deploying surveillance technology. As the US becomes alarmingly dangerous for minority groups, asylum seekers and other vulnerable communities, the tech worker community has been spurred to organize and keep its employers in check. Since 2018, workers have been grouping together to push back against ethically questionable decisions made by their employers, using the hashtag #TechWontBuildIt.

Most recently, several open source communities, activists and developers have demonstrated strongly against Palantir for its involvement with ICE. Palantir, a data analytics company founded by Peter Thiel, one of President Trump’s most vocal supporters in Silicon Valley, has been called out for its association with Immigration and Customs Enforcement (ICE). According to emails obtained by WNYC, Palantir’s mobile app FALCON is being used by ICE to carry out raids on immigrant communities as well as to enable workplace raids.

According to the emails, an ICE supervisor sent an email to his officers before a planned spate of raids in New York City in 2017, ordering them to use a Palantir program, called FALCON mobile, for the operation. The email was sent in preparation for a worksite enforcement briefing on January 8, 2018. Two days later, ICE raided nearly a hundred 7-Elevens across the U.S. According to WNYC, ICE workplace raids led to 1,525 arrests over immigration status from October 2017 to October 2018. The email reads, “[REDACTION] we want all the team leaders to utilize the FALCON mobile app on your GOV iPhones, We will be using the FALCON mobile app to share info with the command center about the subjects encountered in the stores as well as team locations."

Other emails obtained by WNYC detail a Palantir staffer notifying an ICE agent to test out the FALCON mobile application because of his or her “possible involvement in an upcoming operation.” Another message, from April 2017, shows a Palantir support representative instructing an agent on how to classify a datapoint so that Palantir’s Investigative Case Management (ICM) platform could properly ingest records of a cell phone seizure.

In December 2018, Palantir told the New York Times’ Dealbook that its technology is not used by the division of ICE responsible for carrying out the deportation and detention of undocumented immigrants. Palantir declined WNYC’s requests for comment. Citing law enforcement “sensitivities,” ICE also declined to comment on how it uses Palantir during worksite enforcement operations.

In May this year, new documents released by Mijente, an advocacy organization, revealed that Palantir was responsible for a 2017 operation that targeted and arrested family members of children crossing the border alone. The documents stand in stark contrast to what Palantir said its software was doing. As part of the operation, ICE arrested 443 people solely for being undocumented. Mijente has since urged Palantir to drop its contract with ICE and stop providing software to agencies that aid in tracking, detaining, and deporting migrants, refugees, and asylum seekers.
Open source communities, activists and developers strongly oppose Palantir

Following the revelation of Palantir’s involvement with ICE, several open source developers are strongly opposing the company. The Entropic project, a JS package registry, is debating the idea of banning Palantir employees from participating in the project. Kat Marchán, an Entropic maintainer, posted on the forum, “I find it unconscionable for tech folks to be building the technological foundations for this deeply unethical and immoral (and fascist) practice, and I would like it if we, in our limited power as a community to actually affect the situation, officially banned any Palantir employees from participating in or receiving any sort of direct support from the Entropic community.” She has further proposed explicitly banning Palantir employees from the Discourse, the Discord and the GitHub communities, as well as any other forums Entropic may use for coordinating the project.

https://twitter.com/maybekatz/status/1151355320314187776

Amazon is also facing renewed calls from employees and external immigration advocates to stop working with Palantir. According to an internal email obtained by Forbes, Amazon employees are recirculating a June 2018 letter to executives calling for Palantir to be kicked off Amazon Web Services. More than 500 Amazon employees have signed the letter addressed to CEO Jeff Bezos and AWS head Andy Jassy. Not only that, pro-immigration organizations such as Mijente and Jews for Racial and Economic Justice interrupted the keynote speech at Amazon’s annual AWS Summit last Thursday.

https://twitter.com/altochulo/status/1149326296092164097

More than a dozen groups of activists also protested against Palantir Technologies in Palo Alto on July 12 over the company’s provision of software facilitating ICE raids, detentions, and deportations. City residents joined the protests, swelling the total to hundreds. Back in August 2018, the Lerna team took a strong stand against ICE by modifying their MIT license to ban companies that have collaborated with ICE from using Lerna. The updated license barred known ICE collaborators such as Microsoft, Palantir, and Amazon, among others, from using Lerna.

To quote Meredith Whittaker, the Google walkout organizer who recently left the company, from her farewell letter: “Tech workers have emerged as a force capable of making real change, pushing for public accountability, oversight, and meaningful equity. And this right when the world needs it most.” She further adds, “The stakes are extremely high. The use of AI for social control and oppression is already emerging, even in the face of developers’ best of intentions. We have a short window in which to act, to build in real guardrails for these systems before AI is built into our infrastructure and it’s too late.”

Extraordinary times call for extraordinary measures. As the tech industry grapples with the consequences of its hypergrowth, technosolutionist mindset, where do tech workers draw the line? Can tech workers afford to be apolitical, or to separate their values from the work they do? There are no simple answers, but one thing is for sure: the questions must be asked and faced. Open source, as part of the commons, has a key role to play, and how it evolves in the next couple of years is likely to shape the direction the world takes.
Lerna relicenses to ban major tech giants like Amazon, Microsoft, Palantir from using its software as a protest against ICE
Palantir’s software was used to separate families in a 2017 operation reveals Mijente
ACLU files lawsuit against 11 federal criminal and immigration enforcement agencies for disclosure of information on government hacking


How to create sales analysis app in Qlik Sense using DAR method [Tutorial]

Savia Lobo
19 Jul 2019
14 min read
A Qlik Sense application combines data in the form of a structured data model, visualizations, sheets, and stories. Dimensions, measures, and visualizations can be created as data items that can be reused between several visualizations (charts) and sheets (visualizations). This article is taken from the book Hands-On Business Intelligence with Qlik Sense by Kaushik Solanki, Pablo Labbe, Clever Anjos, and Jerry DiMaso. By the end of this book, you will be well-equipped to run successful business intelligence applications using Qlik Sense's functionality, data modeling techniques, and visualization best practices. To follow along with the examples implemented in this article, you can download the code from the book’s GitHub repository. In this article, we will create a sales analysis application to explore and analyze the data model that you can find on GitHub. While developing the application, we will apply the Dashboard, Analysis, Reporting (DAR) methodology. Technical requirements We will use the application we previously created, as a starting point, with a loaded data model to eliminate the process of loading and modeling the data all over again. You can also download the initial and final versions of the application from the book repository on GitHub. After downloading the initial version of the application, follow these steps: If you are using Qlik Sense Desktop, place the QVF application file in the QlikSense\Apps folder, under your document's personal folder. If you are using Qlik Sense Cloud, upload the application to your personal workspace. The problem statement Suppose that the company is a worldwide wholesale seller of food and beverages. This means that they need to review information about their sales in several countries. They need to know their performance in each of them, and they require this information to be detailed by the customer name, category name, and product name. They also need to know the average percentage of discounts, how many orders they have issued, and the total sales amount. Each of these key indicators should be dictated by the month and year. That explains the basis for this case. We need to find the perfect solution by accessing the right dashboard and displaying all of the data in the most consolidated fashion. Creating the dashboard sheet We will begin by creating a new sheet with the name Dashboard: Open the app and click on Create new sheet: Set the Title of the sheet to Dashboard: Click on the sheet icon to save the title, and open the sheet to start creating visualizations. Creating KPI visualizations A KPI visualization is used to get an overview of the performance values that are important to our company. To add the KPI visualizations to the sheet, follow these steps: Click on the Edit button located on the toolbar to enter the edit mode: Click on the Master items button on the asset panel and click on the Measures heading: Click on Sales $ and drag and drop it into the empty space on the sheet: Qlik Sense will create a new visualization of the KPI type because we have selected a measure: Resize the visualization toward the top-left of the sheet: Repeat steps 1 through 5 to add two visualizations for the Avg Discount % and Orders # measures. 
Place the objects to the right of the previously added visualization: To change the type of visualization from Gauge to KPI, click on the chart type selector: Select the KPI chart type: Now, all three of the measures are visualized as KPI: Creating a pie chart with Sales $ by Categories To add the pie chart with Sales $ by Categories onto the sheet, follow these steps: Click on the Charts button on the asset panel, which is on the left-hand side of the screen, to open the chart selector panel. Click on Pie chart and drag and drop it into the empty space on the sheet: Click on the Add dimension button and select Category in the Dimensions section: Click on the Add measure button and select Sales $ in the Measures section: The pie chart will look like this: Now, we will enhance the presentation of the chart by removing the Dimension label and adding a title to the chart: To remove the Dimension label, select the Appearance button that lies in the properties panel at the right-hand side of the screen and expand Presentation, under which you will find the Dimension label. Turn it off by simply clicking on the toggle button: Click on the title of the object and type Sales $ share by Category: Click on Done in the toolbar to enter the visualization mode: Creating a bar chart with Sales $ by Top 10 Customers To add the bar chart with the top 10 customers by sales $ to the sheet, carry out these steps: Before adding the bar chart, resize the pie chart: Click on the Charts button that lies on the asset panel to open the chart selector panel. Click on Bar chart and drag and drop it into the empty space in the center of the sheet: Click on the Add dimension button and select the Customer option in the Dimensions section. Click on the Add measure button and select Sales $ in the Measures section. The bar chart will look like this: To enhance the presentation of the chart, we will limit the number of customers that are depicted in the chart to 10, and add a title to the chart: Select Data in the properties panel on the right-hand side of the screen and expand the Customer dimension. Set the Limitation values as Fixed number, Top and type 11 in the limitation box: Click on the title of the chart and type Top 10 Customers by Sales $. Click on Done to enter the visualization mode. The bar chart will look like this: Creating the geographical map of sales by country To add the geographical map of sales by country to the sheet, follow these steps: Before adding the map chart, resize the bar chart: Click on the Charts button that lies on the asset panel to open the chart selector panel. Click on the Map button and drag and drop the chart into the empty space on the right-hand side of the sheet: The map visualization will show a default world map with no data, as follows: Here, we need to add an Area layer to plot the countries, and add a Sales $ measure to fill in the area of each country with a color scale: Click on the Add Layer button in the properties panel on the right-hand side of the screen: Select the Area layer: Add the Country dimension, as it contains the information to plot the area: The map will show the country areas filled in with a single color, as follows: To add the Sales $ measure to set the color scale for each country, go to the asset panel at the left-hand side of the screen and click on the Master items heading in the Measures section. 
Drag and drop the Sales $ measure on top of the map: In the pop-up menu for the map, select Use in "Country"(Area Layer): After that, select Color by: Sales $: The map will now show the countries with more Sales $ in a dark color, and those with lower Sales $ in a light color: Now, click on the title of the object and type Sales $ by Country. Click on the Done button to enter the visualization mode. The sheet will look like this, but it will vary according to your screen resolution: Creating the analysis sheet While the dashboard sheet shows information on several topics for a quick overview, the analysis sheet focuses on a single topic for data exploration. We will create the analysis sheet with the following visualizations: A filter panel, with the dimensions: OrderYear, OrderMonth, Country, Customer, Category, and Product KPI Sales $ KPI Avg Discount % A combo chart for Pareto (80/20) analysis by customer A table with customer data Let's start with creating a new sheet with the name Customer Analysis: Click on the Sheet selection button at the top-right of the screen to open the sheet overview panel. Click on the Create new sheet button and set the title of the sheet to Customer Analysis. To finish this example, click on the sheet icon to save the title, and open the sheet to start creating visualizations. Adding a filter pane with main dimensions We will now build the customer analysis sheet by adding a filter pane by following these steps: Click on the Edit button to enter the edit mode. Click on the Charts button on the asset panel and drag and drop Filter pane into the empty space on the sheet: Click on the Add dimension button and select Order Year in the Dimensions section: Since we need to add more dimensions to our Filter pane, click on the Add dimension button in the properties on the right-hand side of the screen, and select Order Month in the Dimensions section. Repeat the previous step to add the Country, Customer, Category, and Product dimensions. The Filter pane will look like what's shown in the following screenshot: Now, resize the width of the filter panel to fit three columns of the grid: We also need to add the Filter pane as a master visualization, which is to be reused across the analysis and reporting sheets that we will create next: Right-click on the filter pane and select Add to master items: Set the name of the master item to Default Filter and the description to A filter pane to be reused across sheets: Click on the Add button: Adding KPI visualizations To add the KPIs of Sales $ and Avg Discount % to the sheet, we have two options. The first option is to add the KPI visualizations to the Master items library, and add them to the new sheet: Go to the dashboard sheet. Add the KPI visualizations of Sales $ and Avg Discount % to the Master item. Name them KPI Sales $ and KPI Avg Discount %, respectively. From the visualization section in the Master items library, simply drag and drop each of the KPIs into the top end of the sheet. The second option is to copy and paste the KPI visualizations between sheets: Go to the dashboard sheet. Select the KPI visualization for Sales $. Press Ctrl + C or right-click on the visualization object and select Copy in the context menu. Go back to the Customer Analysis sheet. Press Ctrl + V or right-click in the empty area of the sheet and select Paste in the context menu. Repeat the same steps for KPI Avg Discount %. 
The sheet editor will look like this: Creating a combo chart for Pareto (80/20) analysis A Pareto analysis helps us to identify which groups of customers contribute to the first 80% of our sales. To create a Pareto analysis, we will use a combo chart as it allows us to combine metrics with different shapes such as bars, lines, and symbols. We will represent the data in two axes; the primary axis is found at the left-hand side of the chart, and the secondary axis is found at the right-hand side of the chart. In our example, the chart has a bar for Sales $ in the primary axis, as well as two lines: one for the Cumulative % of sales, and the other as static, with 80% in the secondary axis. In the following screenshot, you can see the highlighted customers contributing to the first 80% of the sales: To create the Pareto analysis chart, follow these steps: Click on the Charts button on the asset panel and find the Combo chart. Drag and drop the Combo chart into the empty space at the right-hand side of the sheet. Click on Add Dimension and select Customer in the Dimension section. Click on Add Measure and select Sales $ in the Measures section. The combo chart will look like this: We need to add two other measures, represented by lines. The first is the cumulative percentage of sales, and the second is the reference line at 80%. To add the cumulative sales line, go to the properties panel, expand the Data section, and click on the Add button in Measures: Click on the fx button to open the expression editor: Type the following expression in the expression editor to calculate a cumulative ratio of the sales for each customer, over the whole amount of the sales of all customers: RangeSum(Above(Sum(SalesAmount), 0, RowNo())) / Sum(total SalesAmount) Click on the Apply button to close the expression editor and save the expression. Set the Label of the new measure to Cumulative Sales %. Check if the properties Line and Secondary axis are selected for the measure: Change the number formatting to Number, set the formatting option to Simple, and select 12.3%. Now, find the Add button in the Measure pane to add another measure: the reference line for 80%. Open the Expression editor, type 0.8, and click on the Apply button. Set the Label to 80%. Check if the properties Line is selected and that the Secondary axis is selected for the measure: We also need to fix the sort order into a descending fashion, by Sales $: Go to the properties panel and expand the Sorting section. Click on Customer to expand the Sorting configuration for the dimension. Switch off the Auto sorting. Click on the checkbox for Sort by expression to select the option. Open the Expression editor and type sum(SalesAmount). Click on Apply to close the expression editor and apply the changes. Set the Title of the chart to Pareto Analysis. Change the sorting order to Descending. Deselect other sorting options if they are selected. The Sorting pane will look like this: Finally, the combo chart will look like this: Creating a reporting sheet Reporting sheets allow the user to see the data in a more granular form. This type of sheet provides information that allows the user to take action at an operational level. 
We will start this example by creating a new sheet with the name Reporting: Click on the Sheet selection button at the top-right of the screen to open the sheet overview panel Click on the Create new sheet button and set the Title of the sheet to Product Analysis Click on the sheet icon to save the title, open the sheet to start creating visualizations, and enter the edit mode Adding a default filter pane We will start to build the reporting sheet by adding the default filter pane that has already been added to the Master items library: Click on the Edit button to enter the edit mode. Click on the Master items button on the asset panel and find Default filter in the Visualization section. Click on Default filter pane and drag and drop it into the empty space at the top of the sheet. Resize the height of the filter pane to fit one row of the grid. The sheet will then look like this: Next, we will add the table chart to the sheet, as follows: Click on the Charts button on the asset panel and find the Table visualization. Click on Table and drag and drop it into the empty space at the center of the sheet. Click on the Add dimension button and select OrderID from the Field list. Click on Add measure and select Sales $ from the Dimensions list. Click on the Master items button on the asset panel, which is on the left-hand side of the screen, and click the Dimensions heading to expand it. We will then add more dimensions. Drag and drop the Customer dimension on the table. Select Add "Customer" from the floating menu. Repeat the process, using the drag and drop feature to add Country, Category, Product, EmployeesFirstName to the table. Click on the Measures heading in Master items to expand it. Drag and drop the Avg Discount % and Quantity # measures onto the table. Select Add  in the floating menu for each of the selected measure. Click on the Fields button on the asset panel, which is on the left-hand side of the screen. Find the OrderID field in the list. Drag and drop the OrderID field onto the table. Select Add OrderID from the floating menu. Repeat the same steps to add the OrderDate field to the table. The table will look like this: In this article, we saw how to create a Qlik Sense application using the DAR methodology, which will help you to explore and analyze an application's information. If you found this post useful, do check out the book, Hands-On Business Intelligence with Qlik Sense. This book teaches you how to create dynamic dashboards to bring interactive data visualization to your enterprise using Qlik Sense. Best practices for deploying self-service BI with Qlik Sense Four self-service business intelligence user types in Qlik Sense How Qlik Sense is driving self-service Business Intelligence


How bad is the gender diversity crisis in AI research? Study analysing 1.5million arxiv papers says it’s “serious”

Fatema Patrawala
18 Jul 2019
9 min read
Yesterday, the team at Nesta, an innovation foundation based in the UK, published research on gender diversity in the AI research workforce. The authors of the research are Juan Mateos-Garcia, Director; Konstantinos Stathoulopoulos, Principal Researcher; and Hannah Owen, Programme Coordinator at Nesta.

https://twitter.com/JMateosGarcia/status/1151517641103872006

The analysis is based purely on 1.5 million arXiv papers. The team claims it is the first-ever study of gender diversity in AI that does not rely on convenience sampling or a proprietary database. The team posted on its official blog, “We conducted a large-scale analysis of gender diversity in AI research using publications from arXiv, a repository with more than 1.5 million preprints widely used by the AI community. We aim to expand the evidence base on gender diversity in AI research and create a baseline with which to interrogate the impact of current and future policies and interventions. To achieve this, we enriched the ArXiv data with geographical, discipline and gender information in order to study the evolution of gender diversity in various disciplines, countries and institutions as well as examine the semantic differences between AI papers with and without female co-authors.”

With this research, the team also aims to bring the prominent female figures it has identified into the spotlight.

Key findings from the research

Serious gender diversity crisis in AI research

The team found a severe gender diversity gap in AI research, with only 13.83% of authors being women. Moreover, in relative terms, the proportion of AI papers co-authored by at least one woman has not improved since the 1990s. Juan Mateos-Garcia thinks this kind of crisis is a waste of talent and increases the risk of discriminatory AI systems.

https://twitter.com/JMateosGarcia/status/1151517642236276736

Location and research domain are significant drivers of gender diversity

Women in the Netherlands, Norway and Denmark are more likely to publish AI papers, while those in Japan and Singapore are less likely. In the UK, 26.62% of AI papers have at least one female co-author, placing the country at the 22nd spot worldwide. The US follows the UK at around 25% of papers with at least one female co-author, though it sits one position above the UK for unique female authors.

Source: Nesta research report

Regarding research domains, women working in Physics and Education, Computer Ethics and other societal issues, and Biology are more likely to publish their work on AI than those working in Computer Science or Mathematics.

Source: Nesta research report

Significant gender diversity gap in universities, big tech companies and other research institutions

Apart from the University of Washington, every other academic institution and organisation in the dataset has less than 25% female AI researchers. Among the big tech companies, only 11.3% of Google’s employees who have published their AI research on arXiv are women; the proportion is similar for Microsoft (11.95%) and slightly better for IBM (15.66%).

Important semantic differences between AI papers with and without a female co-author

When examining publications on Machine Learning and Societal topics in the UK in 2012 and 2015, papers involving at least one female co-author tend to be more semantically similar to each other than to those without any female authors.
Moreover, papers with at least one female co-author tend to be more applied and socially aware, with terms such as fairness, human mobility, mental health, gender and personality among the most salient. Juan Mateos-Garcia noted that this is an area which deserves further research.

https://twitter.com/JMateosGarcia/status/1151517647361781760

The top 15 women with the most AI publications on arXiv identified

Aarti Singh, Associate Professor at the Machine Learning Department of Carnegie Mellon University
Cordelia Schmid, part of the Google AI team, who also holds a permanent research position at Inria Grenoble Rhone-Alpes
Cynthia Rudin, Associate Professor of computer science, electrical and computer engineering, statistical science and mathematics at Duke University
Devi Parikh, Assistant Professor in the School of Interactive Computing at Georgia Tech
Karen Livescu, Associate Professor at the Toyota Technological Institute at Chicago
Kate Saenko, Associate Professor in the Department of Computer Science at Boston University
Kristina Lerman, Project Leader at the Information Sciences Institute at the University of Southern California
Marilyn A. Walker, Professor at the Department of Computer Science at the University of California
Mihaela van der Schaar, John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Turing Fellow at The Alan Turing Institute in London
Petia Radeva, Professor at the Department of Mathematics and Computer Science, Faculty of Mathematics and Computer Science, at the Universitat de Barcelona
Regina Barzilay, Professor at the Massachusetts Institute of Technology and a member of the MIT Computer Science and Artificial Intelligence Laboratory
Svetha Venkatesh, ARC Australian Laureate Fellow, Alfred Deakin Professor and Director of the Centre for Pattern Recognition and Data Analytics (PRaDA) at Deakin University
Xiaodan Liang, Associate Professor at the School of Intelligent Systems Engineering, Sun Yat-sen University
Yonina C. Eldar, Professor of Electrical Engineering in the Faculty of Mathematics and Computer Science at the Weizmann Institute of Science, Israel
Zeynep Akata, Assistant Professor at the University of Amsterdam in the Netherlands

There are five other women researchers who could not be identified in the study.

Interview bites from a few women contributors and institutions

The research team also interviewed a few of the researchers and institutions identified in their work, who think a system-wide reform is needed. When the team discussed the findings with the most cited female researcher, Mihaela van der Schaar, she felt that her presence in the field has only started to be recognised, despite having begun her career in 2003: ‘I think that part of the reason for this is because I am a woman, and the experience of (the few) other women in AI in the same period has been similar,’ she says.

Professor van der Schaar also described herself and many of her female colleagues as ‘faceless’. She suggested that the work of celebrating leading women in the field could have a positive impact on the representation of women, as well as on the disparity in the recognition these women receive. This suggests that work is needed across the pipeline: not just early-stage intervention in education, but also support for the women already in the field.
She also highlighted the importance of open discussion about the challenges women face in the AI sector, and said that workplace changes such as flexible hours are needed to enable researchers to participate in a fast-paced sector without sacrificing their family life. The team further discussed the findings with the University of Washington’s Eve Riskin, Associate Dean of Diversity and Access in the College of Engineering. Riskin said that many of her female faculty have experienced a ‘toxic environment’ and pervasive imposter syndrome. She also emphasized that more research is needed into the career trajectories of male and female researchers, including recruitment and retention.

Some recent examples of exceptional women in AI research and their contributions

While these women talk about the diversity gap in the field, we have recently seen work from female researchers like Katie Bouman gain significant attention. Katie is a post-doctoral fellow at MIT whose algorithm led to an image of a supermassive black hole. But all the attention became a catalyst for a sexist backlash on social media and YouTube. It set off “what can only be described as a sexist scavenger hunt,” as The Verge described it, in which an apparently small group of vociferous men questioned Bouman’s role in the project. “People began going over her work to see how much she’d really contributed to the project that skyrocketed her to unasked-for fame.”

Another incredible example in the field of AI research and ethics is Meredith Whittaker, an ex-Googler, program manager, activist, and co-founder of the AI Now Institute at New York University. Meredith is committed to the AI Now Institute, to her AI ethics work, and to organizing for an accountable tech industry. On Tuesday, Meredith left Google after facing retaliation from the company for organizing last year’s Google Walkout for Real Change protest, which demanded structural changes to ensure a safe and conducive work environment for everyone.

Other observations from the research and next steps

The research also highlights the fact that women are as capable as men of contributing to technical topics, while they tend to contribute more than men to publications with a societal or ethical output. Some of the leading AI researchers in the field shared their opinions on this. Petia Radeva, Professor at the Department of Mathematics and Computer Science at the University of Barcelona, was positive that the increasingly broad domains of application for AI and the potential impact of this technology will attract more women into the sector. Similarly, van der Schaar suggests that “publicising the interdisciplinary scope of possibilities and career paths that studying AI can lead to will help to inspire a more diverse group of people to pursue it. In parallel, the industry will benefit from a pipeline of people who are motivated by combining a variety of ideas and applying them across domains.”

In future, the research team will explore the temporal co-authorship network of AI papers to examine how different the career trajectories of male and female researchers might be. They will survey AI researchers on arXiv and investigate the drivers of the diversity gap in more detail through their innovation mapping methods. They also plan to extend this analysis to identify the representation of other underrepresented groups.
Meredith Whittaker, Google Walkout organizer, and AI ethics researcher is leaving the company, adding to its brain-drain woes over ethical concerns
“I’m concerned about Libra’s model for decentralization”, says co-founder of Chainspace, Facebook’s blockchain acquisition
DeepMind’s Alphastar AI agent will soon anonymously play with European StarCraft II players


Python 3.8 new features: the walrus operator, positional-only parameters, and much more

Bhagyashree R
18 Jul 2019
5 min read
Earlier this month, the team behind Python announced the release of Python 3.8b2, the second of four planned beta releases. Ahead of the third beta release, which is scheduled for 29th July, we look at some of the key features coming to Python 3.8.

The "incredibly controversial" walrus operator

The walrus operator was proposed in PEP 572 (Assignment Expressions) by Chris Angelico, Tim Peters, and Guido van Rossum last year. Since then it has been heavily discussed in the Python community, with many questioning whether it is a needed improvement; others were excited, as the operator does make the code a tiny bit more readable. The PEP discussion ended with Guido van Rossum stepping down as BDFL (benevolent dictator for life) and the creation of a new governance model. In an interview with InfoWorld, Guido shared, “The straw that broke the camel’s back was a very contentious Python enhancement proposal, where after I had accepted it, people went to social media like Twitter and said things that really hurt me personally. And some of the people who said hurtful things were actually core Python developers, so I felt that I didn’t quite have the trust of the Python core developer team anymore.”

According to PEP 572, the assignment expression is a syntactical operator that allows you to assign values to a variable as part of an expression. Its aim is to simplify things like multiple-pattern matches and the so-called loop-and-a-half. At PyCon 2019, Dustin Ingram, a PyPI maintainer, gave a few examples of where you can use this syntax: balancing lines of code and complexity, avoiding inefficient comprehensions, and avoiding unnecessary variables in scope. You can watch the full talk on YouTube: https://www.youtube.com/watch?v=6uAvHOKofws

The feature was implemented by Emily Morehouse, Python core developer and Founder and Director of Engineering at Cuttlesoft, and was merged earlier this year: https://twitter.com/emilyemorehouse/status/1088593522142339072

Explaining other improvements this feature brings, Jake Edge, a contributor on LWN.net, wrote, “These and other uses (e.g. in list and dict comprehensions) help make the intent of the programmer clearer. It is a feature that many other languages have, but Python has, of course, gone without it for nearly 30 years at this point. In the end, it is actually a fairly small change for all of the uproars it caused.”

Positional-only parameters

Proposed in PEP 570, this introduces a new syntax (/) to specify positional-only parameters in Python function definitions, similar to how * indicates that the arguments to its right are keyword-only. This syntax is already used by many CPython built-in and standard library functions, for instance the pow() function: pow(x, y, z=None, /). The syntax gives library authors more control over expressing the intended usage of an API and allows the API to “evolve in a safe, backward-compatible way.” It gives library authors the flexibility to change the name of positional-only parameters without breaking callers. Additionally, it ensures consistency of the Python language with existing documentation and with the behavior of various "builtin" and standard library functions.

As with PEP 572, this proposal got mixed reactions from Python developers. In support, one developer said, “Position-only parameters already exist in cpython builtins like range and min. Making their support at the language level would make their existence less confusing and documented.” Others think that this will allow authors to "dictate" how their methods can be used: "Not the biggest fan of this one because it allows library authors to overly dictate how their functions can be used, as in, mark an argument as positional merely because they want to. But cool all the same," a Redditor commented.

Debug support for f-strings

Formatted strings (f-strings) were introduced in Python 3.6 with PEP 498. They enable you to evaluate an expression as part of the string, along with inserting the result of function calls and so on. In Python 3.8, some additional syntax changes have been made, adding an = specifier and a !d conversion for ease of debugging. You can use this feature like this: print(f'{foo=} {bar=}'). This gives developers a better way of doing "print-style debugging", especially those with a background in languages that already have such a feature, such as Perl, Ruby, JavaScript, etc. One developer expressed his delight on Hacker News, "F strings are pretty awesome. I’m coming from JavaScript and partially java background. JavaScript’s String concatenation can become too complex and I have difficulty with large strings."

Python Initialization Configuration

Though Python is highly configurable, its configuration is scattered all around the code. PEP 587 introduces a new C API to configure the Python initialization, giving developers finer control over the configuration and better error reporting. Among the improvements this API will bring are the ability to read and modify the configuration before it is applied, and to override how Python computes the module search paths (sys.path). Along with these, there are many other exciting features coming to Python 3.8, which is currently scheduled for October, including Vectorcall, a fast calling protocol for CPython; support for out-of-band buffers in pickle protocol 5; and more. You can find the full list on Python’s official website.

Python serious about diversity, dumps offensive ‘master’, ‘slave’ terms in its documentation
Introducing PyOxidizer, an open source utility for producing standalone Python applications, written in Rust
Python 3.8 beta 1 is now ready for you to test
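To make the syntax above concrete, here is a short, illustrative sketch (the function and variable names are invented for this example, not taken from the PEPs) combining the walrus operator, a positional-only parameter, and the f-string = specifier; it assumes a Python 3.8 interpreter:

# Requires Python 3.8+ (walrus operator, "/" marker, f-string "=" specifier).

def scale(value, factor, /):
    # "/" makes both parameters positional-only:
    # scale(value=2, factor=3) raises a TypeError.
    return value * factor

readings = [4, 18, 2, 25, 9]

# Walrus operator: bind the scaled value inside the comprehension,
# so scale() is only called once per element.
large = [s for r in readings if (s := scale(r, 2)) > 20]

# The f-string "=" specifier prints both the expression and its value,
# e.g. large=[36, 50] len(readings)=5
print(f"{large=} {len(readings)=}")

Without the walrus operator, the comprehension would either have to call scale() twice per element or fall back to an explicit loop with a temporary variable.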


Microsoft mulls replacing C and C++ code with Rust calling it a "modern safer system programming language" with great memory safety features

Vincy Davis
18 Jul 2019
3 min read
Here’s another reason why Rust looks like the present and the future of programming. A few days ago, Microsoft announced that it is going to start exploring Rust as an alternative to its C and C++ code. The announcement was made by Gavin Thomas, Principal Security Engineering Manager at the Microsoft Security Response Centre (MSRC). Thomas states that around 70% of the vulnerabilities to which Microsoft assigns a CVE each year are caused by developers accidentally introducing memory corruption bugs into their C and C++ code. He adds, "As Microsoft increases its code base and uses more Open Source Software in its code, this problem isn’t getting better, it's getting worse. And Microsoft isn’t the only one exposed to memory corruption bugs—those are just the ones that come to MSRC."

Image Source: Microsoft blog

He highlights the fact that even with so many security mechanisms in place (static analysis tools, fuzzing at scale, taint analysis, encyclopaedias of coding guidelines, threat modelling guidance, and so on), developers still have to invest a lot of time in training on more tools and in fixing vulnerabilities. Thomas states that though C++ has many qualities (it is fast and mature, with a small memory and disk footprint), it does not have the memory safety guarantees of languages like .NET C#. He believes Rust is one language that can provide both. Thomas strongly advocates that the software security industry should focus on providing a secure environment for developers to work in, rather than turning a deaf ear to the importance of security or clinging to outdated methods and approaches. He concludes by hinting that Microsoft is going to adopt the Rust programming language: "Perhaps it's time to scrap unsafe legacy languages and move on to a modern safer system programming language?"

Microsoft exploring Rust is not surprising, as Rust has been popular with many developers for its simpler syntax, fewer bugs, memory safety and thread safety. It was also voted the most loved programming language in the 2019 StackOverflow survey, the biggest developer survey on the internet. It allows developers to focus on their applications rather than worrying about security and maintenance. Recently, many applications have been written in Rust, such as Vector, the Brave ad-blocker, PyOxidizer and more.

Developers couldn’t agree more with the post, and many have expressed their love for Rust.

https://twitter.com/alilleybrinker/status/1151495738158977024
https://twitter.com/karanganesan/status/1151485485644054528
https://twitter.com/shah_sheikh/status/1151457054004875264

A Redditor says, "While this first post is very positive about memory-safe system programming languages in general and Rust in particular, I would not call this an endorsement. Still, great news!" Visit the Microsoft blog for more details.

Introducing Ballista, a distributed compute platform based on Kubernetes and Rust
EU Commission opens an antitrust case against Amazon on grounds of violating EU competition rules
Fastly CTO Tyler McMullen on Lucet and the future of WebAssembly and Rust [Interview]
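Coming back to the memory safety argument at the heart of the post, here is a small, illustrative Rust sketch (ours, not from the Microsoft blog) of the kind of guarantee the borrow checker enforces at compile time, ruling out the use-after-free bugs behind many of those memory corruption CVEs:

// A minimal sketch of Rust's compile-time memory safety: a reference may
// never outlive the data it borrows from, which rules out use-after-free.

fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    // The lifetime 'a ties the returned reference to both inputs, so the
    // compiler can prove it stays valid for as long as the caller uses it.
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let owned = String::from("memory safety");
    let result = longest(owned.as_str(), "guaranteed");
    println!("longest: {}", result);

    // drop(owned); println!("{}", result);
    // ^ Uncommenting the line above fails to compile: `owned` cannot be
    //   freed while `result` still borrows from it.
}

In C or C++ the equivalent mistake compiles cleanly and only surfaces at runtime, if at all; in Rust it never becomes a binary.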


What is HCL (Hashicorp Configuration Language), how does it relate to Terraform, and why is it growing in popularity?

Savia Lobo
18 Jul 2019
6 min read
HCL (HashiCorp Configuration Language) is rapidly growing in popularity. Last year’s Octoverse report by GitHub showed it to be the second fastest growing language on the platform, more than doubling in contributors since 2017 (Kotlin was top, with GitHub contributors growing 2.6 times). However, despite its growth, it hasn’t had the level of attention that other programming languages have had. One of the reasons for this is that HCL is a configuration language. It’s also part of a broader ecosystem of tools built by cloud automation company HashiCorp that largely centers around Terraform.

What is Terraform?

Terraform is an infrastructure-as-code tool that makes it easier to define and manage your cloud infrastructure. HCL is simply the syntax that allows you to better leverage its capabilities. It gives you a significant degree of control over your infrastructure in a way that’s more ‘human-readable’ than other configuration languages such as YAML and JSON.

HCL and Terraform are both important parts of the DevOps world. They are not only built for a world that has transitioned to infrastructure-as-code, but also for one in which this transition demands more from engineers. As a more readable, higher-level configuration language, HCL can better facilitate collaboration and transparency between cross-functional engineering teams. With all of this in mind, HCL’s growing popularity can be taken to indicate broader shifts in the software development world. HashiCorp clearly understands them very well and is eager to help drive them forward. But before we go any further, let’s dive a bit deeper into why HCL was created, how it works, and how it sits within the Terraform ecosystem.

Why did HashiCorp create HCL?

The development of HCL was born out of HashiCorp’s experience of trying multiple different options for configuration languages. “What we learned,” the team explains on GitHub, “is that some people wanted human-friendly configuration languages and some people wanted machine-friendly languages.” The HashiCorp team needed a compromise: something that could offer a degree of flexibility and accessibility. As the team outlines their thinking, it’s clear what the drivers behind HCL actually are. JSON, they say, “is fairly verbose and... doesn't support comments”, while YAML is viewed as too complex for beginners to properly parse and use effectively. Traditional programming languages also pose problems: they’re too sophisticated and demand too much background knowledge from users to make them a truly useful configuration language. Put together, this underlines the fact that with HCL, HashiCorp wanted to build something that is accessible to engineers of different abilities and skill sets, while also being clear enough to enable appropriate levels of transparency between teams. It is “designed to be written and modified by humans.”

Listen: Uber engineer Yuri Shkuro talks distributed tracing and observability on the Packt Podcast

How does the HashiCorp Configuration Language work?

HCL is not a replacement for the likes of YAML or JSON. The team’s aim “is not to alienate other configuration languages. It is,” they say, “instead to provide HCL as a specialized language for our tools, and JSON as the interoperability layer.” Effectively, it builds on some of the things you can get with JSON, but reimagines them in the context of infrastructure and application configuration.
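For a sense of what that looks like in practice, here is a small, illustrative Terraform configuration written in HCL (the provider, region and AMI ID are placeholder values chosen for this example, not taken from the article):

# A minimal Terraform configuration in HCL. Blocks such as "provider" and
# "resource" contain attributes written as key = value, and comments are
# supported natively.

provider "aws" {
  region = "eu-west-1"  # placeholder region
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0"  # placeholder AMI ID
  instance_type = "t2.micro"

  tags = {
    Name = "hcl-example"
  }
}

The same structure expressed in JSON would need considerably more punctuation and could not carry the inline comments.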
According to the documentation, we should see HCL as a “structured configuration language rather than a data structure serialization language.” HCL is “always decoded using an application-defined schema,” which gives you a level of flexibility: the application is always at the center of the language, so you don’t have to work around it. If you want to learn more about the HCL syntax and how it works at a much deeper level, the documentation is a good place to start, as is this page on GitHub.

Read next: Why do IT teams need to transition from DevOps to DevSecOps?

The advantages of HCL and Terraform

You can’t really talk about the advantages of HCL without also considering the advantages of Terraform. Indeed, while HCL might well be a well-designed configuration language that’s accessible and caters to a wide range of users and use cases, it’s only in the context of Terraform that its growth really makes sense.

Why is Terraform so popular?

To understand the popularity of Terraform, you need to place it in the context of current trends and today’s software marketplace for infrastructure configuration. Terraform is widely seen as a competitor to configuration management tools like Chef, Ansible and Puppet. However, Terraform isn’t exactly a configuration management tool; it’s more accurate to call it a provisioning tool (configuration management tools configure software on servers that already exist, while provisioning tools set up new ones). This is important because, thanks to Docker and Kubernetes, the need for configuration has radically changed: you might even say that it’s no longer there. If a Docker container is effectively self-sufficient, with all the configuration files it needs to run, then the need for ‘traditional’ configuration management begins to drop.

Of course, this isn’t to say that one tool is intrinsically better than any other. There are use cases for all of these types of tools. But the fact remains that Terraform suits use cases that are starting to grow. Part of this is due to the rise of cloud agnosticism. As multi-cloud and hybrid cloud architectures become prevalent, DevOps teams need tools that let them navigate and manage resources across different platforms. Although all the major public cloud vendors have native tools for managing resources, these can sometimes be restrictive, and the templates they offer can be difficult to reuse. Take Azure ARM templates, for example: they can only be used to create Azure resources. In contrast, Terraform allows you to provision and manage resources across different cloud platforms.

Conclusion: Terraform and HCL can make DevOps more accessible

It’s not hard to see why ThoughtWorks sees Terraform as such an important emerging technology (the last edition of the ThoughtWorks Radar claimed that now is the time to adopt it). But it’s also important to understand that HCL is an important element in Terraform’s success. It makes infrastructure-as-code not only accessible to developers who might previously have only dipped their toes in operations, but also something that can be more collaborative, transparent, and observable for team members. The DevOps picture will undoubtedly evolve over the next few years, but it would appear that HashiCorp is going to have a big part to play in it.

Implementing Horizontal Pod Autoscaling in Kubernetes [Tutorial]

Savia Lobo
18 Jul 2019
18 min read
When we use Kubernetes deployments to deploy our pod workloads, it is simple to scale the number of replicas used by our applications up and down using the kubectl scale command. However, if we want our applications to automatically respond to changes in their workloads and scale to meet demand, then Kubernetes provides us with Horizontal Pod Autoscaling.

This article is an excerpt taken from the book Kubernetes on AWS, written by Ed Robinson. In this book, you will start by learning about Kubernetes' powerful abstractions - Pods and Services - that make managing container deployments easy.

Horizontal Pod Autoscaling allows us to define rules that will scale the number of replicas up or down in our deployments based on CPU utilization and, optionally, other custom metrics. Before we are able to use Horizontal Pod Autoscaling in our cluster, we need to deploy the Kubernetes metrics server; this server provides the endpoints used to discover CPU utilization and other metrics generated by our applications. In this article, you will learn how to use Horizontal Pod Autoscaling to automatically scale your applications and to automatically provision and terminate EC2 instances.

Deploying the metrics server

Before we can make use of Horizontal Pod Autoscaling, we need to deploy the Kubernetes metrics server to our cluster. This is because the Horizontal Pod Autoscaling controller makes use of the metrics provided by the metrics.k8s.io API, which is provided by the metrics server. While some installations of Kubernetes may install this add-on by default, in our EKS cluster we will need to deploy it ourselves.

There are a number of ways to deploy add-on components to your cluster. If you are using Helm to manage applications on your cluster, you could use the stable/metrics-server chart. For simplicity, we are just going to deploy the metrics server manifests using kubectl. I like to integrate deploying add-ons such as the metrics server and kube2iam with the process that provisions the cluster, as I see them as integral parts of the cluster infrastructure. But if you are going to use a tool like Helm to manage deploying applications to your cluster, then you might prefer to manage everything running on your cluster with the same tool. The decision you take really depends on the processes you and your team adopt for managing your cluster and the applications that run on it.

The metrics server is developed in its own GitHub repository, and you will find the manifests required to deploy it in the deploy directory of that repository. Start by cloning the configuration from GitHub. The metrics server began supporting the authentication methods provided by EKS in version 0.0.3, so make sure the manifests you have use at least that version.

You will find a number of manifests in the deploy/1.8+ directory. The auth-reader.yaml and auth-delegator.yaml files configure the integration of the metrics server with the Kubernetes authorization infrastructure. The resource-reader.yaml file configures a role that gives the metrics server the permissions to read resources from the API server, in order to discover the nodes that pods are running on. The metrics-server-deployment.yaml and metrics-server-service.yaml files define the deployment used to run the service itself and a service to be able to access it.
Finally, the metrics-apiservice.yaml file defines an APIService resource that registers the metrics.k8s.io API group with the Kubernetes API server aggregation layer; this means that requests to the API server for the metrics.k8s.io group will be proxied to the metrics server service. Deploying these manifests with kubectl is simple; just submit all of the manifests to the cluster with kubectl apply:

$ kubectl apply -f deploy/1.8+

You should see a message about each of the resources being created on the cluster. If you are using a tool like Terraform to provision your cluster, you might use it to submit the manifests for the metrics server when you create your cluster.

Verifying the metrics server and troubleshooting

Before we continue, we should take a moment to check that our cluster and the metrics server are correctly configured to work together. After the metrics server is running on your cluster and has had a chance to collect metrics from the cluster (give it a minute or so), you should be able to use the kubectl top command to see the resource usage of the pods and nodes in your cluster. Start by running kubectl top nodes. If you see output like this, then the metrics server is configured correctly and is collecting metrics from your nodes:

$ kubectl top nodes
NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-3-29-209   20m          1%     717Mi           19%
ip-10-3-61-119   24m          1%     1011Mi          28%

If you see an error message, there are a number of troubleshooting steps you can follow. You should start by describing the metrics server deployment and checking that one replica is available:

kubectl -n kube-system describe deployment metrics-server

If it is not, you should debug the created pod by running kubectl -n kube-system describe pod. Look at the events to see why the server is not available. Make sure that you are running at least version 0.0.3 of the metrics server. If the metrics server is running correctly and you still see errors when running kubectl top, the issue is that the APIService registered with the aggregation layer is not configured correctly. Check the events output at the bottom of the information returned when you run kubectl describe apiservice v1beta1.metrics.k8s.io. One common issue is that the EKS control plane cannot connect to the metrics server service on port 443.

Autoscaling pods based on CPU usage

Once the metrics server has been installed into our cluster, we will be able to use the metrics API to retrieve information about the CPU and memory usage of the pods and nodes in our cluster. Using the kubectl top command is a simple example of this. The Horizontal Pod Autoscaler can also use this same metrics API to gather information about the current resource usage of the pods that make up a deployment.

Let's look at an example of this. We are going to deploy a sample application that uses a lot of CPU under load, then configure a Horizontal Pod Autoscaler to scale up extra replicas of this pod to provide extra capacity when CPU utilization exceeds a target level. The application we will be deploying as an example is a simple Ruby web application that can calculate the nth number in the Fibonacci sequence; this application uses a simple recursive algorithm and is not very efficient (perfect for us to experiment with autoscaling). The deployment for this application is very simple.
It is important to set resource limits for CPU, because the target CPU utilization is calculated as a percentage of this limit:

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fib
  labels:
    app: fib
spec:
  selector:
    matchLabels:
      app: fib
  template:
    metadata:
      labels:
        app: fib
    spec:
      containers:
      - name: fib
        image: errm/fib
        ports:
        - containerPort: 9292
        resources:
          limits:
            cpu: 250m
            memory: 32Mi

We are not specifying a number of replicas in the deployment spec; when we first submit this deployment to the cluster, the number of replicas will therefore default to 1. This is good practice when creating a deployment whose replicas we intend to be adjusted by a Horizontal Pod Autoscaler, because it means that if we use kubectl apply to update the deployment later, we won't override the replica value the Horizontal Pod Autoscaler has set (inadvertently scaling the deployment down or up). Let's deploy this application to the cluster:

kubectl apply -f deployment.yaml

You can run kubectl get pods -l app=fib to check that the application started up correctly. Next, we will create a service so that we are able to access the pods in our deployment; requests will be proxied to each of the replicas, spreading the load:

service.yaml

kind: Service
apiVersion: v1
metadata:
  name: fib
spec:
  selector:
    app: fib
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9292

Submit the service manifest to the cluster with kubectl:

kubectl apply -f service.yaml

We are now going to configure a Horizontal Pod Autoscaler to control the number of replicas in our deployment. The spec defines how we want the autoscaler to behave; here we have defined that we want the autoscaler to maintain between 1 and 10 replicas of our application and achieve a target average CPU utilization of 60% across those replicas. When CPU utilization falls below 60%, the autoscaler will adjust the replica count of the targeted deployment down; when it goes above 60%, replicas will be added:

hpa.yaml

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: fib
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fib
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 60

Create the autoscaler with kubectl:

kubectl apply -f hpa.yaml

The kubectl autoscale command is a shortcut for creating a HorizontalPodAutoscaler: running kubectl autoscale deployment fib --min=1 --max=10 --cpu-percent=60 would create an equivalent autoscaler. Once you have created the Horizontal Pod Autoscaler, you can see a lot of interesting information about its current state with kubectl describe:

$ kubectl describe hpa fib
Name:               fib
Namespace:          default
CreationTimestamp:  Sat, 15 Sep 2018 14:32:46 +0100
Reference:          Deployment/fib
Metrics:            ( current / target )
  resource cpu:     0% (1m) / 60%
Min replicas:       1
Max replicas:       10
Deployment pods:    1 current / 1 desired
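A quick note before we generate some load: the autoscaling/v2beta1 API used above matches the Kubernetes versions this chapter targets, but it has since been deprecated and removed in newer releases. On a recent cluster, the equivalent manifest would look roughly like this sketch using the autoscaling/v2 API (only the API version and the metric target syntax change; you can check which versions your cluster serves with kubectl api-versions):

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
  name: fib
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fib
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

It is applied in exactly the same way with kubectl apply.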
Now that we have set up our Horizontal Pod Autoscaler, we should generate some load on the pods in our deployment to illustrate how it works. In this case, we are going to use the ab (Apache benchmark) tool to repeatedly ask our application to compute the thirtieth Fibonacci number:

load.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: fib-load
  labels:
    app: fib
    component: load
spec:
  template:
    spec:
      containers:
      - name: fib-load
        image: errm/ab
        args: ["-n1000", "-c4", "fib/30"]
      restartPolicy: OnFailure

This job uses ab to make 1,000 requests to the endpoint (with a concurrency of 4). Submit the job to the cluster, then observe the state of the Horizontal Pod Autoscaler:

kubectl apply -f load.yaml
watch kubectl describe hpa fib

Once the load job has started to make requests, the autoscaler will scale up the deployment in order to handle the load:

Name:               fib
Namespace:          default
CreationTimestamp:  Sat, 15 Sep 2018 14:32:46 +0100
Reference:          Deployment/fib
Metrics:            ( current / target )
  resource cpu:     100% (251m) / 60%
Min replicas:       1
Max replicas:       10
Deployment pods:    2 current / 2 desired

Autoscaling pods based on other metrics

The metrics server provides APIs that the Horizontal Pod Autoscaler can use to gain information about the CPU and memory utilization of pods in the cluster. It is possible to target a utilization percentage, as we did for the CPU metric, or to target an absolute value, as we do here for the memory metric:

hpa.yaml

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: fib
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fib
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 20M

The Horizontal Pod Autoscaler also allows us to scale on other metrics provided by more comprehensive metrics systems. Kubernetes allows metrics APIs to be aggregated for custom and external metrics. Custom metrics are metrics other than CPU and memory that are associated with a pod. You might, for example, use an adapter that allows you to use metrics that a system like Prometheus has collected from your pods. This can be very beneficial if you have more detailed metrics available about the utilization of your application: for example, a forking web server that exposes a count of busy worker processes, or a queue-processing application that exposes metrics about the number of items currently enqueued. External metrics adapters, by contrast, provide information about resources that are not associated with any object within Kubernetes, for example, an external queuing system such as the AWS SQS service. On the whole, it is simpler if your applications can themselves expose metrics about the resources they depend on than to rely on an external metrics adapter, because it can be hard to limit access to particular external metrics, whereas custom metrics are tied to a particular pod, so Kubernetes can limit access to only those users and processes that need to use them.

Autoscaling the cluster

The capabilities of the Kubernetes Horizontal Pod Autoscaler allow us to add and remove pod replicas from our applications as their resource usage changes over time. However, this makes no difference to the capacity of our cluster. If our pod autoscaler is adding pods to handle an increase in load, then eventually we might run out of space in our cluster, and additional pods would fail to be scheduled. If there is a decrease in the load on our application and the pod autoscaler removes pods, then we are paying AWS for EC2 instances that will sit idle. When we created our cluster in Chapter 7, A Production-Ready Cluster, we deployed the cluster nodes using an autoscaling group, so we should be able to use this to grow and shrink the cluster as the needs of the applications deployed to it change over time. Autoscaling groups have built-in support for scaling the size of the cluster, based on the average CPU utilization of the instances.
This, however, is not really suitable when dealing with a Kubernetes cluster because the workloads running on each node of our cluster might be quite different, so the average CPU utilization is not really a very good proxy for the free capacity of the cluster. Thankfully, in order to schedule pods to nodes effectively, Kubernetes keeps track of the capacity of each node and the resources requested by each pod. By utilizing this information, we can automate scaling the cluster to match the size of the workload. The Kubernetes autoscaler project provides a cluster autoscaler component for some of the main cloud providers, including AWS. The cluster autoscaler can be deployed to our cluster quite simply. As well as being able to add instances to our cluster, the cluster autoscaler is also able to drain the pods from and then terminate instances when the capacity of the cluster can be reduced.   Deploying the cluster autoscaler Deploying the cluster autoscaler to our cluster is quite simple as it just requires a simple pod to be running. All we need for this is a simple Kubernetes deployment. In order for the cluster autoscaler to update the desired capacity of our autoscaling group, we need to give it permissions via an IAM role. If you are using kube2iam, we will be able to specify this role for the cluster autoscaler pod via an appropriate annotation: cluster_autoscaler.tf data "aws_iam_policy_document" "eks_node_assume_role_policy" { statement { actions = ["sts:AssumeRole"] principals { type = "AWS" identifiers = ["${aws_iam_role.node.arn}"] } } } resource "aws_iam_role" "cluster-autoscaler" { name = "EKSClusterAutoscaler" assume_role_policy = "${data.aws_iam_policy_document.eks_node_assume_role_policy.json}" } data "aws_iam_policy_document" "autoscaler" { statement { actions = [ "autoscaling:DescribeAutoScalingGroups", "autoscaling:DescribeAutoScalingInstances", "autoscaling:DescribeTags", "autoscaling:SetDesiredCapacity", "autoscaling:TerminateInstanceInAutoScalingGroup" ] resources = ["*"] } } resource "aws_iam_role_policy" "cluster_autoscaler" { name = "cluster-autoscaler" role = "${aws_iam_role.cluster_autoscaler.id}" policy = "${data.aws_iam_policy_document.autoscaler.json}" }   In order to deploy the cluster autoscaler to our cluster, we will submit a deployment manifest using kubectl. We will use Terraform's templating system to produce the manifest. We create a service account that is used by the autoscaler to connect to the Kubernetes API: cluster_autoscaler.tpl --- apiVersion: v1 kind: ServiceAccount metadata: labels: k8s-addon: cluster-autoscaler.addons.k8s.io k8s-app: cluster-autoscaler name: cluster-autoscaler namespace: kube-system The cluster autoscaler needs to read information about the current resource usage of the cluster, and needs to be able to evict pods from nodes that need to be removed from the cluster and terminated. Basically, cluster-autoscalerClusterRole provides the required permissions for these actions. 
The following is the code continuation for cluster_autoscaler.tpl: --- apiVersion: rbac.authorization.k8s.io/v1beta1 kind: ClusterRole metadata: name: cluster-autoscaler labels: k8s-addon: cluster-autoscaler.addons.k8s.io k8s-app: cluster-autoscaler rules: - apiGroups: [""] resources: ["events","endpoints"] verbs: ["create", "patch"] - apiGroups: [""] resources: ["pods/eviction"] verbs: ["create"] - apiGroups: [""] resources: ["pods/status"] verbs: ["update"] - apiGroups: [""] resources: ["endpoints"] resourceNames: ["cluster-autoscaler"] verbs: ["get","update"] - apiGroups: [""] resources: ["nodes"] verbs: ["watch","list","get","update"] - apiGroups: [""] resources: ["pods","services","replicationcontrollers","persistentvolumeclaims","persistentvolumes"] verbs: ["watch","list","get"] - apiGroups: ["extensions"] resources: ["replicasets","daemonsets"] verbs: ["watch","list","get"] - apiGroups: ["policy"] resources: ["poddisruptionbudgets"] verbs: ["watch","list"] - apiGroups: ["apps"] resources: ["statefulsets"] verbs: ["watch","list","get"] - apiGroups: ["storage.k8s.io"] resources: ["storageclasses"] verbs: ["watch","list","get"] --- apiVersion: rbac.authorization.k8s.io/v1beta1 kind: ClusterRoleBinding metadata: name: cluster-autoscaler labels: k8s-addon: cluster-autoscaler.addons.k8s.io k8s-app: cluster-autoscaler roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-autoscaler subjects: - kind: ServiceAccount name: cluster-autoscaler namespace: kube-system Note that cluster-autoscaler stores state information in a config map, so needs permissions to be able to read and write from it. This role allows that. The following is the code continuation for cluster_autoscaler.tpl: --- apiVersion: rbac.authorization.k8s.io/v1beta1 kind: Role metadata: name: cluster-autoscaler namespace: kube-system labels: k8s-addon: cluster-autoscaler.addons.k8s.io k8s-app: cluster-autoscaler rules: - apiGroups: [""] resources: ["configmaps"] verbs: ["create"] - apiGroups: [""] resources: ["configmaps"] resourceNames: ["cluster-autoscaler-status"] verbs: ["delete","get","update"] --- apiVersion: rbac.authorization.k8s.io/v1beta1 kind: RoleBinding metadata: name: cluster-autoscaler namespace: kube-system labels: k8s-addon: cluster-autoscaler.addons.k8s.io k8s-app: cluster-autoscaler roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: cluster-autoscaler subjects: - kind: ServiceAccount name: cluster-autoscaler namespace: kube-system Finally, let's consider the manifest for the cluster autoscaler deployment itself. The cluster autoscaler pod contains a single container running the cluster autoscaler control loop. You will notice that we are passing some configuration to the cluster autoscaler as command-line arguments. Most importantly, the --node-group-auto-discovery flag allows the autoscaler to operate on autoscaling groups with the kubernetes.io/cluster/<cluster_name> tag. This is convenient because we don't have to explicitly configure the autoscaler with our cluster autoscaling group. If your Kubernetes cluster has nodes in more than one availability zone and you are running pods that rely on being scheduled to a particular zone (for example, pods that are making use of EBS volumes), it is recommended to create an autoscaling group for each availability zone that you plan to use. If you use one autoscaling group that spans several zones, then the cluster autoscaler will be unable to specify the availability zone of the instances that it launches. 
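For auto-discovery to work, the autoscaling group (or groups) running your nodes must carry the kubernetes.io/cluster/<cluster_name> tag that the --node-group-auto-discovery flag looks for. If your node groups are managed with Terraform, the tag might be added with something like the following sketch; the aws_autoscaling_group resource name used here ("nodes") is a placeholder for your own node group resource, and the autoscaler only matches on the tag key, so the value (conventionally "owned" for EKS) is not significant for discovery:

resource "aws_autoscaling_group" "nodes" {
  # ... the rest of your existing node group configuration ...

  # Tag the group so the cluster autoscaler can discover it
  tag {
    key                 = "kubernetes.io/cluster/${aws_eks_cluster.control_plane.name}"
    value               = "owned"
    propagate_at_launch = true
  }
}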
Returning to the template, here is the final part of cluster_autoscaler.tpl, the deployment for the cluster autoscaler itself:

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      annotations:
        iam.amazonaws.com/role: ${iam_role}
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: k8s.gcr.io/cluster-autoscaler:v1.3.3
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=kubernetes.io/cluster/${cluster_name}
        env:
        - name: AWS_REGION
          value: ${aws_region}
        volumeMounts:
        - name: ssl-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
        imagePullPolicy: "Always"
      volumes:
      - name: ssl-certs
        hostPath:
          path: "/etc/ssl/certs/ca-certificates.crt"

Finally, we render the templated manifest by passing in the variables for the AWS region, cluster name, and IAM role, and submit the rendered file to Kubernetes using kubectl. This part lives in cluster_autoscaler.tf, alongside the IAM resources defined earlier:

data "aws_region" "current" {}

data "template_file" "cluster_autoscaler" {
  template = "${file("${path.module}/cluster_autoscaler.tpl")}"

  vars {
    aws_region   = "${data.aws_region.current.name}"
    cluster_name = "${aws_eks_cluster.control_plane.name}"
    iam_role     = "${aws_iam_role.cluster_autoscaler.name}"
  }
}

resource "null_resource" "cluster_autoscaler" {
  triggers = {
    manifest_sha1 = "${sha1("${data.template_file.cluster_autoscaler.rendered}")}"
  }

  provisioner "local-exec" {
    command = "kubectl --kubeconfig=${local_file.kubeconfig.filename} apply -f -<<EOF\n${data.template_file.cluster_autoscaler.rendered}\nEOF"
  }
}

Thus, by understanding how Kubernetes assigns Quality of Service classes to your pods based on the resource requests and limits that you assign to them, you can precisely control how your pods are managed. By ensuring that your critical applications, such as web servers and databases, run with the Guaranteed class, you can ensure that they perform consistently and suffer minimal disruption when pods need to be rescheduled. If you have enjoyed reading this post, head over to our book, Kubernetes on AWS, for tips on deploying and managing applications, keeping your cluster and applications secure, and ensuring that your whole system is reliable and resilient to failure. Low Carbon Kubernetes Scheduler: A demand side management solution that consumes electricity in low grid carbon intensity areas A vulnerability discovered in Kubernetes kubectl cp command can allow malicious directory traversal attack on a targeted system Kubernetes 1.15 releases with extensibility around core Kubernetes APIs, cluster lifecycle stability, and more!
Elon Musk's Neuralink unveils a “sewing machine-like” robot to control computers via the brain

Sugandha Lahoti
17 Jul 2019
8 min read
After two years of being super-secretive about their work, Neuralink, Elon’s Musk’s neurotechnology company, has finally presented their progress in brain-computer interface technology. The Livestream which was uploaded on YouTube showcases a “sewing machine-like” robot that can implant ultrathin threads deep into the brain giving people the ability to control computers and smartphones using their thoughts. For its brain-computer interface tech, the company has received $158 million in funding and has 90 employees. Note: All images are taken from Neuralink Livestream video unless stated otherwise. Elon Musk opened the presentation talking about the primary aim of Neuralink which is to use brain-computer interface tech to understand and treat brain disorders, preserve and enhance the brain, and ultimately and this may sound weird, “achieve a symbiosis with artificial intelligence”. He added, “This is not a mandatory thing. It is a thing you can choose to have if you want. This is something that I think will be really important on a civilization-level scale.” Neuralink wants to build, record from and selectively stimulate as many neurons as possible across diverse brain areas. They have three goals: Increase by orders of magnitude, the number of neurons you can read from and write to in safe, long-lasting ways. At each stage, produce devices that serve critical unmet medical needs of patients. Make inserting a computer connection into your brain as safe and painless as LASIK eye surgery. The robot that they have built was designed to be completely wireless, with a  practical bandwidth that is usable at home and lasts for a long time. Their system has an N1 sensor, which is an 8mm wide, 4mm tall cylinder having 1024 electrodes. It consists of a thin film, which has threads. The threads are placed using thin needles, into the brain by a robotic system in a manner akin to a sewing machine avoiding blood vessels. The robot peels off the threads one by one from the N1 Sensor and places it in the brain. A needle would grab each thread by a small loop and then is inserted into the brain by the robot. The robot is under the supervision of a human neurosurgeon who lays out where the threads are placed. The actual needle which the robot uses is 24 microns. The process puts a 2mm incision near the human ear, which is dilated to 8mm. The threads A robot implants threads using a needle For the first patients, the Neuralink team is looking at four sensors which will be connected via very small wires under the scalp to an inductive coil behind the ear. This is encased in a wearable device that they call the ‘Link’ which contains a Bluetooth radio and a battery. They will be controlled through an iPhone app. Source: NYT Neuralink/MetaLab iPhone app The goal is to drill four 8mm holes into paralyzed patients’ skulls and insert implants that will give them the ability to control computers and smartphones using their thoughts. For the first product, they are focusing on giving patients the ability to control their mobile device, and then redirect the output from their phone to a keyboard or a mouse. The company will seek U.S. Food and Drug Administration approval and is aspiring to target first-in-human clinical study by 2020. They will use it for treating upper cervical spinal cord injury. They’re expecting those patients to get four 1024 channel sensors, one each in the primary motor cortex, supplementary motor area, premotor cortex and closed-loop feedback into the primary somatosensory cortex. 
As reported by Bloomberg who got a pre-media briefing, Neuralink said it has performed at least 19 surgeries on animals with its robots and successfully placed the wires, which it calls “threads,” about 87% of the time. They used a lab rat and implanted a USB-C port in its head. A wire attached to the port transmitted its thoughts to a nearby computer where a software recorded and analyzed its brain activity, measuring the strength of brain spikes. The amount of data being gathered from a lab rat was about 10 times greater than what today’s most powerful sensors can collect. The flexibility of the Neuralink threads would be an advance, said Terry Sejnowski, the Francis Crick Professor at the Salk Institute for Biological Studies, in La Jolla, Calif to the New York Times. However, he noted that the Neuralink researchers still needed to prove that the insulation of their threads could survive for long periods in a brain’s environment, which has a salt solution that deteriorates many plastics. Musk's bizarre attempts to revolutionalize the world are far from reality Elon Musk is known for his dramatic promises and showmanship as much as he is for his eccentric projects. But how far they are grounded in reality is another thing. In May he successfully launched his mammoth space mission, Starlink sending 60 communications satellites to the orbit which will eventually be part of a single constellation providing high-speed internet to the globe. However, the satellites were launched after postponing it two times to “update satellite software”. Not just that,  three of the 60 satellites have lost contact with ground control teams, a SpaceX spokesperson said on June 28. Experts are already worried about how the Starlink constellation will contribute to the space debris problem. Currently, there are 2,000 operational satellites in orbit around Earth, according to the latest figures from the European Space Agency, and the completed Starlink constellation will drastically add to that number. Observers had also noticed some Starlink satellites had not initiated orbit raising after being released. Musk’s much-anticipated Hyperloop (first publicly mentioned in 2012) was supposed to shuttle passengers at near-supersonic speeds via pods traveling in a long, underground tunnel. But it was soon reduced to a car in a very small tunnel. When they unveiled the underground tunnel to the media in California last year in December, reporters climbed into electric cars made by Musk’s Tesla and were treated to a 40 mph ride along a bumpy path. Here as well there have been public concerns regarding its impact on public infrastructure and the environment. The biggest questions surrounding hyperloop’s environmental impact are its effect on carbon dioxide emissions, the effect of infrastructure on ecosystems, and the environmental footprint of the materials used to build it. Other concerns include noise pollution and how to repurpose hyperloop tubes and tunnels at the end of their lifespan. Researchers from Tencent Keen Security Lab criticized Tesla’s self-driving car software, publishing a report detailing their successful attacks on Tesla firmware. It includes remote control over the steering and an adversarial example attack on the autopilot that confuses the car into driving into oncoming traffic lane. Musk had also made promises to have a fully self-driving car for Tesla by 2020 which caused a lot of activity in the stock markets. But most are skeptical about this claim as well. 
Whether Elon Musk’s AI symbiotic visions will come in existence in the foreseeable future is questionable. Neuralink's long-term goals are characteristically unrealistic, considering not much is known about the human brain; cognitive functions and their representation as brain signals are still an area where much further research is required. While Musk’s projects are known for their technical excellence, History shows a lack of thought into the broader consequences and cost of such innovations such as the ethical concerns, environmental and societal impacts. Neuralink’s implant is also prone to invading one’s privacy as it will be storing sensitive medical information of a patient. There is also the likelihood of it violating one’s constitutional rights such as freedom of speech, expression among others. What does it mean to live in a world where one’s thoughts are constantly monitored and not truly one’s own? Then, because this is an implant what if the electrodes malfunction and send wrong signals to the brain. Who will be accountable in such scenarios? Although the FDA will be probing into such questions, these are some questions any responsible company should ask of itself proactively while developing life-altering products or services. These are equally important aspects that are worthy of stage time in a product launch. Regardless, Musk’s bold claims and dramatic representations are sure to gain the attention of investors and enthusiasts for now. Elon Musk reveals big plans with Neuralink SpaceX shares new information on Starlink after the successful launch of 60 satellites What Elon Musk can teach us about Futurism & Technology Forecasting
Implementing Data Modeling techniques in Qlik Sense [Tutorial]

Bhagyashree R
17 Jul 2019
14 min read
Data modeling is a conceptual process, representing the associations between the data in a manner in which it caters to specific business requirements. In this process, the various data tables are linked as per the business rules to achieve business needs. This article is taken from the book Hands-On Business Intelligence with Qlik Sense by Kaushik Solanki, Pablo Labbe, Clever Anjos, and Jerry DiMaso. By the end of this book, you will be well-equipped to run successful business intelligence applications using Qlik Sense's functionality, data modeling techniques, and visualization best practices. To follow along with the examples implemented in this article, you can download the code from the book’s GitHub repository. In this article, we will look at the basic concept of data modeling, its various types, and learn which technique is best suited for Qlik Sense dashboards. We will also learn about the methods for linking data with each other using joins and concatenation. Technical requirements For this article, we will use the app created earlier in the book, as a starting point with a loaded data model. You can find it in the book's GitHub repository. You can also download the initial and final version of the application from the repository. After downloading the initial version of the application, perform the following steps: If you are using Qlik Sense Desktop, place the app in the Qlik\Sense\Apps folder under your Documents personal folder If you are using Qlik Sense Cloud, upload the app to your personal workspace Advantages of data modeling Data modeling helps business in many ways. Let's look at some of the advantages of data modeling: High-speed retrieval: Data modeling helps to get the required information much faster than expected. This is because the data is interlinked between the different tables using the relationship. Provides ease of accessing data: Data modeling eases the process of giving the right access to the data to the end-users. With the simple data query language, you can get the required data easily. Helps in handling multiple relations: Various datasets have various kinds of relationship between the other data. For example, there could be one-to-one, or one-to-many, or many-to-many relationships. Data modeling helps in handling this kind of relationship easily. Stability: Data modeling provides stability to the system. Data modeling techniques There are various techniques in which data models can be built, each technique has its own advantages and disadvantages. The following are two widely-used data modeling techniques. Entity-relationship modeling The entity-relationship modeling (ER modeling) technique uses the entity and relationships to create a logical data model.  This technique is best suited for the Online Transaction Processing (OLTP) systems. An entity in this model refers to anything or object in the real world that has distinguishable characteristics. While a relationship in this model is the relationship between the two or more entities. There are three basic types of relationship that can exist: One-to-one: This relation means each value from one entity has a single relation with a value from the other entity. For example, one customer is handled by one sales representative: One-to-many: This relation means each value from one entity has multiple relations with values from other entities. 
For example, one sales representative handles multiple customers: Many-to-many: This relation means all values from both entities have multiple relations with each other. For example, one book can have many authors and each author can have multiple books: Dimensional modeling The dimensional modeling technique uses facts and dimensions to build the data model. This modeling technique was developed by Ralf Kimball. Unlike ER modeling, which uses normalization to build the model, this technique uses the denormalization of data to build the model. Facts, in this context, are tables that store the most granular transactional details. They mainly store the performance measurement metrics, which are the outcome of the business process. Fact tables are huge in size, because they store the transactional records. For example, let's say that sales data is captured at a retail store. The fact table for such data would look like the following: A fact table has the following characteristics: It contains the measures, which are mostly numeric in nature It stores the foreign key, which refers to the dimension tables It stores large numbers of records Mostly, it does not contain descriptive data The dimension table stores the descriptive data, describing the who, what, which, when, how, where, and why associated with the transaction. It has the maximum number of columns, but the records are generally fewer than fact tables. Dimension tables are also referred to as companions of the fact table. They store textual, and sometimes numerical, values. For example, a PIN code is numeric in nature, but they are not the measures and thus they get stored in the dimension table. In the previous sales example that we discussed, the customer, product, time, and salesperson are the dimension tables. The following diagram shows a sample dimension table: The following are the characteristics of the dimension table: It stores descriptive data, which describes the attributes of the transaction It contains many columns and fewer records compared to the fact table It also contains numeric data, which is descriptive in nature There are two types of dimensional modeling techniques that are widely used: Star schema: This schema model has one fact table that is linked with multiple dimension tables. The name star is given because once the model is ready, it looks like a star. The advantages of the star schema model include the following: Better query performance Simple to understand The following diagram shows an example of the star schema model: Snowflake schema: This schema model is similar to the star schema, but in this model, the dimensional tables are normalized further. The advantages of the snowflake schema model include the following: It provides better referential integrity It requires less space as data is normalized The following diagram shows an example of the snowflake schema model: When it comes to data modeling in Qlik Sense, the best option is to use the star schema model for better performance. Qlik Sense works very well when the data is loaded in a denormalized form, thus the star schema is suitable for Qlik Sense development. The following diagram shows the performance impact of different data models on Qlik Sense: Now that we know what data modeling is and which technique is most appropriate for Qlik Sense data modeling, let's look at some other fundamentals of handling data. 
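To make the star schema idea concrete, here is a minimal, purely illustrative load script (the table and field names are invented for this sketch) in which one fact table associates with two dimension tables through their shared key fields:

// Fact table: one row per sales transaction
Sales:
LOAD * INLINE [
SalesID, ProductID, CustomerID, SalesQty, SalesAmount
1, 10, 1, 2, 200
2, 11, 2, 1, 80
];

// Dimension table: descriptive product attributes
Product:
LOAD * INLINE [
ProductID, ProductName, Category
10, Apples, Fruits
11, Carrots, Vegetables
];

// Dimension table: descriptive customer attributes
Customer:
LOAD * INLINE [
CustomerID, Name
1, Alex
2, Linda
];

Because the tables share the ProductID and CustomerID fields, Qlik Sense associates them automatically, giving a simple star-shaped model without any explicit join.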
Joining While working on data model building, we often encounter a situation where we want to have some fields added from one table into another to do some sort of calculations. In such situations, we use the option of joining those tables based on the common fields between them. Let's understand how we can use joins between tables with a simple example. Assume you want to calculate the selling price of a product. The information you have is SalesQty in Sales Table and UnitPrice of product in Product Table. The calculation for getting the sales price is UnitPrice * SalesQty. Now, let's see what output we get when we apply a join on these tables: Types of joins There are various kinds of joins available but let's take a look at the various types of joins supported by Qlik Sense. Let's consider the following tables to understand each type better: Order table: This table stores the order-related data: OrderNumber Product CustomerID OrderValue 100 Fruits 1 100 101 Fruits 2 80 102 Fruits 3 120 103 Vegetables 6 200 Customer table: This table stores the customer details, which include the CustomerID and Name: CustomerID Name 1 Alex 2 Linda 3 Sam 4 Michael 5 Sara Join/outer join When you want to get the data from both the tables you use the Join keyword. When you just use only Join between two tables, it is always a full outer join. The Outer keyword is optional. The following diagram shows the Venn diagram for the outer join: Now, let's see how we script this joining condition in Qlik Sense: Create a new Qlik Sense application. Give it a name of your choice. Jump to Script editor, create a new tab, and rename it as Outer Join, as shown in the following screenshot. Write the script shown in the following screenshot: Once you write the script, click on Load Data to run the script and load the data. Once the data is loaded, create a new sheet and add the Table object to see the joined table data, as shown in the following screenshot: As the output of Outer Join, we got five fields, as shown in the preceding screenshot. You can also observe that the last two rows have null values for the fields, which come from the Order table, where the customers 4 and 5 are not present. Left join When you want to extract all the records from the left table and matching records from the right table, then you use the Left Join keyword to join those two tables. The following diagram shows the Venn diagram for left join: Let's see the script for left join: In the previous application created, delete the Outer Join tab. Create a new tab and rename it as Left Join, as shown in the following screenshot. Write the script shown in the following screenshot: Once the script is written, click on Load Data to run the script and load the data. Once the script is finished, create a new sheet and add the Table object to see the joined table data, as shown in the following screenshot: Right join When you want to extract all the records from the right table and the matching records from the left table, then you use the right join keyword to join those two tables. The following diagram shows the Venn diagram for right join: Let's see the script for right join: In the previous application created, comment the existing script. Create a new tab and rename it as Right Join, as shown in the following screenshot. Write the script, as shown in the following screenshot: Once the script is written, click on Load Data to run the script and load the data. 
Once the script is finished, create a new sheet and add the Table object to see the joined table data, as shown in the following screenshot: Inner join When you want to extract matching records from both the tables, you use the Inner Join keyword to join those two tables. The following diagram shows the Venn diagram for inner join: Let's see the script for inner join: In the previous application created, comment the existing script. Create a new tab and rename it as Inner Join, as shown in the following screenshot. Write the script shown in following screenshot: Once the script is written, click on Load Data to run the script and load the data. Once the script is finished, create a new sheet and add the Table object to see the joined table data, as shown in the following screenshot: Concatenation Sometimes you come across a situation while building the data model where you may have to append one table below another. In such situations, you can use the concatenate function. Concatenating, as the name suggests, helps to add the records of one table below another. Concatenate is different from joins. Unlike joins, concatenate does not merge the matching records of both the tables in a single row. Automatic concatenation When the number of columns and their naming is same in two tables, Qlik Sense, by default, concatenates those tables without any explicit command. This is called the automatic concatenation. For example, you may get the customer information from two different sources, but with the same columns names. In such a case, automatic concatenation will be done by Qlik, as is shown in the following screenshot: You can see in the preceding screenshot that both the Source1 and Source2 tables have two columns with same names (note that names in Qlik Sense are case-sensitive). Thus, they are auto concatenated. One more thing to note here is that, in such a situation, Qlik Sense ignores the name given to the second table and stores all the data under the name given to the first table. The output table after concatenation is shown in the following screenshot: Forced concatenation There will be some cases in which you would like to concatenate two tables irrespective of the number of columns and name. In such a case, you should use the keyword Concatenate between two Load statements to concatenate those two tables. This is called the forced concatenation. For example, if you have sales and budget data at similar granularity, then you should use the Concatenate keyword to forcefully concatenate both tables, as shown in the following screenshot: The output table after loading this script will have data for common columns, one below the other. For the columns that are not same, there will be null values in those columns for the table in which they didn't exist. This is shown in the following output: You can see in the preceding screenshot that the SalesAmount is null for the budget data, and Budget is null for the sales data. The NoConcatenate In some situations when even though the columns and their name from the two tables are the same, you may want to treat them differently and don’t want to concatenate them. So Qlik Sense provides the NoConcatenate keyword, which helps to prevent automatic concatenation. Let's see how to write the script for NoConcatenate: You should handle the tables properly; otherwise, the output of NoConcatenate may create a synthetic table. Filtering In this section, we will learn how to filter the data while loading in Qlik Sense. 
As you know, there are two ways in which we can load the data in Qlik Sense: either by using the Data manager or the script editor. Let's see how to filter data with each of these options. Filtering data using the Data manager When you load data using the Data manager, you get an option named Filters at the top-right corner of the window, as shown in the following screenshot: This filter option enables us to set the filtering condition, which loads only the data that satisfies the condition given. The filter option allows the following conditions: = >  >= <  <= Using the preceding conditions, you can filter the text or numeric values of a field. For example, you can set a condition such as Date >= '01/01/2012' or ProductID = 80. The following screenshot shows such conditions applied in the Data load editor: Filtering data in the script editor If you are familiar with the Load statement or the SQL Select statement, it will be easy for you to filter the data while loading it. In the script editor, the best way to restrict the data is to include the Where clause at the end of the Load or Select statement; for example, Where Date >= '01/01/2012'. When you use the Where clause with the Load statement, you can use the following conditions: = > >= <  <= When you write the Where clause with the SQL Select statement, you can use the following conditions: = >  >= <  <= In Between Like Is Null Is Not Null The following screenshot shows an example of both the statements: This article walked you through various data modeling techniques. We also saw different types of joins and how we can implement them in Qlik Sense.  Then, we learned about concatenation and the scenarios in which we should use the concatenation option. We also looked at automatic concatenation, forced concatenation, and NoConcatenation. Further, we learned about the ways in which data can be filtered while loading in Qlik Sense. If you found this post useful, do check out the book, Hands-On Business Intelligence with Qlik Sense. This book teaches you how to create dynamic dashboards to bring interactive data visualization to your enterprise using Qlik Sense. 5 ways to create a connection to the Qlik Engine [Tip] What we learned from Qlik Qonnections 2018 Why AWS is the preferred cloud platform for developers working with big data
How to manage complex applications using Kubernetes-based Helm tool [Tutorial]

Savia Lobo
16 Jul 2019
16 min read
Helm is a popular tool in the Kubernetes ecosystem that gives us a way of building packages (known as charts) of related Kubernetes objects that can be deployed in a cohesive way to a cluster. It also allows us to parameterize these packages, so they can be reused in different contexts and deployed to the varying environments that the services they provide might be needed in. This article is an excerpt taken from the book Kubernetes on AWS written by Ed Robinson. In this book, you will discover how to utilize the power of Kubernetes to manage and update your applications. In this article, you will learn how to manage complex applications using Kubernetes-based Helm tool. You will start by learning how to install Helm and later on how to configure and package Helm charts. Like Kubernetes, development of Helm is overseen by the Cloud Native Computing Foundation. As well as Helm (the package manager), the community maintains a repository of standard charts for a wide range of open source software you can install and run on your cluster. From the Jenkins CI server to MySQL or Prometheus, it's simple to install and run complex deployments involving many underlying Kubernetes resources with Helm. Installing Helm If you have already set up your own Kubernetes cluster and have correctly configured kubectl on your machine, then it is simple to install Helm. On macOS On macOS, the simplest way to install the Helm client is with Homebrew: $ brew install kubernetes-helm On Linux and Windows Every release of Helm includes prebuilt binaries for Linux, Windows, and macOS. Visit https://github.com/kubernetes/helm/releases to download the version you need for your platform. To install the client, simply unpack and copy the binary onto your path. For example, on a Linux machine you might do the following: $ tar -zxvf helm-v2.7.2-linux-amd64.tar.gz $ mv linux-amd64/helm /usr/local/bin/helm Installing Tiller Once you have the Helm CLI tool installed on your machine, you can go about installing Helm's server-side component, Tiller. Helm uses the same configuration as kubectl, so start by checking which context you will be installing Tiller onto: $ kubectl config current-context minikube Here, we will be installing Tiller into the cluster referenced by the Minikube context. In this case, this is exactly what we want. If your kubectl is not currently pointing to another cluster, you can quickly switch to the context you want to use like this: $ kubectl config use-context minikube If you are still not sure that you are using the correct context, take a quick look at the full config and check that the cluster server field is correct: $ kubectl config view --minify=true The minify flag removes any config not referenced by the current context. Once you are happy that the cluster that kubectl is connecting to is the correct one, we can set up Helm's local environment and install Tiller on to your cluster: $ helm init $HELM_HOME has been configured at /Users/edwardrobinson/.helm. Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster. Happy Helming! We can use kubectl to check that Tiller is indeed running on our cluster: $ kubectl -n kube-system get deploy -l app=helm NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE tiller-deploy 1 1 1 1 3m Once we have verified that Tiller is correctly running on the cluster, let's use the version command. 
This will validate that we are able to connect correctly to the API of the Tiller server and return the version number of both the CLI and the Tiller server: $ helm version Client: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"} Server: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"} Installing a chart Let's start by installing an application by using one of the charts provided by the community. You can discover applications that the community has produced Helm charts for at https://hub.kubeapps.com/. As well as making it simple to deploy a wide range of applications to your Kubernetes cluster, it's a great resource for learning some of the best practices the community uses when packaging applications for Helm. Helm charts can be stored in a repository, so it is simple to install them by name. By default, Helm is already configured to use one remote repository called Stable. This makes it simple for us to try out some commonly used applications as soon as Helm is installed. Before you install a chart, you will need to know three things: The name of the chart you want to install The name you will give to this release (If you omit this, Helm will create a random name for this release) The namespace on the cluster you want to install the chart into (If you omit this, Helm will use the default namespace) Helm calls each distinct installation of a particular chart a release. Each release has a unique name that is used if you later want to update, upgrade, or even remove a release from your cluster. Being able to install multiple instances of a chart onto a single cluster makes Helm a little bit different from how we think about traditional package managers that are tied to a single machine, and typically only allow one installation of a particular package at once. But once you have got used to the terminology, it is very simple to understand: A chart is the package that contains all the information about how to install a particular application or tool to the cluster. You can think of it as a template that can be reused to create many different instances or releases of the packaged application or tool. A release is a named installation of a chart to a particular cluster. By referring to a release by name, Helm can make upgrades to a particular release, updating the version of the installed tool, or making configuration changes. A repository is an HTTP server storing charts along with an index file. When configured with the location of a repository, the Helm client can install a chart from that repository by downloading it and then making a new release. Before you can install a chart onto your cluster, you need to make sure that Helm knows about the repository that you want to use. You can list the repositories that are currently in use by running the helm repo list command: $ helm repo list NAME URL stable https://kubernetes-charts.storage.googleapis.com local http://127.0.0.1:8879/charts By default, Helm is configured with a repository named stable pointing at the community chart repository and local repository that points at a local address for testing your own local repository. (You need to be running helm serve for this.) Adding a Helm repository to this list is simple with the helm repo add command. 
You can add my Helm repository that contains some example applications related to this book by running the following command: $ helm repo add errm https://charts.errm.co.uk "errm" has been added to your repositories In order to pull the latest chart information from the configured repositories, you can run the following command: $ helm repo update Hang tight while we grab the latest from your chart repositories... ...Skip local chart repository ...Successfully got an update from the "errm" chart repository ...Successfully got an update from the "stable" chart repository Update Complete. Happy Helming! Let's start with one of the simplest applications available in my Helm repository, kubeslate. This provides some very basic information about your cluster, such as the version of Kubernetes you are running and the number of pods, deployments, and services in your cluster. We are going to start with this application, since it is very simple and doesn't require any special configuration to run on Minikube, or indeed any other cluster. Installing a chart from a repository on your cluster couldn't be simpler: $ helm install --name=my-slate errm/kubeslate You should see a lot of output from the helm command. Firstly, you will see some metadata about the release, such as its name, status, and namespace: NAME: my-slate LAST DEPLOYED: Mon Mar 26 21:55:39 2018 NAMESPACE: default STATUS: DEPLOYED Next, you should see some information about the resources that Helm has instructed Kubernetes to create on the cluster. As you can see, a single service and a single deployment have been created: RESOURCES: ==> v1/Service NAME TYPE CLUSTER-IP PORT(S) AGE my-slate-kubeslate ClusterIP 10.100.209.48 80/TCP 0s ==> v1/Deployment NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE my-slate-kubeslate 2 0 0 0 0s ==> v1/Pod(related) NAME READY STATUS AGE my-slate-kubeslate-77bd7479cf-gckf8 0/1 ContainerCreating 0s my-slate-kubeslate-77bd7479cf-vvlnz 0/1 ContainerCreating 0s Finally, there is a section with some notes that have been provided by the chart's author to give us some information about how to start using the application: Notes: To access kubeslate. First start the kubectl proxy: kubectl proxy Now open the following URL in your browser: http://localhost:8001/api/v1/namespaces/default/services/my-slate-kubeslate:http/proxy Please try reloading the page if you see ServiceUnavailable / no endpoints available for service, as pod creation might take a few moments. Try following these instructions yourself and open Kubeslate in your browser: Kubeslate deployed with Helm Configuring a chart When you use Helm to make a release of a chart, there are certain attributes that you might need to change or configuration you might need to provide. Luckily, Helm provides a standard way for users of a chart to override some or all of the configuration values. In this section, we are going to look at how, as the user of a chart, you might go about supplying configuration to Helm. Later in the chapter, we are going to look at how you can create your own charts and use the configuration passed in to allow your chart to be customized. When we invoke helm install, there are two ways we can provide configuration values: passing them as command-line arguments, or by providing a configuration file. These configuration values are merged with the default values provided by a chart. 
This allows a chart author to provide a default configuration to allow users to get up and running quickly, but still allow users to tweak important settings, or enable advanced features. Providing a single value to Helm on the command line is achieved by using the set flag. The kubeslate chart allows us to specify additional labels for the pod(s) that it launches using the podLabels variable. Let's make a new release of the kubeslate chart, and then use the podLabels variable to add an additional hello label with the value world: $ helm install --name labeled-slate --set podLabels.hello=world errm/kubeslate Once you have run this command, you should be able to prove that the extra variable you passed to Helm did indeed result in the pods launched by Helm having the correct label. Using the kubectl get pods command with a label selector for the label we applied using Helm should return the pods that have just been launched with Helm: $ kubectl get pods -l hello=world NAME READY STATUS labeled-slate-kubeslate-5b75b58cb-7jpfk 1/1 Running labeled-slate-kubeslate-5b75b58cb-hcpgj 1/1 Running As well as being able to pass a configuration to Helm when we create a new release, it is also possible to update the configuration in a pre-existing release using the upgrade command. When we use Helm to update a configuration, the process is much the same as when we updated deployment resources in the last chapter, and a lot of those considerations still apply if we want to avoid downtime in our services. For example, by launching multiple replicas of a service, we can avoid downtime, as a new version of a deployment configuration is rolled out. Let's also upgrade our original kubeslate release to include the same hello: world pod label that we applied to the second release. As you can see, the structure of the upgrade command is quite similar to the install command. But rather than specifying the name of the release with the --name flag, we pass it as the first argument. This is because when we install a chart to the cluster, the name of the release is optional. If we omit it, Helm will create a random name for the release. However, when performing an upgrade, we need to target a pre-existing release to upgrade, and thus this argument is mandatory: $ helm upgrade my-slate --set podLabels.hello=world errm/kubeslate If you now run helm ls, you should see that the release named my-slate has been upgraded to Revision 2. You can test that the deployment managed by this release has been upgraded to include this pod label by repeating our kubectl get command: $ kubectl get pods -l hello=world NAME READY STATUS labeled-slate-kubeslate-5b75b58cb-7jpfk 1/1 Running labeled-slate-kubeslate-5b75b58cb-hcpgj 1/1 Running my-slate-kubeslate-5c8c4bc77-4g4l4 1/1 Running my-slate-kubeslate-5c8c4bc77-7pdtf 1/1 Running We can now see that four pods, two from each of our releases, now match the label selector we passed to kubectl get. Passing variables on the command line with the set flag is convenient when we just want to provide values for a few variables. But when we want to pass more complex configurations, it can be simpler to provide the values as a file. Let's prepare a configuration file to apply several labels to our kubeslate pods: values.yml podLabels: hello: world access: internal users: admin We can then use the helm command to apply this configuration file to our release: $ helm upgrade labeled-slate -f values.yml errm/kubeslate To learn how to create your own charts, head over to our book. 
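After an upgrade like this, it is worth double-checking which values are now applied to the release. The helm get values command shows the user-supplied configuration for a release; for our example, the output should look something like this (key ordering may differ):

$ helm get values labeled-slate
podLabels:
  access: internal
  hello: world
  users: admin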
Packaging Helm charts While we are developing our chart, it is simple to use the Helm CLI to deploy our chart straight from the local filesystem. However, Helm also allows you to create your own repository in order to share your charts. A Helm repository is a collection of packaged Helm charts, plus an index stored in a particular directory structure on a standard HTTP web server. Once you are happy with your chart, you will want to package it so it is ready to distribute in a Helm repository. This is simple to do with the helm package command. When you start to distribute your charts with a repository, versioning becomes important. The version number of a chart in a Helm repository needs to follow the SemVer 2 guidelines. In order to build a packaged chart, start by checking that you have set an appropriate version number in Chart.yaml. If this is the first time you have packaged your chart, the default will be OK: $ helm package version-app Successfully packaged chart and saved it to: ~/helm-charts/version-app-0.1.0.tgz You can test a packaged chart without uploading it to a repository by using the helm serve command. This command will serve all of the packaged charts found in the current directory and generate an index on the fly: $ helm serve Regenerating index. This may take a moment. Now serving you on 127.0.0.1:8879 You can now try installing your chart by using the local repository: $ helm install local/version-app You can test building an index An Helm repository is just a collection of packaged charts stored in a directory. In order to discover and search the charts and versions available in a particular repository, the Helm client downloads a special index.yaml that includes metadata about each packaged chart and the location it can be downloaded from. In order to generate this index file, we need to copy all the packaged charts that we want in our index to the same directory: cp ~/helm-charts/version-app-0.1.0.tgz ~/helm-repo/ Then, in order to generate the index.yaml file, we use the helm repo index command. You will need to pass the root URL where the packaged charts will be served from. This could be the address of a web server, or on AWS, you might use a S3 bucket: helm repo index ~/helm-repo --url https://helm-repo.example.org The chart index is quite a simple format, listing the name of each chart available, and then providing a list of each version available for each named chart. The index also includes a checksum in order to validate the download of charts from the repository: apiVersion: v1 entries: version-app: - apiVersion: v1 created: 2018-01-10T19:28:27.802896842Z description: A Helm chart for Kubernetes digest: 79aee8b48cab65f0d3693b98ae8234fe889b22815db87861e590276a657912c1 name: version-app urls: - https://helm-repo.example.org/version-app-0.1.0.tgz version: 0.1.0 generated: 2018-01-10T19:28:27.802428278Z The generated index.yaml file for our new chart repository. Once we have created the index.yaml file, it is simply a question of copying your packaged charts and the index file to the host you have chosen to use. If you are using S3, this might look like this: aws s3 sync ~/helm-repo s3://my-helm-repo-bucket In order for Helm to be able to use your repository, your web server (or S3) needs to be correctly configured. The web server needs to serve the index.yaml file with the correct content type header (text/yaml or text/x-yaml). The charts need to be available at the URLs listed in the index. 
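If you are hosting the repository on S3 as above, one way to make sure index.yaml is served with a suitable content type is to upload it explicitly with the --content-type flag, for example (reusing the example bucket from before):

$ aws s3 sync ~/helm-repo s3://my-helm-repo-bucket --exclude "index.yaml"
$ aws s3 cp ~/helm-repo/index.yaml s3://my-helm-repo-bucket/index.yaml --content-type text/yaml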
Using your repository

Once you have set up the repository, you can configure Helm to use it:

helm repo add my-repo https://helm-repo.example.org
my-repo has been added to your repositories

When you add a repository, Helm validates that it can indeed connect to the URL given and download the index file. You can check this by searching for your chart with helm search:

$ helm search version-app
NAME                  VERSION  DESCRIPTION
my-repo/version-app   0.1.1    A Helm chart for Kubernetes

Thus, in this article, you learned how to install Helm and how to configure and package Helm charts. Helm can be used for a wide range of scenarios where you want to deploy resources to a Kubernetes cluster, from providing a simple way for others to install an application you have written on their own clusters, to forming the cornerstone of an internal Platform as a Service within a larger organization. To know more about how to configure your own charts using Helm and about the organizational patterns for Helm, head over to our book, Kubernetes on AWS.

Elastic launches Helm Charts (alpha) for faster deployment of Elasticsearch and Kibana to Kubernetes
Introducing ‘Quarkus’, a Kubernetes native Java framework for GraalVM & OpenJDK HotSpot
Pivotal and Heroku team up to create Cloud Native Buildpacks for Kubernetes
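To round off the repository workflow covered in this article, here is a consumer-side sketch of pulling and installing a published chart. It is illustrative only: the release name is hypothetical, the version shown assumes the 0.1.1 entry returned by helm search above, and the commands are the standard Helm 2 CLI used throughout.

# Refresh the locally cached copy of the repository index.
helm repo update

# Optionally download the packaged chart for inspection before installing it.
helm fetch my-repo/version-app --version 0.1.1

# Install a specific version of the chart from the repository.
helm install --name version-app-demo my-repo/version-app --version 0.1.1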
Savia Lobo
15 Jul 2019
6 min read

Linux kernel announces a patch to allow 0.0.0.0/8 as a valid address range

Last month, the team behind the Linux kernel announced a patch that allows 0.0.0.0/8 as a valid address range. The patch allows these roughly 16 million new IPv4 addresses to appear within a box or on the wire. The aim is to treat 0/8 as global unicast space, since the range has never been used except for the single address 0.0.0.0. The patch was written by Dave Taht, Director of the Make-Wifi-Fast project, and committed by David Stephen Miller, an American software developer working on the Linux kernel. In the accompanying post, Taht mentions that the use of 0.0.0.0/8 has been prohibited since the early internet due to two issues.

The first was an interoperability problem with BSD 4.2 in 1984, which was fixed in BSD 4.3 in 1986. "BSD 4.2 has long since been retired", the post mentions. The second issue is that addresses of the form 0.x.y.z were initially defined only as a source address in an ICMP datagram, indicating "node number x.y.z on this IPv4 network", by nodes that know their address on their local network but do not yet know their network prefix, in RFC0792 (page 19). The use of 0.x.y.z was later repealed in RFC1122 because the original ICMP-based mechanism for learning the network prefix was unworkable on many networks such as Ethernet: these networks have longer addresses that would not fit into the 24 "node number" bits. Modern networks use reverse ARP (RFC0903), BOOTP (RFC0951), or DHCP (RFC2131) to find their full 32-bit address and CIDR netmask (and other parameters such as default gateways). This has left 16,777,215 addresses in the 0.0.0.0/8 space unused and reserved for future use since 1989.

The discussion about allowing these IP addresses and making them available started earlier this year at NetDevConf 2019, the technical conference on Linux networking. The conference took place in Prague, Czech Republic, from March 20th to 22nd, 2019. One of the sessions, "Potential IPv4 Unicast Expansions", conducted by Dave Taht along with John Gilmore and Paul Wouters, explains how IPv4's success story has been in carrying unicast packets worldwide. The speakers note that service sites still need IPv4 addresses for everything, since the majority of internet client nodes don't yet have IPv6 addresses. IPv4 addresses now cost 15 to 20 dollars apiece (times the size of your network!) and the price is rising. In their keynote, they described how the IPv4 address space includes hundreds of millions of addresses reserved for obscure reasons (the ranges 0/8 and 127/16), for obsolete reasons (225/8-231/8), or for "future use" (240/4, otherwise known as class E). They highlighted the fact: "instead of leaving these IP addresses unused, we have started an effort to make them usable, generally. This work stalled out 10 years ago, because IPv6 was going to be universally deployed by now, and reliance on IPv4 was expected to be much lower than it in fact still is".

"We have been reporting bugs and sending patches to various vendors. For Linux, we have patches accepted in the kernel and patches pending for the distributions, routing daemons, and userland tools. Slowly but surely, we are decontaminating these IP addresses so they can be used in the near future. Many routers already handle many of these addresses, or can easily be configured to do so, and so we are working to expand unicast treatment of these addresses in routers and other OSes", they further mentioned.
They said they wanted to carry out an "authorized experiment to route some of these addresses globally, monitor their reachability from different parts of the Internet, and talk to ISPs who are not yet treating them as unicast to update their networks". The announcement includes the patch code enabling 0.0.0.0/8 for Linux.

Users have had a mixed reaction to this announcement; many had assumed that these addresses would remain unassigned forever. A few are of the opinion that, for most businesses, IPv6 is an unnecessary headache. A user explained the difference between the address ranges in a reply to a post by Jeremy Stretch, a network engineer: "0.0.0.0/8 - Addresses in this block refer to source hosts on "this" network. Address 0.0.0.0/32 may be used as a source address for this host on this network; other addresses within 0.0.0.0/8 may be used to refer to specified hosts on this network [RFC1700, page 4]."

A user on Reddit writes that this announcement will probably get "the same reaction when 1.1.1.1 and 1.0.0.1 became available, and AT&T blocked it 'by accident' or most equipment vendors or major ISP will use 0.0.0.0/8 as a loopback interface or test interface because they never thought it would be assigned to anyone." Another user, Elegant treader, writes, "I could actually see us successfully inventing, and implementing, a multiverse concept for ipv4 to make these 32 bit addresses last another 40 years, as opposed to throwing these non-upgradable, hardcoded v4 devices out". Another writes that if they had "taken IPv4 and added more bits - we might all be using IPv6 now". The user further mentions, "Instead they used the opportunity to cram every feature but the kitchen sink in there, so none of the hardware vendors were interested in implementing it and the backbones were slow to adopt it. So we got mass adoption of NAT instead of mass adoption of IPv6".

A user explains, "A single /8 isn't going to meaningfully impact the exhaustion issues IPv4 faces. I believe it was APNIC a couple of years ago who said they were already facing allocation requests equivalent to an /8 a month". "It's part of the reason hand-wringing over some of the "wasteful" /8s that were handed out to organizations in the early days is largely pointless. Even if you could get those orgs to consolidate and give back large useable ranges in those blocks, there's simply not enough there to meaningfully change the long term mismatch between demand and supply", the user further adds.

To know about these developments in detail, watch Dave Taht's keynote video on YouTube: https://www.youtube.com/watch?v=92aNK3ftz6M&feature=youtu.be

An attack on SKS Keyserver Network, a write-only program, poisons two high-profile OpenPGP certificates
Former npm CTO introduces Entropic, a federated package registry with a new CLI and much more!
Amazon adds UDP load balancing support for Network Load Balancer
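For readers who want to see what the change means in practice, here is a small, hedged sketch of the kind of smoke test one might run on a kernel that carries the patch. The interface name and addresses are placeholders; on an unpatched kernel these addresses are still treated as the reserved "zero network", so you would expect the assignment or the traffic to be rejected.

# Assign an address from the formerly reserved 0/8 range to an interface
# (requires root; eth0 is a placeholder for a real interface name).
ip addr add 0.1.2.3/8 dev eth0

# Confirm the address was accepted and responds locally.
ip addr show dev eth0
ping -c 3 0.1.2.3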

Bhagyashree R
12 Jul 2019
6 min read

Amazon’s partnership with NHS to make Alexa offer medical advice raises privacy concerns and public backlash

Virtual assistants like Alexa and smart speakers are being increasingly used in today’s time because of the convenience they come packaged with. It is good to have someone play a song or restock your groceries just on your one command, or probably more than one command. You get the point! But, how comfortable will you be if these assistants can provide you some medical advice? Amazon has teamed up with UK’s National Health Service (NHS) to make Alexa your new medical consultant. The voice-enabled digital assistant will now answer your health-related queries by looking through the NHS website vetted by professional doctors. https://twitter.com/NHSX/status/1148890337504583680 The NHSX initiative to drive digital innovation in healthcare Voice search definitely gives us the most “humanized” way of finding information from the web. One of the striking advantages of voice-enabled digital assistants is that the elderly, the blind and those who are unable to access the internet in other ways can also benefit from them. UK’s health secretary, Matt Hancock, believes that “embracing” such technologies will not only reduce the pressure General Practitioners (GPs) and pharmacists face but will also encourage people to take better control of their health care. He adds, "We want to empower every patient to take better control of their healthcare." Partnering with Amazon is just one of many steps by NHS to adopt technology for healthcare. The NHS launched a full-fledged unit named NHSX (where X stands for User Experience) last week. Its mission is to provide staff and citizens “the technology they need” with an annual investment of more than $1 billion a year. This partnership was announced last year and NHS plans to partner with other companies such as Microsoft in the future to achieve its goal of “modernizing health services.” Can we consider Alexa’s advice safe Voice assistants are very fun and convenient to use, but only when they are actually working. Many a time it happens that the assistant fails to understand something and we have to yell the command again and again, which makes the experience outright frustrating. Furthermore, the track record of consulting the web to diagnose our symptoms has not been the most accurate one. Many Twitter users trolled this decision saying that Alexa is not yet capable of doing simple tasks like playing a song accurately and the NHS budget could have been instead used on additional NHS staff, lowering drug prices, and many other facilities. The public was also left sore because the government has given Amazon a new means to make a profit, instead of forcing them to pay taxes. Others also talked about the times when Google (mis)-diagnosed their symptoms. https://twitter.com/NHSMillion/status/1148883285952610304 https://twitter.com/doctor_oxford/status/1148857265946079232 https://twitter.com/TechnicallyRon/status/1148862592254906370 https://twitter.com/withorpe/status/1148886063290540032 AI ethicists and experts raise data privacy issues Amazon has been involved in several controversies around privacy concerns regarding Alexa. Earlier this month, it admitted that a few voice recordings made by Alexa are never deleted from the company's server, even when the user manually deletes them. Another news in April this year revealed that when you speak to an Echo smart speaker, not only does Alexa but potentially Amazon employees also listen to your requests. 
Last month, two lawsuits were filed in Seattle stating that Amazon is recording voiceprints of children using its Alexa devices without their consent. Last year, an Amazon Echo user in Portland, Oregon was shocked when she learned that her Echo device recorded a conversation with her husband and sent the audio file to one of his employees in Seattle. Amazon confirmed that this was an error because of which the device’s microphone misheard a series of words. Another creepy, yet funny incident was when Alexa users started hearing an unprompted laugh from their smart speaker devices. Alexa laughed randomly when the device was not even being used. https://twitter.com/CaptHandlebar/status/966838302224666624 Big tech including Amazon, Google, and Facebook constantly try to reassure their users that their data is safe and they have appropriate privacy measures in place. But, these promises are hard to believe when there is so many news of data breaches involving these companies. Last year, a German computer magazine c’t reported that a user received 1,700 Alexa voice recordings from Amazon when he asked for copies of the personal data Amazon has about him. Many experts also raised their concerns about using Alexa for giving medical advice. A Berlin-based tech expert Manthana Stender calls this move a “corporate capture of public institutions”. https://twitter.com/StenderWorld/status/1148893625914404864 Dr. David Wrigley, a British medical doctor who works as a general practitioner also asked how the voice recordings of people asking for health advice will be handled. https://twitter.com/DavidGWrigley/status/1148884541144219648 Director of Big Brother Watch, Silkie Carlo told BBC,  "Any public money spent on this awful plan rather than frontline services would be a breathtaking waste. Healthcare is made inaccessible when trust and privacy is stripped away, and that's what this terrible plan would do. It's a data protection disaster waiting to happen." Prof Helen Stokes-Lampard, of the Royal College of GPs, believes that the move has "potential", especially for minor ailments. She added that it is important individuals do independent research to ensure the advice given is safe or it could "prevent people from seeking proper medical help and create even more pressure". She further said that not everyone is comfortable using such technology or could afford it. Amazon promises that the data will be kept confidential and will not be used to build a profile on customers. A spokesman shared with The Times, "All data was encrypted and kept confidential. Customers are in control of their voice history and can review or delete recordings." Amazon is being sued for recording children’s voices through Alexa without consent Amazon Alexa is HIPAA-compliant: bigger leap in the health care sector Amazon is supporting research into conversational AI with Alexa fellowships

Sugandha Lahoti
12 Jul 2019
12 min read

Best practices for RESTful web services: Naming conventions and API Versioning [Tutorial]

This article covers two important best practices for REST and RESTful APIs: naming conventions and API versioning. This article is taken from the book Hands-On RESTful Web Services with TypeScript 3 by Biharck Muniz Araújo. This book will guide you in designing and developing RESTful web services with the power of TypeScript 3 and Node.js.

What are naming conventions

One of the keys to achieving a good RESTful design is naming the HTTP verbs appropriately. It is really important to create understandable resources that allow people to easily discover and use your services. A good resource name implies that the resource is intuitive and clear to use. On the other hand, the usage of HTTP methods that are incompatible with REST patterns creates noise and makes the developer's life harder. In this section, there will be some suggestions for creating clear and good resource URIs.

It is good practice to expose resources as nouns instead of verbs. Essentially, a resource represents a thing, and that is the reason you should use nouns. Verbs refer to actions, and actions are already expressed by the HTTP methods. Three words that describe good resource naming conventions are as follows:

Understandability: The resource's representation format should be understandable and utilizable by both the server and the client
Completeness: A resource should be completely represented by the format
Linkability: A resource can be linked to another resource

Some example resources are as follows:

Users of a system
Blog posts
An article
Disciplines in which a student is enrolled
Students whom a professor teaches
A blog post draft

In a best-case scenario, each resource that's exposed by a service should be identified by a unique URI. It is quite common to see the same resource being exposed by more than one URI, which is definitely not good. It is also good practice for the URI to make sense and describe the resource itself clearly. URIs need to be predictable, which means that they have to be consistent in terms of data structure. In general, this is not a rule required by REST, but it enhances the service and/or the API.

A good way to write good RESTful APIs is by writing them with your consumers in mind. There is no reason to write an API and name it while thinking about the API's developers rather than its consumers, who will be the people actually consuming your resources and API (as the name suggests). Even if a resource has a good name, and is therefore easier to understand, it can still be difficult to understand its boundaries when services are not well named; bad naming creates a lot of chaos, such as business rule duplication, bad API usage, and so on.

In addition to this, we will explain naming conventions based on a hypothetical scenario. Let's imagine that there is a company that manages orders, offers, products, items, customers, and so on.

Considering everything that we've said about resources, if we decided to expose a customer resource and we want to insert a new customer, the URI might be as follows:

POST https://<HOST>/customers

The hypothetical request body might be as follows:

{
  "first-name" : "john",
  "last-name" : "doe",
  "e-mail" : "john.doe@email.com"
}

Imagine that the previous request results in a customer with the ID 445839, which we will use whenever we need to retrieve that customer.
The GET method could be called as follows:

GET https://<HOST>/customers/445839

The response will look something like this (sample body response for customer #445839):

{
  "customer-id": 445839,
  "first-name" : "john",
  "last-name" : "doe",
  "e-mail" : "john.doe@email.com"
}

The same URI can be used for the PUT and DELETE operations, respectively:

PUT https://<HOST>/customers/445839

The PUT request body might be as follows:

{
  "last-name" : "lennon"
}

For the DELETE operation, the HTTP request to the URI will be as follows:

DELETE https://<HOST>/customers/445839

Moving on, based on the naming conventions, the product URIs might be as follows:

POST https://<HOST>/products

Sample request body:

{
  "name" : "notebook",
  "description" : "a fruit brand"
}

GET https://<HOST>/products/9384

PUT https://<HOST>/products/9384

Sample request body:

{
  "name" : "desktop"
}

DELETE https://<HOST>/products/9384

Now, the next step is to expose the URI for order creation. Before we continue, we should go over the various ways to expose the URI. The first option is to do the following:

POST https://<HOST>/orders

However, this could be outside the context of the desired customer. The order would exist without a customer, which is quite odd. The second option is to expose the order inside a customer, like so:

POST https://<HOST>/customers/445839/orders

Based on that model, all orders belong to user 445839. If we want to retrieve those orders, we can make a GET request, like so:

GET https://<HOST>/customers/445839/orders

As we mentioned previously, it is also possible to express hierarchical concepts when there is a relationship between resources or entities. Following the same idea of orders, how should we represent the URI that describes items within an order, for an order that belongs to user 445839? First, if we would like to get a specific order, such as order 7384, we can do that like so:

GET https://<HOST>/customers/445839/orders/7384

Following the same approach, to get the items, we could use the following code:

GET https://<HOST>/customers/445839/orders/7384/items

The same concept applies to the create process, where the URI is still the same, but the HTTP method is POST instead of GET. In this scenario, the body also has to be sent:

POST https://<HOST>/customers/445839/orders/7384
{
  "id" : 7834,
  "quantity" : 10
}

Now, you should have a good idea of what the GET operation offers in regard to orders. The same approach can also be applied so that you can go deeper and get a specific item from a specific order and from a specific user:

GET https://<HOST>/customers/445839/orders/7384/items/1

Of course, this hierarchy applies to the PUT, PATCH, and POST methods, and in some cases, the DELETE method as well. It will depend on your business rules; for example, can the item be deleted? Can I update an order?

What is API versioning

As APIs are developed, gathering more business rules for their context on a day-to-day basis, accumulating technical debt and maturing, there often comes a point where teams need to release breaking functionality. It is also a challenge to keep their existing consumers working perfectly. One way to keep them working is by versioning APIs. Breaking changes can get messy. When something changes abruptly, it often generates issues for consumers, as this usually isn't planned and directly affects the ability to deliver new business experiences. There is a school of thought that says APIs should be versionless.
This means that building APIs that won't change their contract forces every change to be viewed through the lens of backward compatibility. This drives us to create better API interfaces, not only to solve any current issues, but to allow us to build APIs based on foundational capabilities or business capabilities themselves. Here are a few tips that should help you out:

Put yourself in the consumer's shoes: From a product perspective, it is suggested that you think from the consumer's point of view when building APIs. Most breaking changes happen because developers build APIs without considering the consumers, which means that they are building something for themselves and not for the real users' needs.

Contract-first design: The API interface has to be treated as a formal contract, which is harder to change and more important than the coding behind it. The key to API design success is understanding the consumer's needs and the business associated with it to create a reliable contract. This is essentially a good, productive conversation between the consumers and the producers.

Require tolerant readers: It is quite common to add new fields to a contract over time. Based on what we have learned so far, this could generate a breaking change. This sometimes occurs because, unfortunately, many consumers utilize a deserializer strategy that is strict by default. This means that, in general, the plugin that's used to deserialize throws exceptions on fields that have never been seen before. It is not recommended to version an API just because you need to add a new optional field to the contract. However, we also don't want to introduce breaking changes on the client side. Some good advice is to document any changes, stating that new fields might be added, so that the consumers aren't surprised by any new changes.

Add an object wrapper: This sounds obvious, but when teams release APIs without object wrappers, the APIs turn into hard-to-change APIs, which means that they are near impossible to evolve without having to make breaking changes. For instance, let's say your team has delivered an API based on JSON that returns a raw JSON array. So far, so good. However, as they continue, they find out that they have to deal with paging, or have to internationalize the service, or any other context change. There is no way of making changes without breaking something because the return is based on raw JSON.

Always plan to version: Don't assume you have built the best turbo API in the world, ever. APIs are built with an end date, even though you don't know it yet. It's always a good plan to build APIs while taking versioning into consideration.

Including the version in the URL

Including the version in the URL is an easy strategy that adds the version number to the URI. Let's see how this is done:

https://api.domain.com/v1/
https://api.domain.com/v2/
https://api.domain.com/v3/

Basically, this model tells the consumers which API version they are using. Every breaking change increases the version number. One issue that may occur when the URI for a resource changes is that the resource may no longer be found with the old URI unless redirects are used.

Versioning in the subdomain

Similar to versioning in the URL, subdomain versioning puts the version within the URI, but associated with the domain, like so:

https://v1.api.domain.com/
https://v2.api.domain.com/
https://v3.api.domain.com/

This is quite similar to versioning at the end of the URI.
One of the advantages of using a subdomain strategy is that your API can be hosted on different servers.

Versioning on media types

Another approach to versioning is using MIME types to include the API version. In short, API producers register these MIME types on their backend, and the consumers then need to include Accept and Content-Type headers. The following requests use an additional header:

GET https://<HOST>/orders/1325 HTTP/1.1
Accept: application/json
Version: 1

GET https://<HOST>/orders/1325 HTTP/1.1
Accept: application/json
Version: 2

GET https://<HOST>/orders/1325 HTTP/1.1
Accept: application/json
Version: 3

The following requests use an additional field in the Accept/Content-Type header:

GET https://<HOST>/orders/1325 HTTP/1.1
Accept: application/json; version=1

GET https://<HOST>/orders/1325 HTTP/1.1
Accept: application/json; version=2

GET https://<HOST>/orders/1325 HTTP/1.1
Accept: application/json; version=3

The following requests use a media type:

GET https://<HOST>/orders/1325 HTTP/1.1
Accept: application/vnd.<host>.orders.v1+json

GET https://<HOST>/orders/1325 HTTP/1.1
Accept: application/vnd.<host>.orders.v2+json

GET https://<HOST>/orders/1325 HTTP/1.1
Accept: application/vnd.<host>.orders.v3+json

Recommendation

Header-based versioning is often considered the purist approach for a RESTful service. In practice, however, the recommendation here is to keep the version in the URL. This strategy allows consumers to open the API in a browser, send it in an email, bookmark it, share it more easily, and so on. This format also makes logs easier for humans to read. There are a few more recommendations regarding API versioning:

Use only the major version: API consumers should only care about breaking changes.

Use a version number: Keep things clear; numbering the API incrementally allows the consumer to track evolvability. Versioning APIs using timestamps or any other format only creates confusion in the consumer's mind. This also exposes more information about versioning than is necessary.

Require the version to be passed: Even though this is more convenient from the API producer's perspective, requiring a version from the start is a good strategy because the consumers will know that the API version might change and they will be prepared for that.

Document your API time-to-live policy: Good documentation is a good path to follow. Keeping everything well described will mean that consumers avoid finding out the hard way that Version 1 is no longer available because it has been deprecated. Policies allow consumers to be prepared for issues such as deprecation.

In this article, we learned about best practices related to RESTful web services, such as naming conventions and API versioning formats. Next, to look at how to design RESTful web services with OpenAPI and Swagger, focusing on the core principles while creating web services, read our book Hands-On RESTful Web Services with TypeScript 3.

7 reasons to choose GraphQL APIs over REST for building your APIs
Which Python framework is best for building RESTful APIs? Django or Flask?
Understanding advanced patterns in RESTful API [Tutorial]
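As a compact illustration of the three header-driven styles discussed above, here is a hedged sketch using curl. The <HOST> placeholder and the vendor media type are taken from the article's own examples; the exact header names a real API accepts depend on what its producer has registered.

# Custom version header.
curl https://<HOST>/orders/1325 -H 'Accept: application/json' -H 'Version: 2'

# Version as a parameter of the Accept header.
curl https://<HOST>/orders/1325 -H 'Accept: application/json; version=2'

# Version embedded in a vendor media type.
curl https://<HOST>/orders/1325 -H 'Accept: application/vnd.<host>.orders.v2+json'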
Sugandha Lahoti
11 Jul 2019
9 min read

Defining REST and its various architectural styles

RESTful web services are services built according to REST principles. The idea is to have them designed to essentially work well on the web. But what is REST? Let's start from the beginning by defining REST. This article is taken from the book Hands-On RESTful Web Services with TypeScript 3 by Biharck Muniz Araújo. This book is a step-by-step guide that will help you design, develop, scale, and deploy RESTful APIs with TypeScript 3 and Node.js. In this article, we will learn what REST is and talk about the various REST architectural styles.

What is REST?

The REST (Representational State Transfer) style is a set of software engineering practices that contains constraints that should be used in order to create web services in distributed hypermedia systems. REST is not a tool and neither is it a language; in fact, REST is agnostic of protocols, components, and languages. It is important to say that REST is an architectural style and not a toolkit. REST provides a set of design rules in order to create stateless services that are shown as resources and, in some cases, sources of specific information such as data and functionality. The identification of each resource is performed by its unique Uniform Resource Identifier (URI).

REST describes simple interfaces that transmit data over a standardized interface such as HTTP and HTTPS without any additional messaging layer, such as Simple Object Access Protocol (SOAP). The consumer will access REST resources via a URI using HTTP methods (this will be explained in more detail later). After the request, it is expected that a representation of the requested resource is returned. The representation of any resource is, in general, a document that reflects the current or intended state of the requested resource.

REST architectural styles

The REST architectural style describes six constraints. These constraints were originally described by Roy Fielding in his Ph.D. thesis. They include the following:

Uniform interface
Stateless
Cacheable
Client-server architecture
A layered system
Code on demand (optional)

We will discuss them all minutely in the following subsections.

Uniform interface

Uniform interface is a constraint that describes a contract between clients and servers. One of the reasons to create an interface between them is to allow each part to evolve independently of the other. Once there is a contract aligned between the client and server parts, they can start their work independently because, at the end of the day, the way that they will communicate is firmly based on the interface.

The uniform interface is divided into four main groups, called principles:

Resource-based
The manipulation of resources using representations
Self-descriptive messages
Hypermedia as the Engine of Application State (HATEOAS)

Let's talk more about them.

Resource-based

One of the key things when a resource is being modeled is the URI definition. The URI is what defines a resource as unique. This representation is what will be returned to clients. If you perform a GET on the order URI, the resource that is returned should be a representation of an order containing the order ID, creation date, and so on. The representation should be in JSON or XML. Here is a JSON example:

{
  id : 1234,
  creation-date : "1937-01-01T12:00:27.87+00:20",
  any-other-json-fields...
}

Here is an XML example:

<order>
  <id>1234</id>
  <creation-date>1937-01-01T12:00:27.87+00:20</creation-date>
  any-other-xml-fields
</order>

The manipulation of resources using representations

Following the happy path, when the client makes a request to the server, the server responds with a resource that represents the current state of its resource. This resource can be manipulated by the client. The client can request the kind of representation it desires, such as JSON, XML, or plain text. When the client needs to specify the representation, the HTTP Accept header is used. Here you can see an example in plain text:

GET https://<HOST>/orders/12345
Accept: text/plain

The next one is in JSON format:

GET https://<HOST>/orders/12345
Accept: application/json

Self-descriptive messages

In general, the information provided by the RESTful service contains all the information about the resource that the client should be aware of. There is also a possibility of including more information than the resource itself. This information can be included as a link. In HTTP, this is conveyed by the Content-Type header, and the agreement needs to be bilateral: that is, the requestor needs to state the media type that it's waiting for, and the receiver must agree about what the media type refers to. Some examples of media types are listed in the following table:

Extension | Document Type | MIME type
.aac | AAC audio file | audio/aac
.arc | Archive document | application/octet-stream
.avi | Audio Video Interleave (AVI) | video/x-msvideo
.css | Cascading Style Sheets (CSS) | text/css
.csv | Comma-separated values (CSV) | text/csv
.doc | Microsoft Word | application/msword
.epub | Electronic publication (EPUB) | application/epub+zip
.gif | Graphics Interchange Format (GIF) | image/gif
.html | HyperText Markup Language (HTML) | text/html
.ico | Icon format | image/x-icon
.ics | iCalendar format | text/calendar
.jar | Java Archive (JAR) | application/java-archive
.jpeg | JPEG images | image/jpeg
.js | JavaScript (ECMAScript) | application/javascript
.json | JSON format | application/json
.mpeg | MPEG video | video/mpeg
.mpkg | Apple Installer Package | application/vnd.apple.installer+xml
.odt | OpenDocument text document | application/vnd.oasis.opendocument.text
.oga | OGG audio | audio/ogg
.ogv | OGG video | video/ogg
.ogx | OGG | application/ogg
.otf | OpenType font | font/otf
.png | Portable Network Graphics | image/png
.pdf | Adobe Portable Document Format (PDF) | application/pdf
.ppt | Microsoft PowerPoint | application/vnd.ms-powerpoint
.rar | RAR archive | application/x-rar-compressed
.rtf | Rich Text Format (RTF) | application/rtf
.sh | Bourne shell script | application/x-sh
.svg | Scalable Vector Graphics (SVG) | image/svg+xml
.tar | Tape Archive (TAR) | application/x-tar
.ts | TypeScript file | application/typescript
.ttf | TrueType Font | font/ttf
.vsd | Microsoft Visio | application/vnd.visio
.wav | Waveform Audio Format | audio/x-wav
.zip | ZIP archive | application/zip
.7z | 7-zip archive | application/x-7z-compressed

There is also a possibility of creating custom media types. A complete list can be found in the IANA media types registry.

HATEOAS

HATEOAS is a way for the client to interact with the response by navigating within it, through the hierarchy, in order to get complementary information.
For example, here the client makes a GET call to the order URI:

GET https://<HOST>/orders/1234

The response comes with a navigation link to the items within the 1234 order, as in the following code block:

{
  id : 1234,
  any-other-json-fields...,
  "links": [
    {
      "href": "1234/items",
      "rel": "items",
      "type" : "GET"
    }
  ]
}

What happens here is that the links field allows the client to navigate to 1234/items in order to see all the items that belong to the 1234 order.

Stateless

Essentially, stateless means that the state necessary to handle a request is contained within the request itself and is not persisted anywhere in a way that could be retrieved later. Basically, the URI is the unique identifier of the destination, and the body contains the state, or changeable state, of the resource. In other words, after the server handles the request, the state could change, and the result is sent back to the requestor with the appropriate HTTP status code.

In contrast to the default session scope found in a lot of existing systems, the REST client must be the one responsible for providing all the necessary information to the server, considering that the server should be idempotent. Statelessness allows high scalability, since the server will not maintain sessions. Another interesting point to note is that the load balancer does not care about sessions at all in stateless systems. In other words, the client always needs to pass the whole request in order to get the resource, because the server is not allowed to hold any previous request state.

Cacheable

The aim of caching is to never have to generate the same response more than once. The key benefits of using this strategy are an increase in speed and a reduction in server processing. Essentially, the request flows through a cache or a series of caches, such as local caching, proxy caching, or reverse proxy caching, in front of the service hosting the resource. If any of them match any criteria during the request (for example, the timestamp or client ID), the data is returned from the cache layer; if the caches cannot satisfy the request, the request goes to the server.

Client-server architecture

The REST style separates clients from the server. In short, whenever it is necessary to replace either the server or the client side, things should flow naturally, since there is no coupling between them. The client side should not care about data storage, and the server side should not care about the interface at all.

A layered system

Each layer must work independently and interact only with the layers directly connected to it. This strategy allows a request to be passed along without bypassing other layers. For instance, when scaling a service is desired, you might use a proxy working as a load balancer; that way, the incoming requests are delivered to the appropriate server instance. In this case, the client side does not need to understand how the server is going to work; it just makes requests to the same URI. The cache is another example that behaves as another layer, and the client does not need to understand how it works either.

Code on demand

In summary, this optional pattern allows the client to download and execute code from the server on the client side. The constraint says that this strategy improves scalability, since the code can execute independently of the server on the client side.
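To tie the HATEOAS and cacheable constraints together, here is a hedged sketch of how a client might exercise them from the command line. The <HOST> placeholder and the ETag value are illustrative, and the conditional request only works if the server actually emits cache validation headers.

# Fetch an order; a HATEOAS-style response includes a "links" section
# pointing at related resources such as the order's items.
curl -i https://<HOST>/orders/1234

# Follow the link advertised by the server rather than hard-coding the path.
curl -i https://<HOST>/orders/1234/items

# Cacheable: if the first response carried an ETag, a conditional request lets
# a cache or the server answer 304 Not Modified without regenerating the body
# (the ETag value below is a placeholder).
curl -i https://<HOST>/orders/1234 -H 'If-None-Match: "abc123"'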
To know more about best practices for RESTful design, such as API endpoint organization, the different ways to expose an API service, and how to handle large datasets, check out the book Hands-On RESTful Web Services with TypeScript 3.

7 reasons to choose GraphQL APIs over REST for building your APIs
Which Python framework is best for building RESTful APIs? Django or Flask?
Understanding advanced patterns in RESTful API [Tutorial]

Sugandha Lahoti
08 Jul 2019
6 min read

British Airways set to face a record-breaking fine of £183m by the ICO over customer data breach

UK’s watchdog ICO is all set to fine British Airways more than £183m over a customer data breach. In September last year, British Airways notified ICO about a data breach that compromised personal identification information of over 500,000 customers and is believed to have begun in June 2018. ICO said in a statement, “Following an extensive investigation, the ICO has issued a notice of its intention to fine British Airways £183.39M for infringements of the General Data Protection Regulation (GDPR).” Information Commissioner Elizabeth Denham said, "People's personal data is just that - personal. When an organisation fails to protect it from loss, damage or theft, it is more than an inconvenience. That's why the law is clear - when you are entrusted with personal data, you must look after it. Those that don't will face scrutiny from my office to check they have taken appropriate steps to protect fundamental privacy rights." How did the data breach occur? According to the details provided by the British Airways website, payments through its main website and mobile app were affected from 22:58 BST August 21, 2018, until 21:45 BST September 5, 2018. Per ICO’s investigation, user traffic from the British Airways site was being directed to a fraudulent site from where customer details were harvested by the attackers. Personal information compromised included log in, payment card, and travel booking details as well name and address information. The fraudulent site performed what is known as a supply chain attack embedding code from third-party suppliers to run payment authorisation, present ads or allow users to log into external services, etc. According to a cyber-security expert, Prof Alan Woodward at the University of Surrey, the British Airways hack may possibly have been a company insider who tampered with the website and app's code for malicious purposes. He also pointed out that live data was harvested on the site rather than stored data. https://twitter.com/EerkeBoiten/status/1148130739642413056 RiskIQ, a cyber security company based in San Francisco, linked the British Airways attack with the modus operandi of a threat group Magecart. Magecart injects scripts designed to steal sensitive data that consumers enter into online payment forms on e-commerce websites directly or through compromised third-party suppliers. Per RiskIQ, Magecart set up custom, targeted infrastructure to blend in with the British Airways website specifically and to avoid detection for as long as possible. What happens next for British Airways? The ICO noted that British Airways cooperated with its investigation, and has made security improvements since the breach was discovered. They now have 28 days to appeal. Responding to the news, British Airways’ chairman and chief executive Alex Cruz said that the company was “surprised and disappointed” by the ICO’s decision, and added that the company has found no evidence of fraudulent activity on accounts linked to the breach. He said, "British Airways responded quickly to a criminal act to steal customers' data. We have found no evidence of fraud/fraudulent activity on accounts linked to the theft. We apologise to our customers for any inconvenience this event caused." ICO was appointed as the lead supervisory authority to tackle this case on behalf of other EU Member State data protection authorities. 
Under the GDPR 'one stop shop' provisions, the data protection authorities in the EU whose residents have been affected will also have the chance to comment on the ICO's findings. The penalty is divided up between the other European data authorities, while the money that comes to the ICO goes directly to the Treasury. What is somewhat surprising is that the ICO disclosed the fine publicly even before the Supervisory Authorities had commented on the ICO's findings and a final decision had been taken based on their feedback, as pointed out by Simon Hania.

https://twitter.com/simonhania/status/1148145570961399808

Record-breaking fine appreciated by experts

The penalty imposed on British Airways is the first one to be made public since the GDPR's new data privacy rules were introduced. The GDPR makes it mandatory to report data security breaches to the information commissioner, and it also raised the maximum penalty to 4% of the penalized company's turnover. The fine would be the largest the ICO has ever issued; last year, the ICO fined Facebook £500,000 for the Cambridge Analytica scandal, which was the maximum under the 1998 Data Protection Act. The British Airways penalty amounts to 1.5% of its worldwide turnover in 2017, making it roughly 367 times Facebook's fine. In fact, it could have been even worse if the maximum penalty had been levied; the full 4% of turnover would have meant a fine approaching £500m. Such a massive fine would clearly send a sudden shudder down the spine of any big corporation responsible for handling cybersecurity: if they compromise customers' data, a severe punishment is in order.

https://twitter.com/j_opdenakker/status/1148145361799798785

Carl Gottlieb, Privacy Lead & Data Protection Officer at Duolingo, summarized the key points of this case in a much-appreciated Twitter thread:

GDPR fines are for inappropriate security as opposed to getting breached. Breaches are a good pointer but are not themselves actionable. So organisations need to implement security that is appropriate for their size, means, risk and need.
Security is an organisation's responsibility, whether you host IT yourself, outsource it or rely on someone else not getting hacked.
The GDPR has teeth against anyone that messes up security, but clearly action will be greatest where the human impact is most significant.
Threats of GDPR fines are what created change in privacy and security practices over the last 2 years (not orgs suddenly growing a conscience). And with very few fines so far, improvements have slowed; this will help.
Monetary fines are a great example to change behaviour in others, but a TERRIBLE punishment to drive change in an affected organisation. Other enforcement measures, e.g. ceasing processing personal data (e.g. a ban on new signups), would be much more impactful.

https://twitter.com/CarlGottlieb/status/1148119665257963521

Facebook fined $2.3 million by Germany for providing incomplete information about hate speech content
European Union fined Google 1.49 billion euros for antitrust violations in online advertising
French data regulator, CNIL imposes a fine of 50M euros against Google for failing to comply with GDPR.
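For readers who want to check the arithmetic behind the figures in this article, a quick back-of-the-envelope calculation using only the numbers quoted above:

£183.39m at 1.5% of turnover implies a 2017 worldwide turnover of roughly £183.39m / 0.015, or about £12.2bn.
The GDPR maximum of 4% of that turnover is roughly 0.04 x £12.2bn, or about £490m, which matches the "approaching £500m" figure above.
£183.39m divided by Facebook's £500,000 fine is roughly 367, which is where the comparison between the two penalties comes from.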