
How-To Tutorials


Developers from the Swift for TensorFlow project propose adding first-class differentiable programming to Swift

Bhagyashree R
09 Sep 2019
5 min read
After working for over 1.5 years on the Differentiable Programming Mega-Proposal, Richard Wei, a developer at Google Brain, and his team submitted the proposal to the Swift Evolution forum on Thursday last week. The proposal aims to "push Swift's capabilities to the next level in numerics and machine learning" by introducing differentiable programming as a new language feature in Swift. It is part of the Swift for TensorFlow project, under which the team is integrating TensorFlow directly into the language to offer developers a next-generation platform for machine learning.

What is differentiable programming

With the increasing sophistication of deep learning models and the introduction of modern deep learning frameworks, many researchers have come to realize that building neural networks closely resembles programming. Yann LeCun, VP and Chief AI Scientist at Facebook, calls differentiable programming "a little more than a rebranding of the modern collection of Deep Learning techniques, the same way Deep Learning was a rebranding of the modern incarnations of neural nets with more than two layers." He compares it with regular programming, the only difference being that the resulting programs are "parameterized, automatically differentiated, and trainable/optimizable." Many also say that differentiable programming is another name for automatic differentiation, a collection of techniques for numerically evaluating the derivative of a function. It can be seen as a new programming paradigm in which programs can be differentiated throughout. Check out the paper "Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagation" for a better understanding of differentiable programming.
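To make the automatic differentiation idea concrete, here is a minimal forward-mode sketch using dual numbers. It is written in Go rather than Swift, purely to illustrate the technique the proposal builds on; the Dual type and helper functions are hypothetical names invented for this example, not part of any library or of the Swift proposal.

package main

import "fmt"

// Dual carries a value together with its derivative (a dual number).
type Dual struct {
	Val, Deriv float64
}

// Add implements (u + v)' = u' + v'.
func Add(a, b Dual) Dual {
	return Dual{a.Val + b.Val, a.Deriv + b.Deriv}
}

// Mul implements the product rule: (u * v)' = u'v + uv'.
func Mul(a, b Dual) Dual {
	return Dual{a.Val * b.Val, a.Deriv*b.Val + a.Val*b.Deriv}
}

// f evaluates f(x) = x*x + x; running it on dual numbers yields
// both f(x) and f'(x) in a single pass.
func f(x Dual) Dual {
	return Add(Mul(x, x), x)
}

func main() {
	x := Dual{Val: 3, Deriv: 1} // seed dx/dx = 1
	y := f(x)
	fmt.Println(y.Val, y.Deriv) // prints: 12 7 (f(3) = 12, f'(3) = 7)
}

A first-class language feature, as proposed for Swift, would let the compiler derive this transformation for ordinary functions instead of requiring a hand-written wrapper type like the one above.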
Why differentiable programming is proposed in Swift

Swift is an expressive, high-performance language, which makes it a strong candidate for numerical applications. According to the proposal's authors, first-class support for differentiable programming in Swift will allow safe and powerful machine learning development. The authors also believe that this is a "big step towards high-level numerical computing support." With this proposal, they aim to make Swift a "real contender in the numerical computing and machine learning landscape." Here are some of the advantages of adding first-class support for differentiable programming to Swift:

Better language coverage: First-class differentiable programming support will enable differentiation to work smoothly with other Swift features. This will allow developers to code normally without being restricted to a subset of Swift.

Enable extensibility: Developers will get an "extensible differentiable programming system." They will be able to create custom differentiation APIs by leveraging primitive operators defined in the standard library and supported by the type system.

Static warnings and errors: The compiler will be able to statically identify functions that cannot be differentiated or that will give a zero derivative, and emit a non-differentiability error or warning. This will improve productivity by making common runtime errors in machine learning directly debuggable without library boundaries.

Some of the components that will be added to Swift under this proposal are:

The Differentiable protocol: a standard library protocol that generalizes all data structures that can be a parameter or result of a differentiable function.

The @differentiable declaration attribute: used to mark function-like declarations as differentiable.

@differentiable function types: a subtype of normal function types with a different runtime representation and calling convention. Differentiable function types have differentiable parameters and results.

Differential operators: the core differentiation APIs that take @differentiable functions as inputs and return derivative functions or compute derivative values.

The @differentiating and @transposing attributes: attributes for declaring custom derivative functions for some other function declaration.

This proposal sparked a discussion on Hacker News. Many developers were excited about bringing differentiable programming support into the Swift core. A user commented, "This is actually huge. I saw a proof of concept of something like this in Haskell a few years back, but it's amazing it see it (probably) making it into the core of a mainstream language. This may let them capture a large chunk of the ML market from Python - and hopefully, greatly improve ML APIs while they're at it."

Some felt that a library could have served the purpose. "I don't see why a well-written library could not serve the same purpose. It seems like a lot of cruft. I doubt, for example, Python would ever consider adding this and it's the de facto language that would benefit the most from something like this - due to the existing tools and communities. It just seems so narrow and not at the same level of abstraction that languages typically sit at. I could see the language supporting higher-level functionality so a library could do this without a bunch of extra work (such as by some reflection)," a user added.

Users also discussed another effort along the same lines: Julia's Zygote, a working prototype for source-to-source automatic differentiation. A user commented, "Yup, work is continuing apace with Julia's next-gen Zygote project. Also, from the GP's thought about applications beyond DL, my favorite examples so far are for model-based RL and Neural ODEs."

To know more in detail, check out the proposal: Differentiable Programming Mega-Proposal.

Other news in programming

Why Perl 6 is considering a name change?
The Julia team shares its finalized release process with the community
TypeScript 3.6 releases with stricter generators, new functions in TypeScript playground, better Unicode support for identifiers, and more


What can you expect at NeurIPS 2019?

Sugandha Lahoti
06 Sep 2019
5 min read
Popular machine learning conference NeurIPS 2019 (Conference on Neural Information Processing Systems) will be held from Sunday, December 8 through Saturday, December 14 at the Vancouver Convention Center. The conference invites papers, tutorials, and submissions on cross-disciplinary research where machine learning methods are being used in other fields, as well as methods and ideas from other fields being applied to ML.

NeurIPS 2019 accepted papers

Yesterday, the conference published the list of accepted papers. A total of 1,429 papers have been selected. Submissions opened from May 1 on a variety of topics such as Algorithms, Applications, Data Implementations, Neuroscience and Cognitive Science, Optimization, Probabilistic Methods, Reinforcement Learning and Planning, and Theory. (The full list of subject areas is available here.) This year at NeurIPS 2019, authors of accepted submissions were required to prepare either a 3-minute video or a PDF of slides summarizing the paper, or a PDF of the poster used at the conference. This was done to make NeurIPS content accessible to those unable to attend the conference. NeurIPS 2019 also introduced a mandatory abstract submission deadline, a week before final submissions were due. Only a submission with a full abstract was allowed to have the full paper uploaded. The authors were also asked to answer questions from the Reproducibility Checklist during the submission process.

NeurIPS 2019 tutorial program

NeurIPS also invites experts to present tutorials on topics that are of interest to a sizable portion of the NeurIPS community and that differ from those already presented at other ML conferences like ICML or ICLR. They looked for tutorial speakers who could cover topics beyond their own research in a comprehensive manner that encompasses multiple perspectives. The tutorial chairs for NeurIPS 2019 are Danielle Belgrave and Alice Oh. They initially compiled a list based on the last few years' publications, workshops, and tutorials presented at NeurIPS and related venues. They asked colleagues for recommendations and conducted independent research. In reviewing the potential candidates, the chairs read papers to understand each candidate's expertise and watched their videos to appreciate their style of delivery. The list of candidates was emailed to the General Chair, the Diversity & Inclusion Chairs, and the rest of the Organizing Committee for comments. Following a few adjustments based on their input, the potential speakers were selected. A total of 9 tutorials have been selected for NeurIPS 2019:

Deep Learning with Bayesian Principles - Emtiyaz Khan
Efficient Processing of Deep Neural Network: from Algorithms to Hardware Architectures - Vivienne Sze
Human Behavior Modeling with Machine Learning: Opportunities and Challenges - Nuria Oliver, Albert Ali Salah
Interpretable Comparison of Distributions and Models - Wittawat Jitkrittum, Dougal Sutherland, Arthur Gretton
Language Generation: Neural Modeling and Imitation Learning - Kyunghyun Cho, Hal Daume III
Machine Learning for Computational Biology and Health - Anna Goldenberg, Barbara Engelhardt
Reinforcement Learning: Past, Present, and Future Perspectives - Katja Hofmann
Representation Learning and Fairness - Moustapha Cisse, Sanmi Koyejo
Synthetic Control - Alberto Abadie, Vishal Misra, Devavrat Shah

NeurIPS 2019 Workshops

NeurIPS Workshops are primarily used for discussion of work in progress and future directions. This time the number of Workshop Chairs doubled, from two to four; the selected chairs are Jenn Wortman Vaughan, Marzyeh Ghassemi, Shakir Mohamed, and Bob Williamson. However, the number of workshop submissions went down from 140 in 2018 to 111 in 2019. Of these 111 submissions, 51 workshops were selected. The full list of selected workshops is available here. The NeurIPS 2019 chair committee introduced new guidelines, expectations, and selection criteria for the workshops. This time workshops placed an important focus on the nature of the problem, the intellectual excitement of the topic, diversity and inclusion, the quality of proposed invited speakers, the organizational experience and ability of the team, and more. The Workshop Program Committee consisted of 37 reviewers, with each workshop proposal assigned to two reviewers. The reviewer committee included more senior researchers who have been involved with the NeurIPS community. Reviewers were asked to provide a summary and overall rating for each workshop, a detailed list of pros and cons, and specific ratings for each of the new criteria. After all reviews were submitted, each proposal was assigned to two of the four chair committee members. The chair members looked through their assigned proposals and the reviews to form an educated assessment of the pros and cons of each. Finally, the entire chair committee held a meeting to discuss every submitted proposal and make decisions. You can check more details about the conference on the NeurIPS website. As always, keep checking this space for more content about the conference. In the meantime, you can read our previous years' coverage:

NeurIPS Invited Talk: Reproducible, Reusable, and Robust Reinforcement Learning
NeurIPS 2018: How machine learning experts can work with policymakers to make good tech decisions [Invited Talk]
NeurIPS 2018: Rethinking transparency and accountability in machine learning
NeurIPS 2018: Developments in machine learning through the lens of Counterfactual Inference [Tutorial]
Accountability and algorithmic bias: Why diversity and inclusion matters [NeurIPS Invited Talk]


Google is circumventing GDPR, reveals Brave's investigation for the Authorized Buyers ad business case

Bhagyashree R
06 Sep 2019
6 min read
Last year, Dr. Johnny Ryan, the Chief Policy & Industry Relations Officer at Brave, filed a complaint against Google's DoubleClick/Authorized Buyers ad business with the Irish Data Protection Commission (DPC). New evidence produced by Brave reveals that Google is circumventing GDPR and also undermining its own data protection measures.

Brave calls Google's Push Pages a GDPR workaround

Brave's new evidence rebuts some of Google's claims regarding its DoubleClick/Authorized Buyers system, the world's largest real-time advertising auction house. Google says that it prohibits companies that use its real-time bidding (RTB) ad system "from joining data they receive from the Cookie Matching Service." In September last year, Google announced that it had removed encrypted cookie IDs and list names from bid requests with buyers in its Authorized Buyers marketplace. Brave's research, however, found otherwise: "Brave's new evidence reveals that Google allowed not only one additional party, but many, to match with Google identifiers. The evidence further reveals that Google allowed multiple parties to match their identifiers for the data subject with each other."

When you visit a website that has Google ads embedded on its pages, Google runs a real-time bidding ad auction to determine which advertiser gets to display its ads. For this, it uses Push Pages, the mechanism in question here. Brave hired Zach Edwards, the co-founder of digital analytics startup Victory Medium, and MetaX, a company that audits data supply chains, to investigate and analyze a log of Dr. Ryan's web browsing. The research revealed that Google's Push Pages can essentially be used as a workaround for user IDs. Google shares a 'google_push' identifier with the participating companies to identify a user. Brave says the problem is that the shared identifier was common to multiple companies, meaning these companies could have cross-referenced what they learned about the user from Google with each other.

Used by more than 8.4 million websites, Google's DoubleClick/Authorized Buyers broadcasts personal data of users to 2,000+ companies. This data includes the category of what a user is reading, which can reveal their political views, sexual orientation, and religious beliefs, as well as their location. There are also unique ID codes specific to a user that can let companies uniquely identify them. All this information can give these companies a way to keep tabs on what users are "reading, watching, and listening to online." Brave calls Google's RTB data protection policies "weak" as they ask these companies to self-regulate. Google does not have much control over what these companies do with the data once broadcast. "Its policy requires only that the thousands of companies that Google shares peoples' sensitive data with monitor their own compliance, and judge for themselves what they should do," Brave wrote. A Google spokesperson, in response to this news, told Forbes, "We do not serve personalised ads or send bid requests to bidders without user consent. The Irish DPC — as Google's lead DPA — and the UK ICO are already looking into real-time bidding in order to assess its compliance with GDPR. We welcome that work and are co-operating in full."

Users recommend starting an "information campaign" instead of a penalty that will hardly affect big tech

This news triggered a discussion on Hacker News where users talked about the implications of RTB and what strict actions the EU could take to protect user privacy. A user explained, "So, let's say you're an online retailer, and you have Google IDs for your customers. You probably have some useful and sensitive customer information, like names, emails, addresses, and purchase histories. In order to better target your ads, you could participate in one of these exchanges, so that you can use the information you receive to suggest products that are as relevant as possible to each customer. To participate, you send all this sensitive information, along with a Google ID, and receive similar information from other retailers, online services, video games, banks, credit card providers, insurers, mortgage brokers, service providers, and more! And now you know what sort of vehicles your customers drive, how much they make, whether they're married, how many kids they have, which websites they browse, etc. So useful! And not only do you get all these juicy private details, but you've also shared your customers sensitive purchase history with anyone else who is connected to the exchange."

Others said that a penalty is not going to deter Google: "The whole penalty system is quite silly. The fines destroy small companies who are the ones struggling to comply, and do little more than offer extremely gentle pokes on the wrist for megacorps that have relatively unlimited resources available for complete compliance, if they actually wanted to comply."

Users suggested that the EU should instead start an information campaign: "EU should ignore the fines this time and start an "information campaign" regarding behavior of Google and others. I bet that hurts Google 10 times more."

Some also said that not just Google but all RTB participants should be held responsible: "Because what Google is doing is not dissimilar to how any other RTB participant is acting, saying this is a Google workaround seems disingenuous."

With this case, Brave has launched a full-fledged campaign that aims to reform the multi-billion dollar RTB industry across sixteen EU countries. To achieve this goal it has collaborated with several privacy NGOs and academics, including the Open Rights Group and Dr. Michael Veale of the Turing Institute, among others. In other news, a Bloomberg report reveals that Google and other internet companies have recently asked for an amendment to the California Consumer Privacy Act, which will be enacted in 2020. The law currently limits how digital advertising companies collect and make money from user data. The proposed amendments include approval for collecting user data for targeted advertising, using the collected data from websites for their own analysis, and more. Read the Bloomberg report to know more in detail.

Other news in Data

Facebook content moderators work in filthy, stressful conditions and experience emotional trauma daily, reports The Verge
GDPR complaint in EU claim billions of personal data leaked via online advertising bids
European Union fined Google 1.49 billion euros for antitrust violations in online advertising


Key skills every database programmer should have

Sugandha Lahoti
05 Sep 2019
7 min read
According to Robert Half Technology's 2019 IT salary report, 'Database programmer' is one of the 13 most in-demand tech jobs for 2019. For an entry-level programmer, the average salary is $98,250, which goes up to $167,750 for a seasoned expert. A typical database programmer is responsible for designing, developing, testing, deploying, and maintaining databases. In this article, we list the top critical tech skills essential to database programmers.

#1 Ability to perform data modeling

The first step is to learn to model the data. In data modeling, you create a conceptual model of how data items relate to each other. In order to efficiently plan a database design, you should know the organization you are designing the database for. This is because data models describe real-world entities such as 'customer', 'service', and 'products', and the relations between these entities. Data models provide an abstraction for the relations in the database. They aid programmers in modeling business requirements and in translating business requirements into relations. They are also used for exchanging information between developers and business owners. During the design phase, the database developer should pay great attention to the underlying design principles, run a benchmark stack to ensure performance, and validate user requirements. They should also avoid pitfalls such as data redundancy, null saturation, and tight coupling.

#2 Know a database programming language, preferably SQL

Database programmers need to design, write, and modify programs to improve their databases. SQL is one of the top languages used to manipulate the data in a database and to query the database. It's also used to define and change the structure of the data—in other words, to implement the data model. Therefore it is essential that you learn SQL. In general, SQL has three parts:

Data Definition Language (DDL): used to create and manage the structure of the data
Data Manipulation Language (DML): used to manage the data itself
Data Control Language (DCL): controls access to the data

Considering that data is constantly inserted into the database, changed, or retrieved, DML is used more often in day-to-day operations than DDL, so you should have a strong grasp of DML. If you plan to grow into a database architect role in the near future, a good grasp of DDL will go a long way. Another reason to learn SQL is that almost every modern relational database supports it. Although different databases might support different features and implement their own dialect of SQL, the basics of the language remain the same. If you know SQL, you can quickly adapt to MySQL, for example. At present, database models fall predominantly into three categories: relational, object-relational, and NoSQL databases. All of these are meant for different purposes. Relational databases often adhere to SQL. Object-relational databases (ORDs) are similar to relational databases. NoSQL, which stands for "not only SQL," is an alternative to traditional relational databases, useful for working with large sets of distributed data. NoSQL databases provide benefits such as availability, schema-free design, and horizontal scaling, but also have limitations such as performance, data retrieval constraints, and learning time. For beginners, it is advisable to start by experimenting with relational databases and learning SQL, gradually transitioning to NoSQL DBMSs. The short sketch after this section illustrates the DDL/DML split in practice.
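As a minimal illustration of the DDL/DML split described above, here is a hedged sketch in Go using the standard database/sql package. The driver choice (an SQLite driver) and the table layout are assumptions made purely for this example, not recommendations from the article.

package main

import (
	"database/sql"
	"log"

	// Driver choice is an assumption for this sketch; any database/sql
	// driver with a matching DSN works the same way.
	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "example.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// DDL: define the structure of the data.
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS customer (
		id INTEGER PRIMARY KEY,
		name TEXT NOT NULL
	)`); err != nil {
		log.Fatal(err)
	}

	// DML: manipulate the data itself. The placeholder keeps the
	// statement parameterized instead of concatenating user input.
	if _, err := db.Exec("INSERT INTO customer (name) VALUES (?)", "Alice"); err != nil {
		log.Fatal(err)
	}

	// DML: query the data back.
	var name string
	if err := db.QueryRow("SELECT name FROM customer WHERE id = ?", 1).Scan(&name); err != nil {
		log.Fatal(err)
	}
	log.Println("first customer:", name)
}

DCL statements such as GRANT and REVOKE are typically run by an administrator rather than from application code, which is why they do not appear in the sketch.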
#3 Know how to Extract, Transform, Load various data types and sources

A database programmer should have a good working knowledge of ETL (Extract, Transform, Load) programming. ETL developers extract data from different databases, transform it, and then load the data into the data warehouse system. A data warehouse provides a common data repository that is essential for business needs. A database programmer should know how to tune existing packages, tables, and queries for faster ETL processing, and should conduct unit tests before applying any change to the existing ETL process. Since ETL takes data from different data sources (SQL Server, CSV, and flat files), a database developer should know how to deal with different data sources.

#4 Design and test database plans

Database programmers perform regular tests to identify ways to solve database usage concerns and malfunctions. As databases usually sit at the lowest level of the software architecture, testing is done in an extremely cautious fashion, because changes in the database schema affect many other software components. A database developer should make sure that when changing the database structure, they do not break existing applications and that the new structures are used properly. You should be proficient in unit testing your database. Unit tests are typically used to check whether small units of code are functioning properly. For databases, unit testing can be difficult, so the easiest way is often to write the tests as SQL scripts. You should also know about System Integration Testing (SIT), which is done on the complete system after the hardware and software modules of that system have been integrated. SIT validates the behavior of the system and ensures that its modules are functioning suitably.

#5 Secure your database

Data protection and security are essential for the continuity of business. Databases often store sensitive data, such as user information, email addresses, geographical addresses, and payment information. A robust security system to protect your database against any data breach is therefore necessary. While a database architect is responsible for designing and implementing secure design options, and a database admin must ensure that the right security and privacy policies are in place and being observed, this does not absolve database programmers from adopting secure coding practices. Database programmers need to ensure that data integrity is maintained over time and that data is secure from unauthorized changes or theft. They especially need to be careful about table permissions, i.e., who can read from and write to which tables. You should be aware of who is allowed to perform the four basic operations of INSERT, UPDATE, DELETE, and SELECT against which tables. Database programmers should also adopt authentication best practices depending on the infrastructure setup, the application's nature, the user's characteristics, and data sensitivity. If the database server is accessed from the outside world, it is beneficial to encrypt sessions using SSL certificates to avoid packet sniffing. Also, you should secure database servers that trust all localhost connections, as anyone who accesses the localhost can access the database server.

#6 Optimize your database performance

A database programmer should also know how to optimize database performance to achieve the best results. At the basic level, they should know how to rewrite SQL queries and maintain indexes.
Other aspects of optimizing database performance include hardware configuration, network settings, and database configuration. Generally speaking, tuning database performance requires knowledge of the system's nature. Once the database server is configured, you should calculate the number of transactions per second (TPS) for the database server setup. Once the system is up and running, you should set up a monitoring system or log analysis that periodically finds slow queries, the most time-consuming queries, and so on.

#7 Develop your soft skills

Apart from the above technical skills, a database programmer needs to be comfortable communicating with developers, testers, and project managers while working on any software project. A keen eye for detail and critical thinking can often spot malfunctions and errors that might otherwise be overlooked. A database programmer should be able to quickly fix issues within the database and streamline the code. They should also possess the quick thinking needed to prioritize tasks and meet deadlines effectively. Database programmers are often required to work on documentation and technical user guides, so strong writing and technical skills are a must.

Get started

If you want to get started with becoming a database programmer, Packt has a range of products. Here are some of the best:

PostgreSQL 11 Administration Cookbook
Learning PostgreSQL 11 - Third Edition
PostgreSQL 11 in 7 days [Video]
Using MySQL Databases With Python [Video]
Basic Relational Database Design [Video]

How to learn data science: from data mining to machine learning
How to ace a data science interview
5 barriers to learning and technology training for small software development teams


How to integrate a Medium editor in Angular 8

Guest Contributor
05 Sep 2019
5 min read
In the world of text editing, a new era of WYSIWYG (What You See Is What You Get) has arrived. We all know how styling and formatting have become important elements of a website, but most of the time it is tough to pick a simple, easy-to-use, and powerful editor. The good days are coming back with the new Medium Editor! Medium Editor is an independent JavaScript library that creates a floating toolbar which pops up when you select a piece of content on your page, inspired by the polish of Medium.com. You can turn every field, from a message on your contact form to a whole article on the back end, into a professionally styled text paragraph containing quote blocks, headings, hyperlinks, or just a few selected words. You can also incorporate the editor into Angular 8 for ease of updating and editing your content. Angular 8 released its latest features in beta 6, with attractive new functionality for testing your software and fixing bugs. One of them is Bazel, Google's open-source version of its internal build tool called Blaze, which is capable of performing incremental builds and tests. Let us check how you can integrate a Medium editor into Angular 8.

Also Read: Angular CLI 8.3.0 releases with a new deploy command, faster production builds, and more

Steps to create an editor using Angular 8

Step 1: First things first, create a project in Angular. You can also make use of Bootstrap to make it look good by adding its CDN links in index.html. Once you run the project-creation command, Angular generates a starter application after it has finished installing all the dependencies.

Step 2: Install the editor's npm package, and then include its CSS and JS in the angular.json file.

Step 3: Create a component with a name of your choice, and then create the one with the name create.

Step 4: Go to the newly created component.html and make a div, giving it a template reference name. Try the snippet under a few Bootstrap classes just to give it basic styling.

Step 5: In your component class, make an editor variable that references the element through the ViewChild property, as listed below.

Step 6: Then, we make use of one of Angular's lifecycle hooks, ngAfterViewInit. After pasting the snippet, you may get an error saying that MediumEditor is not defined; in that case, declare it at the top. After making the above changes, you can already create a small Medium editor for yourself. Write anything, then simply select the text you have written to see the magic.

Step 7: After this, you may want some more options in your editor toolbar. To do so, pass a configuration object to the MediumEditor constructor. Once you make these changes, you will see a load of available options.

Step 8: Now that you have an editor, you can easily get data from it. If someone writes a post, you need the HTML output of it. Once more, divide the screen into two halves: one half holds the editor, and the other half shows a preview of the post. In the second half of the screen, assign the value to the element's innerHTML as described above.

Wrap Up

Every system has its pros and cons, and so does Angular.
Angular offers clean code development along with a high-performance framework that handles routing, provides seamless updates through its command-line interface, and can retrieve the state of location services. You can also debug templates in Angular 8, and it supports multiple applications in one domain. On the other hand, Angular can be confusing for newcomers, as there is no single authoritative manual with complete documentation of the framework. Its developer community is also comparatively small, and routing is limited and can be hard to debug. Still, Angular 8 is user-friendly across all versions of the operating system. So, here we come to the end of the article. We hope you have gained an understanding of how to integrate the latest Medium editor into Angular 8. Do give it a try! Till then - keep learning!

Author Bio

Dave Jarvis is working as a Business Development Executive at eTatvaSoft.com, an enterprise-level mobile & web application development company. He aims to sharpen his analytical skills, deepen his data understanding, and broaden his business knowledge in these years of his career. Click here to find more information about the company. Follow him on Twitter.

Other interesting news in Web development

Google Chrome 76 now supports native lazy-loading
Laravel 6.0 releases with Laravel vapor compatibility, LazyCollection, improved authorization response and more
#Reactgate forces React leaders to confront the community's toxic culture head on


Go 1.13 releases with error wrapping, TLS 1.3 enabled by default, improved number literals, and more

Bhagyashree R
04 Sep 2019
5 min read
Yesterday, the Go team announced the release of Go 1.13. This version comes with uniform and modernized number literals, support for error wrapping, TLS 1.3 enabled by default, improved modules support, and much more. Also, the go command now uses a module mirror and checksum database by default. Want to learn Go? Get started or get up to date with our new edition of Mastering Go. Learn more.

Key updates in Go 1.13

TLS 1.3 enabled by default

Go 1.13 comes with Transport Layer Security (TLS) 1.3 support in the crypto/tls package enabled by default. If you wish to disable it, add tls13=0 to the GODEBUG environment variable. The team shared that the option to opt out will be removed in Go 1.14.

Uniform and modernized number literal prefixes

Go 1.13 introduces optional prefixes for binary and octal integer literals, hexadecimal floating-point literals, and imaginary literals. The 0b or 0B prefix indicates a binary integer literal, for instance 0b1011. The 0o or 0O prefix indicates an octal integer literal such as 0o660; the leading-0 octal notation used in previous versions remains valid. The 0x or 0X prefix can be used to represent the mantissa of a floating-point number in hexadecimal format, such as 0x1.0p-1021. The imaginary suffix (i) can now be used with any integer or floating-point literal. To improve readability, the digits of any number literal can now be separated using underscores, for instance 1_000_000 or 0b_1010_0110; an underscore can also be used between the literal prefix and the first digit. The snippet below shows the new syntax in action.
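Here is a minimal sketch of the new literal syntax just described; the values in the comments follow directly from the prefixes.

package main

import "fmt"

func main() {
	fmt.Println(0b1011)       // binary literal: prints 11
	fmt.Println(0o660)        // octal literal: prints 432
	fmt.Println(0x1p-2)       // hexadecimal floating-point literal: prints 0.25
	fmt.Println(2i * 2i)      // imaginary literals: prints (-4+0i)
	fmt.Println(1_000_000)    // underscores as digit separators: prints 1000000
	fmt.Println(0b_1010_0110) // underscore after the prefix: prints 166
}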
Read also: The Go team shares new proposals planned to be implemented in Go 1.13 and 1.14

Go 1.13 features a module mirror to download modules

Starting with Go 1.13, the go command will download and authenticate modules using the module mirror and checksum database by default. A module mirror is a module proxy that fetches modules from origin servers and then caches them for use in future requests. This enables the mirror to serve the source code even when the origin servers are down. The go command will use the module mirror maintained by the Go team, which is served at https://proxy.golang.org. If you are using an earlier version of the go command, you can use this service by setting GOPROXY=https://proxy.golang.org in your local environment.

Checksum database to authenticate modules

The go command maintains two Go module files called go.mod and go.sum. Introduced in version 1.11, Go modules are an alternative to GOPATH with support for versioning and package distribution. They are basically a collection of Go packages stored in a file tree with a go.mod file at its root. The go.sum file consists of SHA-256 hashes of the source code that the go command can use to identify any misbehavior by an origin server or proxy. However, the drawback of this method is that it "works entirely by the trust on your first use." When a version of a dependency is added for the first time, the go command fetches the code and adds lines to the go.sum file on the fly. "The problem is that those go.sum lines aren't being checked against anyone else's: they might be different from the go.sum lines that the go command just generated for someone else, perhaps because a proxy intentionally served malicious code targeted to you," the team explains. To solve this, the Go team has come up with a global source of go.sum lines, which they call a checksum database. This ensures that the go command always adds the same lines to everyone's go.sum file. So, whenever the go command receives new source code, it can verify its hash against the global database, which is served by sum.golang.org.

Read also: Golang 1.13 module mirror, index, and Checksum database are now production-ready

Support for error wrapping

As per the error value proposal, Go 1.13 comes with support for error wrapping. This adds a standard way of wrapping errors to the standard library: an error can now wrap another error by defining an Unwrap method that returns the wrapped error. Explaining how it works, the team wrote, "An error e can wrap another error w by providing an Unwrap method that returns w. Both e and w are available to programs, allowing e to provide additional context to w or to reinterpret it while still allowing programs to make decisions based on w." To support this behavior, the fmt.Errorf function now has a new %w verb for creating wrapped errors, and there are three new functions in the errors package, namely errors.Unwrap, errors.Is, and errors.As, to simplify unwrapping and inspecting wrapped errors.
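To illustrate, here is a minimal sketch of the new error-wrapping API; the loadConfig function and the file name are invented for this example.

package main

import (
	"errors"
	"fmt"
	"os"
)

// loadConfig wraps any failure with %w so callers can still
// inspect the underlying error.
func loadConfig(path string) error {
	if _, err := os.Open(path); err != nil {
		return fmt.Errorf("loading config: %w", err)
	}
	return nil
}

func main() {
	err := loadConfig("missing.conf")

	// errors.Is walks the Unwrap chain looking for a sentinel error.
	fmt.Println(errors.Is(err, os.ErrNotExist)) // true

	// errors.As walks the chain looking for a specific error type.
	var pathErr *os.PathError
	if errors.As(err, &pathErr) {
		fmt.Println("failing path:", pathErr.Path) // missing.conf
	}
}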
These were some of the updates in Go 1.13. To read the entire list of features, check out the official release notes.

What's new in programming languages

The Julia team shares its finalized release process with the community
Introducing Nushell: A Rust-based shell
TypeScript 3.6 releases with stricter generators, new functions in TypeScript playground, better Unicode support for identifiers, and more

How to learn data science: from data mining to machine learning

Richard Gall
04 Sep 2019
6 min read
Data science is a field that's complex and diverse. If you're trying to learn data science and become a data scientist, it can be easy to fall down a rabbit hole of machine learning or data processing. To a certain extent, that's good. To be an effective data scientist you need to be curious. You need to be prepared to take on a range of different tasks and challenges. But that's not always efficient: if you want to learn quickly and effectively, you need a clear structure - a curriculum - that you can follow. This post will show you what you need to learn and how to go about it.

Statistics

Statistics is arguably the cornerstone of data science. Nate Silver called data scientists "sexed up statisticians", a comment that was perhaps unfair but nevertheless contains a kernel of truth: data scientists are always working in the domain of statistics. Once you understand this, everything else you need to learn will follow easily. Machine learning, data manipulation, data visualization - these are all ultimately technological methods for performing statistical analysis really well.

Best Packt books and videos content for learning statistics:

Statistics for Data Science
R Statistics Cookbook
Statistical Methods and Applied Mathematics in Data Science [Video]

Before you go any deeper into data science, it's critical that you gain a solid foundation in statistics.

Data mining and wrangling

This is an important element of data science that often gets overlooked with all the hype about machine learning. However, without effective data collection and cleaning, all your efforts elsewhere are going to be pointless at best. At worst they might even be misleading or problematic. Sometimes called data manipulation or data munging, it's really all about managing and cleaning data from different sources so it can be used for analytics projects. To do it well you need to have a clear sense of where you want to get to - do you need to restructure the data? Sort or remove certain parts of a data set? Once you understand this, it's much easier to wrangle data effectively.

Data mining and wrangling tools

There are a number of different tools you can use for data wrangling. Python and R are the two key programming languages, and both have some useful tools for data mining and manipulation. Python in particular has a great range of tools for data mining and wrangling, such as pandas and NLTK (Natural Language Toolkit), but that isn't to say R isn't powerful in this domain. Other tools are available too - Weka and Apache Mahout, for example, are popular. Weka is written in Java, so it's a good option if you have experience with that programming language, while Mahout integrates well with the Hadoop ecosystem.

Data mining and data wrangling books and videos

If you need to learn data mining, wrangling, and manipulation, Packt has a range of products. Here are some of the best:

Data Wrangling with R
Data Wrangling with Python
Python Data Mining Quick Start Guide
Machine Learning for Data Mining

Machine learning and artificial intelligence

Although machine learning and artificial intelligence are huge trends in their own right, they are closely aligned with data science. Indeed, you might even say that their prominence today has grown out of the excitement around data science that we first witnessed just under a decade ago. It's a data scientist's job to use machine learning and artificial intelligence in a way that can drive business value.
That could, for example, be to recommend products or services to customers, to gain a better understanding of existing products, or even to better manage strategic and financial risks through predictive modelling. So, while we can see machine learning in a massive range of digital products and platforms - all of which require smart development and design - for it to work successfully, it needs to be supported by a capable and creative data scientist.

Machine learning and artificial intelligence books for data scientists

Machine Learning Algorithms
Machine Learning with R - Third Edition
Machine Learning with Apache Spark Quick Start Guide
Machine Learning with TensorFlow 1.x
Keras Deep Learning Cookbook

Data visualization

A talented data scientist isn't just a great statistician and engineer, they're also a great communicator. This means so-called soft skills are highly valuable - the ability to communicate insights and ideas to key stakeholders is essential. But great communication isn't just about soft skills, it's also about data visualization. Data visualization is, at a fundamental level, about organizing and presenting data in a way that tells a story, clarifies a problem, or illustrates a solution. It's essential that you don't overlook this step. Indeed, spending time learning about effective data visualization can also help you develop your soft skills. The principles behind storytelling and communication through visualization are, in truth, exactly the same when applied to other scenarios.

Data visualization tools

There is a huge range of data visualization tools available. As with machine learning, understanding the differences between them and working out which solution will work for you is actually an important part of the learning process. For that reason, don't be afraid to spend a little time with a range of data visualization tools. Many of the most popular data visualization tools are paid-for products. Perhaps the best known of these is Tableau (which, incidentally, was bought by Salesforce earlier this year). Tableau and its competitors are very user-friendly, which means the barrier to entry is pretty low, and they allow you to create some pretty sophisticated data visualizations fairly easily. However, sticking to these tools is not only expensive, it can also limit your abilities. We'd recommend trying a number of different data visualization tools, such as Seaborn, D3.js, Matplotlib, and ggplot2.

Data visualization books and videos for data scientists

Applied Data Visualization with R and ggplot2
Tableau 2019.1 for Data Scientists [Video]
D3.js Data Visualization Projects [Video]
Tableau in 7 Steps [Video]
Data Visualization with Python

If you want to learn data science, just get started!

As we've seen, data science requires a number of very different skills and takes in a huge breadth of tools. That means that if you're going to be a data scientist, you need to be prepared to commit to learning forever: you're never going to reach a point where you know everything. While that might sound intimidating, it's important to have confidence. With a sense of direction and purpose, and a learning structure that works for you, it's possible to develop and build your data science capabilities in a way that could unlock new opportunities and act as the basis for some really exciting projects.


What's new in USB4? Transfer speeds of up to 40Gbps with Thunderbolt 3 and more

Sugandha Lahoti
04 Sep 2019
3 min read
The USB4 technical specification was published yesterday. Along with removing the space in the stylization (USB4 instead of USB 4), the new version offers double the speed of its previous version. The USB4 architecture is based on Intel's Thunderbolt; Intel had provided Thunderbolt 3 to the USB Promoter Group royalty-free earlier this year.

Features of USB4

USB4 runs blazingly fast by incorporating Intel's Thunderbolt technology. It allows transfers at the rate of 40 gigabits per second, twice the speed of the latest version of USB 3 and 8 times the speed of the original USB 3 standard. 40Gbps speeds can, for example, allow users to connect two 4K monitors at once, or run high-end external GPUs with ease. Key characteristics specified in the USB4 specification include:

Two-lane operation using existing USB Type-C cables, and up to 40 Gbps operation over 40 Gbps-certified cables
Multiple data and display protocols that efficiently share the total available bandwidth over the bus
Backward compatibility with USB 3.2, USB 2.0, and Thunderbolt 3

Another piece of good news is that USB4 will use the same USB-C connector design as USB 3, which means manufacturers will not need to introduce new USB4 ports into their devices.

Why USB4 omits a space

The change in stylization was made to simplify things. In an interview with Tom's Hardware, USB Promoter Group CEO Brad Saunders said this is to prevent a profusion of products sporting version number badges that could confuse consumers. "We don't plan to get into a 4.0, 4.1, 4.2 kind of iterative path," he explained. "We want to keep it as simple as possible. When and if it goes faster, we'll simply have the faster version of the certification and the brand."

Is Thunderbolt 3 compatibility optional?

The specification states that Thunderbolt 3 support is optional: it's up to USB4 device makers to support Thunderbolt. This was a major topic of discussion on social media, and the USB Implementers Forum released a detailed statement clarifying the issue. "Regarding USB4 specification's optional support for Thunderbolt 3, USB-IF anticipates PC vendors to broadly support Thunderbolt 3 compatibility in their USB4 solutions given Thunderbolt 3 compatibility is now included in the USB4 specification and therefore royalty free for formal adopters," the USB-IF said in a statement. "That said, Intel still maintains the Thunderbolt 3 branding/certification so consumers can look for the appropriate Thunderbolt 3 logo and brand name to ensure the USB4 product in question has the expected Thunderbolt 3 compatibility. Furthermore, the decision was made not to make Thunderbolt 3 compatibility a USB4 specification requirement as certain manufacturers (e.g. smartphone makers) likely won't need to add the extra capabilities that come with Thunderbolt 3 compatibility when designing their USB4 products," the statement added.

Though the specification is released, it will be some time before USB4-compatible devices hit the market. We can expect to see devices that take advantage of the new version in late 2020 or beyond.

Read more in Hardware

USB-IF launches 'Type-C Authentication Program' for better security
Apple USB Restricted Mode: Here's Everything You Need to Know
USB 4 will integrate Thunderbolt 3 to increase the speed to 40Gbps


Implementing memory management with Golang's garbage collector

Packt Editorial Staff
03 Sep 2019
10 min read
Have you ever wondered how bulk messages are pushed in real time so fast? How is it possible? A low-latency garbage collector (GC) plays an important role in this. In this article, we present ways to look at certain parameters to implement memory management with the Golang GC. Garbage collection is the process of freeing up memory space that is not being used. In other words, the GC sees which objects are out of scope and cannot be referenced anymore, and frees the memory space they consume. This process happens in a concurrent way while a Go program is running, not before or after the execution of the program. This article is an excerpt from the book Mastering Go - Third Edition by Mihalis Tsoukalos. Mihalis runs through the nuances of Go, with deep guides to types and structures, packages, concurrency, network programming, compiler design, optimization, and more.

Implementing the Golang GC

The Go standard library offers functions that allow you to study the operation of the GC and learn more about what the GC does secretly. These functions are illustrated in the gColl.go utility. The source code of gColl.go is presented here in chunks.

package main

import (
    "fmt"
    "runtime"
    "time"
)

You need the runtime package because it allows you to obtain information about the Go runtime system, which, among other things, includes the operation of the GC.

func printStats(mem runtime.MemStats) {
    runtime.ReadMemStats(&mem)
    fmt.Println("mem.Alloc:", mem.Alloc)
    fmt.Println("mem.TotalAlloc:", mem.TotalAlloc)
    fmt.Println("mem.HeapAlloc:", mem.HeapAlloc)
    fmt.Println("mem.NumGC:", mem.NumGC, "\n")
}

The purpose of the printStats() function is to avoid writing the same Go code all the time. The runtime.ReadMemStats() call gets the latest garbage collection statistics for you.

func main() {
    var mem runtime.MemStats
    printStats(mem)

    for i := 0; i < 10; i++ {
        // Allocating 50,000,000 bytes
        s := make([]byte, 50000000)
        if s == nil {
            fmt.Println("Operation failed!")
        }
    }
    printStats(mem)

In this part, we have a for loop that creates ten byte slices with 50,000,000 bytes each. The reason for this is that by allocating large amounts of memory, we can trigger the GC.

    for i := 0; i < 10; i++ {
        // Allocating 100,000,000 bytes
        s := make([]byte, 100000000)
        if s == nil {
            fmt.Println("Operation failed!")
        }
        time.Sleep(5 * time.Second)
    }
    printStats(mem)
}

The last part of the program makes even bigger memory allocations – this time, each byte slice has 100,000,000 bytes. Running gColl.go on a macOS Big Sur machine with 24 GB of RAM produces the following kind of output:

$ go run gColl.go
mem.Alloc: 124616
mem.TotalAlloc: 124616
mem.HeapAlloc: 124616
mem.NumGC: 0

mem.Alloc: 50124368
mem.TotalAlloc: 500175120
mem.HeapAlloc: 50124368
mem.NumGC: 9

mem.Alloc: 122536
mem.TotalAlloc: 1500257968
mem.HeapAlloc: 122536
mem.NumGC: 19

The value of mem.Alloc is the bytes of allocated heap objects — "allocated" covers all the objects that the GC has not yet freed. mem.TotalAlloc shows the cumulative bytes allocated for heap objects — this number does not decrease when objects are freed, which means that it keeps increasing. Therefore, it shows the total number of bytes allocated for heap objects during program execution. mem.HeapAlloc is the same as mem.Alloc. Last, mem.NumGC shows the total number of completed garbage collection cycles.
The bigger that value is, the more you have to consider how you allocate memory in your code and whether there is a way to optimize it. If you want even more verbose output regarding the operation of the GC, you can combine go run gColl.go with GODEBUG=gctrace=1. Apart from the regular program output, you get some extra metrics — this is illustrated in the following output:

$ GODEBUG=gctrace=1 go run gColl.go
gc 1 @0.021s 0%: 0.020+0.32+0.015 ms clock, 0.16+0.17/0.33/0.22+0.12 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
gc 2 @0.041s 0%: 0.074+0.32+0.003 ms clock, 0.59+0.087/0.37/0.45+0.030 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
. . .
gc 18 @40.152s 0%: 0.065+0.14+0.013 ms clock, 0.52+0/0.12/0.042+0.10 ms cpu, 95->95->0 MB, 96 MB goal, 8 P
gc 19 @45.160s 0%: 0.028+0.12+0.003 ms clock, 0.22+0/0.13/0.081+0.028 ms cpu, 95->95->0 MB, 96 MB goal, 8 P
mem.Alloc: 120672
mem.TotalAlloc: 1500256376
mem.HeapAlloc: 120672
mem.NumGC: 19

Now, let us explain the 95->95->0 MB triplet in the previous line of output. The first value (95) is the heap size when the GC is about to run. The second value (95) is the heap size when the GC ends its operation. The last value (0) is the size of the live heap.

Go garbage collection is based on the tricolor algorithm

The operation of the Go GC is based on the tricolor algorithm, which is the subject of this subsection. Note that the tricolor algorithm is not unique to Go and can be used in other programming languages as well. Strictly speaking, the official name for the algorithm used in Go is the tricolor mark-and-sweep algorithm. It can work concurrently with the program and uses a write barrier. This means that when a Go program runs, the Go scheduler is responsible for the scheduling of both the application and the GC — it is as if the Go scheduler has to deal with a regular application with multiple goroutines! The core idea behind this algorithm came from Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens, and was first illustrated in a paper named On-the-Fly Garbage Collection: An Exercise in Cooperation.

The primary principle behind the tricolor mark-and-sweep algorithm is that it divides the objects of the heap into three different sets according to their color, which is assigned by the algorithm. It is now time to talk about the meaning of each color set. The objects of the black set are guaranteed to have no pointers to any object of the white set. However, an object of the white set can have a pointer to an object of the black set, because this has no effect on the operation of the GC. The objects of the gray set might have pointers to some objects of the white set. Finally, the objects of the white set are the candidates for garbage collection.

So, when the garbage collection begins, all objects are white, and the GC visits all the root objects and colors them gray. The roots are the objects that can be directly accessed by the application, which includes global variables and other things on the stack. These objects mostly depend on the Go code of a program. After that, the GC picks a gray object, makes it black, and starts looking at whether that object has pointers to other objects of the white set. Therefore, when an object of the gray set is scanned for pointers to other objects, it is colored black. If that scan discovers that this particular object has one or more pointers to a white object, it puts that white object in the gray set. This process keeps going for as long as objects exist in the gray set.
After that, the objects in the white set are unreachable and their memory space can be reused. Therefore, at this point, the elements of the white set are said to be garbage collected. Please note that no object can go directly from the black set to the white set, which is what allows the algorithm to operate correctly and eventually clear the objects in the white set. As mentioned before, no object of the black set can directly point to an object of the white set. Additionally, if an object of the gray set becomes unreachable at some point in a garbage collection cycle, it will not be collected in that cycle but in the next one! Although this is not an optimal situation, it is not that bad.

During this process, the running application is called the mutator. The mutator runs a small function called the write barrier that is executed each time a pointer in the heap is modified. If the pointer of an object in the heap is modified, which means that this object is now reachable, the write barrier colors it gray and puts it in the gray set. The mutator is responsible for preserving the invariant that no element of the black set has a pointer to an element of the white set. This is accomplished with the help of the write barrier function. Failing to preserve this invariant would ruin the garbage collection process and would most likely crash your program in a pretty bad and undesirable way!

So, there are three different colors: black, white, and gray. When the algorithm begins, all objects are colored white. As the algorithm keeps going, white objects are moved into one of the other two sets. The objects that are left in the white set are the ones that are going to be cleared at some point. The next figure displays the three color sets with objects in them.

Figure 1: The Go GC represents the heap of a program as a graph

In the presented graph, you can see that while object E, which is in the white set, can access object F, it cannot be accessed by any other object because no other object points to object E, which makes it a perfect candidate for garbage collection! Additionally, objects A, B, and C are root objects and are always reachable; therefore, they cannot be garbage collected.

Graph comprehended

Can you guess what will happen next in that graph? Well, it is not that difficult to realize that the algorithm will have to process the remaining elements of the gray set, which means that both objects A and F will go to the black set. Object A will go to the black set because it is a root element, and F will go to the black set because it does not point to any other object while it is in the gray set. If object A later stops pointing to object F, object F will become unreachable and will be garbage collected in the next cycle of the GC, because an unreachable object cannot magically become reachable again in a later garbage collection cycle.

Note: Go garbage collection can also be applied to variables such as channels. When the GC finds out that a channel is unreachable, that is, when the channel variable cannot be accessed anymore, it will free its resources even if the channel has not been closed.

Go allows you to manually initiate a garbage collection by putting a runtime.GC() statement in your Go code. However, keep in mind that runtime.GC() will block the caller, and it might block the entire program, especially if you are running a very busy Go program with many objects. This happens mainly because you cannot perform garbage collection while everything else is rapidly changing, as this would not give the GC the opportunity to clearly identify the members of the white, black, and gray sets. The point at which it is safe to run garbage collection is known as a garbage collection safe-point.
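As a quick illustration of triggering a collection manually (our own snippet, not from the book excerpt), the following forces a collection and then reads the updated statistics:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    _ = make([]byte, 50000000) // allocate something that becomes garbage

    runtime.GC() // blocks the caller until the collection completes

    var mem runtime.MemStats
    runtime.ReadMemStats(&mem)
    fmt.Println("GC cycles so far:", mem.NumGC)
}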
You can find the long and relatively advanced Go code of the GC at https://github.com/golang/go/blob/master/src/runtime/mgc.go, which you can study if you want to learn even more about the garbage collection operation. You can even make changes to that code if you are brave enough!

Understanding Go Internals: defer, panic() and recover() functions [Tutorial]
Implementing hashing algorithms in Golang [Tutorial]
Is Golang truly community driven and does it really matter?

How to ace a data science interview

Richard Gall
02 Sep 2019
12 min read
So, you want to be a data scientist. It’s a smart move: it’s a job that’s in high demand, can command a healthy salary, and can also be richly rewarding and engaging. But to get the job, you’re going to have to pass a data science interview - something that’s notoriously tough. One of the reasons for this is that data science is a field that is incredibly diverse. I mean that in two different ways: on the one hand, it’s a role that demands a variety of different skills (being a good data scientist is about much more than just being good at math). But it's also diverse in the sense that data science will be done differently at every company. That means that every data science interview is going to be different. If you specialize too much in one area, you might well be severely limiting your opportunities.

There are plenty of articles out there that pretend to have all the answers to your next data science interview. And while these can be useful, they also treat job interviews like they’re just exams you need to pass. They’re not - you need to have a wide range of knowledge, but you also need to present yourself as a curious and critical thinker, and someone who is very good at communicating. You won’t get a data science job by knowing all the answers. But you might get it by asking the right questions and talking in the right way. So, with all that in mind, here’s what you need to do to ace your data science interview.

Know the basics of data science

This is obvious but it’s impossible to overstate. If you don’t know the basics, there’s no way you’ll get the job - indeed, it’s probably better for your sake that you don’t get it! But what are these basics?

Basic data science interview questions

"What is data science?" This seems straightforward, but proving you’ve done some thinking about what the role actually involves demonstrates that you’re thoughtful and self-aware - a sign of any good employee.

"What’s the difference between supervised and unsupervised learning?" Again, this is straightforward, but it will give the interviewer confidence that you understand the basics of machine learning algorithms.

"What is the bias-variance tradeoff? What are overfitting and underfitting?" Being able to explain these concepts in a clear and concise manner demonstrates your clarity of thought. It also shows that you have a strong awareness of the challenges of using machine learning and statistical systems.

If you’re applying for a job as a data scientist, you’ll probably already know the answers to all of these. Just make sure you have a clear answer for each and that you can explain it concisely.

Know your algorithms

Knowing your algorithms is a really important part of any data science interview. However, it’s important not to get hung up on the details. Trying to learn everything there is to know about every algorithm isn’t only impossible, it’s also not going to get you the job. What’s important instead is demonstrating that you understand the differences between algorithms, and when to use one over another.

Data science interview questions about algorithms you might be asked

"When would you use a supervised machine learning algorithm?"

"Can you name some supervised machine learning algorithms and the differences between them?" (Supervised machine learning algorithms include Support Vector Machines, Naive Bayes, the K-nearest Neighbor algorithm, regression, and decision trees.)

"When would you use an unsupervised machine learning algorithm?"
"Can you name some unsupervised machine learning algorithms and how they’re different from one another?" (Unsupervised machine learning algorithms include K-Means, autoencoders, Generative Adversarial Networks, and Deep Belief Nets.)

"What are classification algorithms?"

There are others, but try to focus on these as core areas. Remember, it’s also important to always talk about your experience - that’s just as useful, if not more useful, than listing off the differences between different machine learning algorithms. Some of the questions you face in a data science interview might even be about how you use algorithms:

"Tell me about a time you used an algorithm. Why did you decide to use it? Were there any other options?"

"Tell me about a time you used an algorithm and it didn’t work how you expected it to. What did you do?"

When talking about algorithms in a data science interview, it’s useful to present them as tools for solving business problems. It can be tempting to talk about them as mathematical concepts, and although it’s good to show off your understanding, showing how algorithms help solve real-world business problems will be a big plus for your interviewer.

Be confident talking about data sources and infrastructure challenges

One of the biggest challenges for data scientists is dealing with incomplete or poor quality data. If that’s something you’ve faced - or even if it’s something you think you might face in the future - then make sure you talk about that. Data scientists aren’t always responsible for managing a data infrastructure (that will vary from company to company), but even if that isn’t in the job description, it’s likely that you’ll have to work with a data architect to make sure data is available and accurate enough to carry out data science projects. This means that understanding topics like data streaming, data lakes, and data warehouses is very important in a data science interview. Again, remember that it’s important that you don’t get stuck on the details. You don’t need to recite everything you know; instead, talk about your experience or how you might approach problems in different ways.

Data science interview questions you might get asked about using different data sources

"How do you work with data from different sources?"

"How have you tackled dirty or unreliable data in the past?"

Data science interview questions you might get asked about infrastructure

"Talk me through a data infrastructure challenge you’ve faced in the past."

"What’s the difference between a data lake and a data warehouse? How would you approach each one differently?"

Show that you have a robust understanding of data science tools

You can’t get through a data science interview without demonstrating that you have knowledge and experience of data science tools. It’s likely that the job you’re applying for will mention a number of different skill requirements in the job description, so make sure you have a good knowledge of them all. Obviously, the best-case scenario is that you know all the tools mentioned in the job description inside out - but this is unlikely. If you don’t know one - or more - make sure you understand what they’re for and how they work. The hiring manager probably won’t expect candidates to know everything, but they will expect them to be ready and willing to learn. If you can talk about a time you learned a new tool, that will give the interviewer a lot of confidence that you’re someone who can pick up knowledge and skills quickly.
Show you can evaluate different tools and programming languages

Another element here is being able to talk about the advantages and disadvantages of different tools. Why might you use R over Python? Which Python libraries should you use to solve a specific problem? And when should you just use Excel? Sometimes the interviewer might ask for your own personal preferences. Don’t be scared of giving your opinion - as long as you’ve got a considered explanation for why you hold the opinion that you do, you’re fine!

Read next: Why is Python so good for AI and Machine Learning? 5 Python Experts Explain

Data science interview questions about tools that you might be asked

"What tools have you - or could you - use for data processing and cleaning? What are their benefits and disadvantages?" (These include tools such as Hadoop, Pentaho, Flink, Storm, and Kafka.)

"What tools do you think are best for data visualization and why?" (This includes tools like Tableau, PowerBI, D3.js, Infogram, and Chartblocks - there are so many different products in this space that it’s important that you are able to talk about what you value most in data visualization tools.)

"Do you prefer using Python or R? Are there times when you’d use one over the other?"

"Talk me through machine learning libraries. How do they compare to one another?" (This includes tools like TensorFlow, Keras, and PyTorch. If you don’t have any experience with them, make sure you’re aware of the differences, and talk about which you are most curious about learning.)

Always focus on business goals and results

This sounds obvious, but it’s so easy to forget. This is especially true if you’re a data geek who loves to talk about statistical models and machine learning. To combat this, make sure you’re very clear on how your experience was tied to business goals. Take some time to think about why you were doing what you were doing. What were you trying to find out? What metrics were you trying to drive?

Interpersonal and communication skills

Another element of this is talking about your interpersonal skills and your ability to work with a range of different stakeholders. Think carefully about how you worked alongside other teams, and how you went about capturing requirements and building solutions for them. Think also about how you managed - or would manage - expectations. It’s well known that business leaders can expect data to be a silver bullet when it comes to results, so how do you make sure that people stay realistic?

Show off your data science portfolio

A good way of showing your business acumen as a data scientist is to build a portfolio of work. Portfolios are typically viewed as something for creative professionals, but they’re becoming increasingly popular in the tech industry as competition for roles gets tougher. This post explains everything you need to build a great data science portfolio. Broadly, the most important thing is that it demonstrates how you have added value to an organization. This could be:

Insights you’ve shared in reports with management
Building customer-facing applications that rely on data
Building internal dashboards and applications

Bringing a portfolio to an interview can give you a solid foundation on which you can answer questions. But remember - you might be asked questions about your work, so make sure you have an answer prepared!

Data science interview questions about business performance

"Talk about a time you have worked across different teams."

"How do you manage stakeholder expectations?"
"What do you think are the most important elements in communicating data insights to management?" If you can talk fluently about how your work impacts business performance and how you worked alongside others in non-technical positions, you will give yourself a good chance of landing the job! Show that you understand ethical and privacy issues in data science This might seem like a superfluous point but given the events of recent years - like the Cambridge Analytica scandal - ethics has become a big topic of conversation. Employers will expect prospective data scientists to have an awareness of some of these problems and how you can go about mitigating them. To some extent, this is an extension of the previous point. Showing you are aware of ethical issues, such as privacy and discrimination, proves that you are fully engaged with the needs and risks a business might face. It also underlines that you are aware of the consequences and potential impact of data science activities on customers - what your work does in the real-world. Read next: Introducing Deon, a tool for data scientists to add an ethics checklist Data science interview questions about ethics and privacy "What are some of the ethical issues around machine learning and artificial intelligence?" "How can you mitigate any of these issues? What steps would you take?" "Has GDPR impacted the way you do data science?"  "What are some other privacy implications for data scientists?" "How do you understand explainability and interpretability in machine learning?" Ethics is a topic that’s easy to overlook but it’s essential for every data scientist. To get a good grasp of the issues it’s worth investigating more technical content on things like machine learning interpretability, as well as following news and commentary around emergent issues in artificial intelligence. Conclusion: Don’t treat a data science interview like an exam Data science is a complex and multi-faceted field. That can make data science interviews feel like a serious test of your knowledge - and it can be tempting to revise like you would for an exam. But, as we’ve seen, that’s foolish. To ace a data science interview you can’t just recite information and facts. You need to talk clearly and confidently about your experience and demonstrate your drive and curiosity. That doesn’t mean you shouldn’t make sure you know the basics. But rather than getting too hung up on definitions and statistical details, it’s a better use of your time to consider how you have performed your roles in the past, and what you might do in the future. A thoughtful, curious data scientist is immensely valuable. Show your interviewer that you are one.

Data science vs. machine learning: understanding the difference and what it means today

Richard Gall
02 Sep 2019
8 min read
One of the things that I really love about the tech industry is how often different terms - buzzwords especially - can cause confusion. It isn’t hard to see this in the wild. Quora is replete with confused people asking about the difference between a ‘developer’ and an ‘engineer’ and how ‘infrastructure’ is different from ‘architecture’. One of the biggest points of confusion is the difference between data science and machine learning. Both terms refer to different but related domains - given their popularity, it isn’t hard to see how some people might be a little perplexed. This might seem like a purely semantic problem, but in the context of people’s careers, as they make decisions about the resources they use and the courses they pay for, the distinction becomes much more important. Indeed, it can be perplexing for developers thinking about their career - with ‘machine learning engineer’ starting to appear across job boards, it’s not always clear where that role ends and ‘data scientist’ begins.

Tl;dr: To put it simply - and if you can’t be bothered to read further - data science is a discipline or job role that’s all about answering business questions through data. Machine learning, meanwhile, is a technique that can be used to analyze or organize data. So, data scientists might well use machine learning to find something out, but it would only be one aspect of their job. But what are the implications of this distinction between machine learning and data science? What can the relationship between the two terms tell us about how technology trends evolve? And how can it help us better understand them both?

Read next: 9 data science myths debunked

What’s causing confusion about the difference between machine learning and data science?

The data science v machine learning confusion comes from the fact that both terms have a significant grip on the collective imagination of the tech and business world. Back in 2012, the Harvard Business Review declared data scientist to be the ‘sexiest job of the 21st century’. This was before the machine learning and artificial intelligence boom, but it’s the point we need to go back to in order to understand how data has shaped the tech industry as we know it today.

Data science v machine learning on Google Trends

Take a look at this Google Trends graph: both terms broadly received a similar level of interest. ‘Machine learning’ was slightly higher throughout the noughties, and a larger gap has emerged more recently. However, despite that, it’s worth looking at the period around 2014 when ‘data science’ managed to eclipse ‘machine learning’. Today, that feels remarkable given how machine learning is a term that’s extended out into popular consciousness. It suggests that the HBR article was incredibly timely, identifying the emergence of the field. But more importantly, it’s worth noting that this spike for ‘data science’ came at the time that both terms surged in popularity. So, although machine learning eventually wins out, ‘data science’ was becoming particularly important at a time when these twin trends were starting to grow. This is interesting, and it’s contrary to what I’d expect. Typically, I’d imagine the more technical term to take precedence over the more conceptual field: a technical trend emerges, and a more abstract concept gains traction afterwards. But here the concept - the discipline - spikes just at the point before machine learning properly takes off.
This suggests that the evolution and growth of machine learning begins with the foundations of data science. This is important. It highlights that the obsession with data science - which might well have seemed somewhat self-indulgent - was, in fact, an integral step for business to properly make sense of what the ‘big data revolution’ (a phrase that sounds eighty years old) meant in practice. Insofar as ‘data science’ is a term that really just refers to a role that’s performed, its growth was ultimately evidence of a space being carved out inside modern businesses that gave a domain expert the freedom to explore and invent in the service of business objectives.

If that was the baseline, then the continued rise of machine learning feels inevitable. Machine learning started out contained within computer science departments in academia, then spread into business thanks to the emergence of the data scientist job role; from there, we started to see a whole suite of tools and use cases that were about much more than analytics and insight. Machine learning became a practical tool with practical applications everywhere. From cybersecurity to mobile applications, from marketing to accounting, machine learning couldn’t be contained within the data science discipline. This wasn’t just a conceptual point - practically speaking, a data scientist simply couldn’t provide support to all the different ways in which business functions wanted to use machine learning. So, the confusion around the relationship between machine learning and data science stems from the fact that the two trends go hand in hand - or at least they used to. To properly understand how they’re different, let’s look at what a data scientist actually does.

Read next: Data science for non-techies: How I got started (Part 1)

What is data science, exactly?

I know you’re not supposed to use Wikipedia as a reference, but the opening sentence in the entry for ‘data science’ is instructive: “Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.” The word that deserves your attention is multi-disciplinary, as this underlines what makes data science unique and why it stands outside of the more specific taxonomy of machine learning terms. Essentially, it’s a human activity as much as a technical one - it’s about arranging, organizing, interpreting, and communicating data.

To a certain extent, it shares a common thread of DNA with statistics. But although Nate Silver said that ‘data scientist’ was “a sexed up term for statistician”, I think there are some important distinctions. To do data science well, you need to be deeply engaged with how your work integrates with the wider business strategy and processes. The term ‘statistics’ - like ‘machine learning’ - doesn’t quite do this. Indeed, to a certain extent this has made data science a challenging field to work in. It isn’t hard to find evidence that data scientists are trying to leave their jobs, frustrated with how their roles are being used and how they integrate into existing organisational structures.

How do data scientists use machine learning?

As a data scientist, your job is to answer questions. These are questions like:

What might happen if we change the price of a product in this way?
What do our customers think of our products?
How often do customers purchase products?
How are customers using our products?
How can we understand the existing market? How might we tackle it?
Where could we improve efficiencies in our processes?

That’s just a small set. The types of questions data scientists will be tackling will vary depending on the industry, their company - everything. Every data science job is unique. But whatever questions data scientists are asking, it’s likely that at some point they’ll be using machine learning. Whether it’s to analyze customer sentiment (grouping and sorting) or to predict outcomes, a data scientist will have a number of algorithms up their proverbial sleeves, ready to tackle whatever the business throws at them.

Machine learning beyond data science

The machine learning revolution might have started in data science, but it has rapidly expanded far beyond that strict discipline. Indeed, one of the reasons that some people are confused about the relationship between the two concepts is that machine learning today touches just about everything, like water spilling out of its neat data science container.

Machine learning is for everyone

Machine learning is being used in everything from mobile apps to cybersecurity. And although data scientists might sometimes play a part in these domains, we’re also seeing subject-specific developers and engineers taking more responsibility for how machine learning is used. One of the reasons for this is, as I mentioned earlier, the fact that a data scientist - or even a couple of them - can’t do all the things that a business might want them to when it comes to machine learning. But another is the fact that machine learning is getting easier. You no longer need to be an expert to employ machine learning algorithms - instead, you need to have the confidence and foundational knowledge to use existing machine learning tools and products.

This ‘productization’ of machine learning is arguably what’s having the biggest impact on how we understand the topic. It’s even shrinking data science, making it a more specific role. That might sound like data science is less important today than it was in 2014, but it can only be a good thing for data scientists - it means they’re no longer being asked to spread themselves so thinly. So, if you've been googling 'data science v machine learning', you now know the answer. The two terms are distinct, but they both come out of the 'big data revolution', which we're still living through. Both trends and terms are likely to evolve in the future, but they're certainly not going to disappear - as the data at our disposal grows, making effective use of it is only going to become more important.

Wasmer's first Postgres extension to run WebAssembly is here!

Vincy Davis
30 Aug 2019
2 min read
Wasmer, the WebAssembly runtime, has been successfully embedded in many languages like Rust, Python, Ruby, PHP, and Go. Yesterday, Ivan Enderlin, a PhD computer scientist at Wasmer, announced version 0.1 of a new Postgres extension for WebAssembly. Since the project is still under heavy development, the extension only supports integers (on 32 and 64 bits) and works on Postgres 10 only. It does not support strings, records, views, or any other Postgres types yet.

https://twitter.com/WasmWeekly/status/1167330724787171334

The official post states, “The goal is to gather a community and to design a pragmatic API together, discover the expectations, how developers would use this new technology inside a database engine.”

The Postgres extension provides two foreign data wrappers, wasm.instances and wasm.exported_functions, in the wasm foreign schema. wasm.instances is a table with id and wasm_file columns. wasm.exported_functions is a table with instance_id, name, inputs, and outputs columns. Enderlin says that this information is enough for the wasm Postgres extension “to generate the SQL function to call the WebAssembly exported functions.”

Read Also: Wasmer introduces WebAssembly Interfaces for validating the imports and exports of a Wasm module

The Wasmer team ran a basic benchmark comparing the execution time of a Fibonacci sequence computation between WebAssembly and PL/pgSQL. The benchmark was run on a MacBook Pro 15" from 2016, with a 2.9 GHz Core i7 and 16 GB of memory.

Image Source: Wasmer

The result was that the “Postgres WebAssembly extension is faster to run numeric computations. The WebAssembly approach scales pretty well compared to the PL/pgSQL approach, in this situation.” The Wasmer team believes that though it is too soon to consider WebAssembly an alternative to PL/pgSQL, the above result makes them hopeful that it can be explored more. To know more about the Postgres extension, check out its GitHub page.

5 barriers to learning and technology training for small software development teams
Mozilla CEO Chris Beard to step down by the end of 2019 after five years in the role
JavaScript will soon support optional chaining operator as its ECMAScript proposal reaches stage 3

5 barriers to learning and technology training for small software development teams

Richard Gall
30 Aug 2019
9 min read
Managing, supporting, and facilitating training for software developers isn’t easy. Such is the pace of change that it can be tough to ensure that not only do your developers have the skills they need, but also that they have the resources at their disposal to explore new technologies, try new approaches, and solve complex problems. Okay, sure: there’s a wealth of free resources out there. But for many developers, using these is really just learning to walk. It’s essential, but it can’t be the only thing developers have at their disposal. If it is, their company is doing them a disservice. Even if a company is committed to its team’s development, how to go about it still isn’t straightforward. There are a number of barriers that need to be overcome when it comes to technology training and learning. Here are 5 of them...

Barrier 1: Picking resources on the right topics

The days of developers specializing are long gone - particularly in a small tech team where everyone has to get stuck in. Full-stack development has ensured that today’s software developers need to be confident in everything from databases to UI design, while DevOps means they need to assume responsibility for how their code actually runs in production. The days of “it worked on my machine!” are over.

Read next: DevOps engineering and full-stack development – 2 sides of the same agile coin

When you factor in cloud native and hybrid cloud, developers today might well not only be working on cloud platforms with a bewildering array of features and opportunities, but also working on a number of different platforms at once. This means that understanding exactly what your developers need to know can be immensely difficult. It also means that you have to set the technology agenda from the off, immediately limiting your developers’ ability to solve problems themselves. That’s only going to alienate your developers - you’re effectively telling them that their curiosity, and their desire to explore new topics and alternative solutions, is pointless.

What can you do? The only way to solve this is to ensure your developers have a rich set of learning resources. Don’t limit them to a specific tool or set of technologies. Allow them to explore new topics as they see fit - don’t force them down a pre-set path. The other benefit of this is that it means you can be flexible and open in terms of the technology you use.

Barrier 2: Legacy software and brownfield projects

Sometimes, however, you can’t be flexible about the technology you use. Your developers simply need to use existing tools to manage legacy systems and brownfield projects (which build on or demolish existing legacy software). This means that learning needs can become complex. Not only might your development team need a diverse range of skills for one particular project, they might also need to learn very different skill sets for different projects. Maybe you’re planning a new project built in React.js - great, get your developers some cutting-edge content. But then, alongside this, what if they also need to tackle legacy software built using PHP? It’s fine if they know PHP, but what if you have a new developer? Or what if they haven’t used it for years?

What can you do? As above, a rich set of training and learning resources is vital. But in this instance, it’s particularly important to ensure developers have access to learning materials on a wide range of technologies. So yes, cutting-edge topic coverage is essential, but so is content on established and even residual technologies.
Barrier 3: Lack of time for learning new software skills and concepts

Lack of time is one of the biggest barriers to learning and training in the tech industry. At least that’s the perception - in reality, a lack of time to learn is caused by resource challenges and the cultural status of learning inside an organization. If learning isn’t a priority, it’s often the thing that gets pushed back as new day-to-day activities manage to insert themselves into your team’s day, or existing ones stretch out to fill the hours in a way that no one expected. This is risky. Although there are times when things simply need to get done quickly, overlooking learning will not only lead to disengagement in your team, it will also leave them unprepared for challenges that may emerge throughout the course of a project. It’s often said that software development is all about solving problems - but how confident can we be that we’re equipped to solve them if we’re not committed to making time for learning?

What can you do? The first step is obvious: make time for learning. One method is to set aside a specific period in the week which your team is encouraged to use for personal development. While that can work well, sometimes trying to structure learning in this way can make it feel like a chore for employees, as if you’re trying to fit them into a predetermined routine. Another option is to simply encourage continuous learning. That might mean a short period every day just to learn a new concept or approach. Or it could be something like a learning diary that allows developers to record things they learn and plan what they want to learn next. Essentially, what’s important is putting your developers in control. Give them the tools to make time - as well as resources that they can use quickly and easily - and they’ll not only learn new skills much more quickly, they’ll also be more engaged and more curious engineers.

Read next: “All of my engineering teams have a machine learning feature on their roadmap” – Will Ballard talks artificial intelligence in 2019 [Interview]

Barrier 4: Different learning preferences within the team

In even a small engineering team, every person will have unique needs and preferences when it comes to learning. So, even if your whole team wants to learn the same topics, they still might disagree about how they want to do it. Some developers, for example, can’t stand learning with video. It forces you to learn at its pace, and - shock horror - you have to listen to someone. Others, however, love it - it’s visual, immediate, and, you know, maybe having someone with a voice explaining how things work isn’t actually that bad? Similarly, some people love training courses - the idea of sitting with someone, rather than going it alone - while others value their independence and agency. This means that keeping everyone happy can be tough. It might be tempting to go one of two ways - either decide how you want people to learn and let them get on with it, or let everyone go it alone - but neither approach is ideal. Forcing people to learn a certain way will penalise those who don’t like your preferred learning method, hurting not only their ability to learn new skills but also their trust in you. Letting everyone be independent, meanwhile, means you never have any oversight of what people are doing or using - there’s no level playing field.

What can you do? The key to getting over this barrier is balance.
You want to preserve your team's independence and sense of agency while also having some transparency over what people are using. Resources that offer a mix of formats are great for this. That way, no one is forced to watch hours and hours of video courses or to trawl through text that will just send them to sleep. Equally, another important thing to think about is how the learning resources you provide your team complement other learning activities that they may be doing independently (things like looking for answers to questions in blog posts or on YouTube). By doing that, you can allow your developers some level of independence while also ensuring that you have a set of organizational resources and tools that are available and accessible to everyone.

Read next: Why do IT teams need to transition from DevOps to DevSecOps?

Barrier 5: The cost of technology training and learning resources

Cost is perhaps the biggest barrier to training for development teams. Specific courses and events can be astronomically expensive - especially if more than one person needs to attend - and even some learning platforms can cost significant amounts of money. Now, that’s not always a problem for engineering teams operating in a more corporate or enterprise environment. But for smaller teams in small and medium-sized businesses, training can become quite an overhead. In turn, this can have other consequences - from a complete disregard and deprioritization of training and learning to internal frustration at who is getting support and investment and who isn’t, it’s not uncommon to see training become a catalyst for many other organizational and cultural problems.

What can you do? The trick here is not to invest heavily in one single thing. Don’t blow your budget on a single course - what if it’s not up to scratch? Don’t commit to an expensive learning platform. However good the marketing looks, if your team doesn't like it, they’re certainly not going to use it. Fortunately, there are affordable solutions out there that can ensure you’re not breaking the bank when it comes to training. It might even leave you with some cash left over that you can invest in other resources and materials.

Learning new skills isn’t easy. It requires patience and commitment. But developers need support and resources to take some of the strain out of their learning challenges. There’s no one way to meet the learning needs of developers. But Packt for Teams can help you overcome many of the barriers of developer training. Learn more here.

The Accelerate State of DevOps 2019 Report: Key findings, scaling strategies and proposed performance &amp; productivity models

Vincy Davis
28 Aug 2019
8 min read
The State of DevOps report is a widely referenced body of DevOps research which helps organizations achieve high organizational performance and productivity. The DORA (DevOps Research and Assessment) team has published the 2019 Accelerate State of DevOps Report, an independent review of the practices and capabilities of DevOps. Teams across the globe can take advantage of the survey findings and identify specific capabilities to improve their software delivery performance.

Key findings in the DevOps 2019 report

Increase in elite performers

The DevOps 2019 report uses a cluster analysis method to identify the software delivery performance of the participants involved in the survey. All the respondents are categorized into four distinct groups, called Elite, High, Medium, and Low performers. The report states, “In this approach, those in one group are statistically similar to each other and dissimilar from those in other groups, based on our performance behaviors of throughput and stability: deployment frequency, lead time, time to restore service, and change fail rate.” According to the survey, the proportion of elite performers has jumped to 20% from 7% last year. The percentage of medium performers has also increased, leading to a drop in the percentage of low performers. Hence, the cluster analysis depicts a continued shift in the industry as organizations continue to transform their technology.

Image Source: 2019 Accelerate State of DevOps Report

Read Also: Does it make sense to talk about DevOps engineers or DevOps tools?

Strategies for scaling DevOps in organizations

The Accelerate State of DevOps 2019 report lays out several strategies for scaling DevOps in an organization. The strategies are based on commonly used approaches observed by the DORA team across the industry.

Training Center (DOJO): Employees are taken out of their usual work routines to learn new tools or technologies. They are then expected to implement the newly learned methods in their work and to inspire others to do so.
Center of Excellence: This strategy pools all the available skills for consultation purposes.
Proof of Concept but Stall: A central team is given the freedom to build in any possible way (often by breaking organizational norms). However, the effort stalls after the PoC.
Proof of Concept as a Template: This begins with a Proof of Concept but Stall project, which is then replicated in other groups using the same pattern.
Proof of Concept as a Seed: This approach ensures that the PoC members stay active in the new groups, either indefinitely or just long enough to ensure that the new practices are sustainable.
Communities of Practice: Groups sharing common interests in tooling, languages, or methodologies are encouraged within an organization to share knowledge and expertise with each other and across teams.
Big Bang: The entire organization is transformed to DevOps methodologies at once.
Bottom-up or Grassroots: Small teams are put together to transform resources and share their success throughout the organization in an informal way.
Mashup: An organization implements several of the above approaches with partial execution or insufficient resources.

The distribution of DevOps transformation strategies by performance profile is shown below.
Image Source: 2019 Accelerate State of DevOps Report

Cloud computing usage is the driving force behind elite performers

The DORA team uses the five essential characteristics of cloud computing, as defined by the National Institute of Standards and Technology (NIST), to understand the cloud usage patterns of the categorized performers. The essential characteristics of cloud computing are given below.

On-demand self-service: DevOps users can provision computing resources on demand, whenever needed, without any human intervention. This is the least used characteristic, with only 57% of the survey respondents agreeing that they use it.
Broad network access: Cloud capabilities can be accessed through heterogeneous platforms such as mobile phones, tablets, laptops, and workstations. 60% of the survey respondents agreed that they use broad network access.
Resource pooling: Provider resources in a cloud are pooled in a multi-tenant model such that physical and virtual resources are dynamically assigned on demand. This allows the customer to specify a location at a higher level of abstraction, such as country, state, or datacenter. Among all the survey respondents, 58% agreed that they use the resource pooling characteristic.
Rapid elasticity: Cloud capabilities are elastically provisioned so that they can be released rapidly to scale outward or inward on demand. Hence, cloud capabilities seem unlimited and can be appropriated in any quantity at any time. Compared to 2018, this characteristic saw growth of 15% this year, with 58% of the survey respondents utilizing it.
Measured service: This is the most used characteristic of cloud computing, with 62% of the respondents agreeing that they use measured service. It allows cloud systems to automatically control, optimize, and report resource usage for services like storage, processing, bandwidth, and active user accounts.

The DevOps 2019 survey found that only 29% of all the respondents agreed that they utilize all five essential cloud computing characteristics. The survey also revealed that the elite performers were 24 times more likely than the low performers to use all the cloud characteristics.

Importance of psychological safety to SDO performance

The DevOps survey found that teams with a culture of psychological safety, where team members feel safe to take risks and be vulnerable in front of each other, are better able to deliver software delivery performance, organizational performance, and high productivity. Such a culture enables an organization to come up with new products and features without impacting existing users, leading to desirable levels of software delivery and operational performance (SDO performance). The report concludes that, thanks to psychological safety, elite performers are twice as likely to achieve or exceed their organizational performance goals.

Size of an industry does not correspond to its success

This year, the retail industry saw significantly better SDO performance in terms of the speed and stability of software delivery.
However, the DevOps report states, “We found no evidence that industry has an impact with the exception of retail, suggesting that organizations of all types and sizes, including highly regulated industries such as financial services and government, can achieve high levels of performance.” The report suggests that size is not a factor in the poorer performance of other industries and that high levels of performance can be achieved by adopting DevOps practices.

Read Also: Listen: Puppet’s VP of Ecosystem Engineering Nigel Kersten talks about key DevOps challenges [Podcast]

Proposed steps to improve performance and productivity in DevOps

The Accelerate State of DevOps Report of 2019 proposes two research models to improve DevOps performance and productivity.

Performance model

The DevOps report states, “A key goal in digital transformation is optimizing software delivery performance.” To improve software delivery and operational (SDO) performance and organizational performance, teams can start with basic automation, such as version control and automated testing, along with monitoring, clear change approval processes, and a healthy culture. The DevOps 2019 survey finds that low performers use more proprietary software than high and elite performers.

Image Source: 2019 Accelerate State of DevOps Report

Productivity model

The report defines productivity as “the ability to get complex, time-consuming tasks completed with minimal distractions and interruptions.” Teams can use useful, easy-to-use tools along with internal and external search to accelerate productivity. The DevOps survey reveals that the highest performing engineers are 1.5 times more likely to use easy-to-use tools. Improved productivity also helps employees achieve a better work/life balance and suffer less burnout.

Image Source: 2019 Accelerate State of DevOps Report

DevOps teams can use the above two models to locate their goals, identify the dependent factors, and increase their overall performance and productivity.

Demographic makeup of the DevOps 2019 survey

This year, the DORA survey had almost 1,000 participants. Among all the responses, 26% came from employees working in very large companies (10,000+). Compared to last year, there has been a drop in responses from employees working in companies with 500-1,999 employees. In contrast, more responses have been received from people working in companies of 100-499 employees. The majority of participants worked in the Development or Engineering department, followed by DevOps or Site Reliability Engineering (SRE), then Manager, IT Operations or Infrastructure, and others. Half of the participants in the 2019 research are from North America, followed by the EU/UK at 29%. This year also saw a fall in responses from Asia, at only 9% compared to 18% last year. The percentage of women on teams has also dropped to 16% (median) from the 25% reported last year.

Image Source: 2019 Accelerate State of DevOps Report

Developers across the world love the Accelerate State of DevOps 2019 report and are thanking the DORA team for major takeaways like cloud being a key differentiator, how to be an elite performer in software delivery performance, and more.
https://twitter.com/kniklas/status/1165681512538398720
https://twitter.com/nickj69/status/1164707063907225600
https://twitter.com/DawieO/status/1164703456675819522
https://twitter.com/kylekyle/status/1164692941559877632
https://twitter.com/tottiLFC/status/1164800298885402624
https://twitter.com/mirko_novakovic/status/1164606178615341060

Here’s a two-minute summary video of the Accelerate State of DevOps 2019 report, published by Google Cloud.

https://www.youtube.com/watch?v=8M3WibXvC84

Interested readers can check out the report to see a detailed comparison of tool usage according to low, medium, high, and elite profiles. Also, you can read the full 2019 Accelerate State of DevOps Report for more information.

5 reasons poor communication can sink DevSecOps
7 crucial DevOps metrics that you need to track
Introducing kdevops, a modern DevOps framework for Linux kernel development

5 reasons poor communication can sink DevSecOps

Guest Contributor
27 Aug 2019
7 min read
In the last few years, a major shift has occurred within the software industry in terms of how companies organize themselves and manage product responsibilities. The result is a merging of development and operations roles under a single umbrella known as DevOps. However, that’s not where the story ends. More and more businesses are beginning to realize that cybersecurity strategy should not be treated as an independent activity. Software reliability is completely dependent upon security and risk management; otherwise, you leave your organization vulnerable to external attacks. The result of this thinking has been an increase in the scope of the DevOps role to include a security aspect as well, a model known as DevSecOps. But not every company can easily shift to the DevSecOps model overnight, especially when communication issues get in the way. In this article, we'll cover five of the most common roadblocks and ways to avoid them.

1. Unclear responsibilities

Now that the DevSecOps trend has gone mainstream in the software industry, you'll see many new job listings popping up in this area. Unfortunately, some companies are using the term as a catch-all to throw various disconnected duties at a single person. The result can be quite frustrating. Leadership and management need to set a clear scope for the responsibilities of DevSecOps engineers and integrate them directly with other parts of the organization, including development, quality assurance, and cloud administrators. DevSecOps can only work as a strategy if it's fully adopted and understood by people at every level. It's also important to be careful not to characterize the DevSecOps concept as a group of tools. The focus should always remain on the individuals performing their duties and not the applications that they use. Clarifying this line of distinction can make it easier to communicate within your organization.

2. Missing connection to end-users

Engineers in the DevSecOps role should be involved at every phase of the software development lifecycle. Otherwise, they will not have a holistic view of the platform's security status and how it is functioning. In a worst-case scenario, the DevSecOps team is disconnected from end-users entirely. If an engineer does not understand how real people are using their application, then the product as a whole is likely doomed. User requirements should form the basis of every coding project, and supporting the development lifecycle is only possible if that link exists and is maintained.

3. Too many (and unsecured) communication tools

Engineers in a DevSecOps role often spend the majority of their days coordinating between other groups within the organization. This activity can't succeed unless there is a strong communication tool set at their disposal. One mistake many companies make is deciding to invest in dozens of different chat, messaging, and conferencing apps in hopes that it will make things easier. The problem is that easy online communication comes at the price of privacy. Some platforms retain your data for their own internal use or even to sell - in the form of uniquely identifiable IP addresses - to advertisers. Though it’s relatively easy to hide your IP address, do you want to trust an app that plays fast and loose with your information in the first place?
One way a DevSecOps team can address this issue is to emphasize to decision-makers the security risks of many popular tools. Slack, WhatsApp, and Snapchat are recent examples of popular messaging apps that are now taking flak because of the different security risks they pose. When it comes to email, even Gmail, the most popular email service, has been caught allowing unfettered access to user email addresses. Our advice is to use an encrypted email tool such as ProtonMail or Mailfence rather than rely on the usual suspects with better name recognition. The more communication tools you use, the larger the threat surface vulnerable to hackers.

4. Alert fatigue

One key part of the DevSecOps suite of responsibilities is to streamline all monitoring and alerting functions across the organization. The goal is to make it easy for both managers and engineers to find out about live issues and quickly navigate to the source of the problem. In some cases, DevSecOps engineers will be asked to set very sensitive alerting protocols so that no potential problem is missed. There is a downside, though, because having too many notifications can lead to alert fatigue. This is especially true if your monitoring tools are throwing false positives all day long. A string of unnecessary alerts will cause people to stop paying attention to them. The approach to alerting should be well thought out and clearly documented in runbooks by the DevSecOps team. A runbook should explain exactly what an alert means and the steps required to address it. This level of documentation allows DevSecOps engineers to outsource incident response to a larger group.

5. Hidden dependencies

Because of the wide scope of the DevSecOps role, organizations sometimes expect engineers to be fortune-tellers, able to predict how changes will impact code, tests, and security. This level of confidence cannot be reached unless there is clear and consistent communication across the company. Take, for example, a decision to add firewall protection around a database server to block outside threats. This will probably seem like a simple change to the engineers working on the system, but they may not realize that a new firewall could cut off connections to other services within the same infrastructure. If DevSecOps had been involved in the meetings and decision-making, then this type of hidden dependency could have been uncovered earlier. The DevSecOps model can only succeed if the organization has a strong policy of change management. Any modification to a live system should be thoroughly vetted by representatives of all teams. At that time, risks can be weighed and approvals can be made. Changes should be scheduled at times when the impact will be minimal.

Final thoughts

When browsing job listings, you'll surely see an influx of roles mentioning both DevOps and DevSecOps. These roles can have an incredibly wide scope and often play a critical role in the success of a software company. But breakdowns in communication have the potential to derail the goals of DevSecOps, putting the entire organization at risk.
Modern software development is all about being agile, meaning that requirements gathering and coding happen with great flexibility and fluidity. The same should be true for DevSecOps. The duties of this role should be evaluated and tweaked on a regular basis, and clear communication is the best way to go about it.

Author Bio

Gary Stevens is a front-end developer. He’s a full-time blockchain geek and a volunteer working for the Ethereum foundation as well as an active Github contributor.

Why do IT teams need to transition from DevOps to DevSecOps?
The seven deadly sins of web design
Dark Web Phishing Kits: Cheap, plentiful and ready to trick you