How to work with classes in TypeScript

Amey Varangaonkar
15 May 2018
8 min read
If we are developing an application using TypeScript, be it small-scale or large-scale, we will use classes to manage our properties and methods. Prior to ES 2015, JavaScript did not have the concept of classes, and we used functions to create class-like behavior. TypeScript introduced classes as part of its initial release, and now we have classes in ES6 as well. The behavior of classes in TypeScript and JavaScript ES6 closely resembles that of any object-oriented language you might have worked with, such as C#.

This excerpt is taken from the book TypeScript 2.x By Example, written by Sachin Ohri.

Object-oriented programming in TypeScript

Object-oriented programming allows us to represent our code in the form of objects, which are instances of classes holding properties and methods. Classes form the container of related properties and their behavior. Modeling our code in the form of classes allows us to achieve various features of object-oriented programming, which help us write more intuitive, reusable, and robust code. Features such as encapsulation, polymorphism, and inheritance are the result of implementing classes. TypeScript, with its implementation of classes and interfaces, allows us to write code in an object-oriented fashion. This lets developers coming from traditional languages, such as Java and C#, feel right at home when learning TypeScript.

Understanding classes

Prior to ES 2015, JavaScript developers did not have any concept of classes; the best way they could replicate the behavior of classes was with functions. A function provides a mechanism to group together related properties and methods. Methods can be added either internally to the function or by using the prototype keyword.
The following is an example of such a function:

function Name (firstName, lastName) {
  this.firstName = firstName;
  this.lastName = lastName;
  this.fullName = function() {
    return this.firstName + ' ' + this.lastName;
  };
}

In the preceding example, we have the fullName method encapsulated inside the Name function. Another way of adding methods to functions is with the prototype keyword, as shown in the following code snippet:

function Name (firstName, lastName) {
  this.firstName = firstName;
  this.lastName = lastName;
}
Name.prototype.fullName = function() {
  return this.firstName + ' ' + this.lastName;
};

These features of functions did solve most of the issues of not having classes, but much of the developer community has never been comfortable with these approaches. Classes make this process easier. Classes provide an abstraction on top of common behavior, thus making code reusable.

The syntax of a class should look very familiar to readers who come from an object-oriented background. To define a class, we use the class keyword followed by the name of the class. The News class has three member properties and one method. Each member has a type assigned to it and an access modifier to define its scope. We create an object of the class with the new keyword. Classes in TypeScript also have the concept of a constructor, where we can initialize some properties at the time of object creation.

Access modifiers

Once the object is created, we can access the public members of the class with the dot operator. Note that we cannot access the author property through the espn object, because this property is defined as private. TypeScript provides three types of access modifiers.

Public

Any property defined with the public keyword is freely accessible outside the class. As we saw in the previous example, all the variables marked with the public keyword were available outside the class on an object.
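The News class definition appears only as an image in the original article. Based on the description (two public members, one private author member, and a format method), a hedged reconstruction might look like this; the channel-related member names are assumptions, not the book's actual code:

```typescript
class News {
  public channelNumber: number = 0;   // assumed member name
  public channelName: string = "";    // assumed member name
  private author: string = "Packt";   // private: not accessible outside the class

  format(): string {
    return `Channel ${this.channelNumber} (${this.channelName}), by ${this.author}`;
  }
}

const espn = new News();       // create an object with the new keyword
espn.channelNumber = 5;        // public members are accessible via the dot operator
espn.channelName = "ESPN";
console.log(espn.format());
// espn.author would be a compile-time error: 'author' is private
```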
Note that TypeScript assigns public as the default access modifier if we do not specify one explicitly. This is because the default JavaScript behavior is for everything to be public.

Private

When a property is marked as private, it cannot be accessed outside of the class. The scope of a private variable is only inside the class when using TypeScript. In JavaScript, as we do not have access modifiers, private members are treated like public members.

Protected

The protected keyword behaves similarly to private, with the exception that protected variables can be accessed in derived classes. The following is one such example:

class base {
  protected id: number;
}
class child extends base {
  name: string;
  details(): string {
    return `${this.name} has id: ${this.id}`;
  }
}

In the preceding code, the child class extends the base class and has access to the id property inside the child class. If we create an object of the child class, we still do not have access to the id property outside.

Readonly

As the name suggests, a property with the readonly modifier cannot be modified after a value has been assigned to it. A readonly property can only be assigned at the time of variable declaration or in the constructor; assigning to it anywhere else gives an error stating that the property is readonly and cannot be assigned a value.

Transpiled JavaScript from classes

While learning TypeScript, it is important to remember that TypeScript is a superset of JavaScript, not a new language of its own. Browsers can only understand JavaScript, so it is important for us to understand the JavaScript that TypeScript transpiles to. TypeScript provides an option to generate JavaScript based on the ECMA standards. You can configure TypeScript to transpile into ES5, ES6 (ES 2015), or even ES3 JavaScript by using the target flag in the tsconfig.json file.
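As a minimal sketch (the surrounding options are omitted; only the target flag discussed above is shown), a tsconfig.json targeting ES5 might look like:

```json
{
  "compilerOptions": {
    "target": "es5"
  }
}
```

Changing "es5" to "es6" (or "es2015") switches the generated JavaScript to the ES6 output discussed next.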
The biggest difference between ES5 and ES6 concerns classes and the let and const keywords, which were introduced in ES6. Even though ES6 has been around for more than a year, most browsers still do not have full support for it. So, if you are creating an application that targets older browsers as well, consider setting the target to ES5. The JavaScript that is generated will differ based on the target setting. Here, we will take an example of a class in TypeScript and generate JavaScript for both ES5 and ES6.

This is the same News class we saw in the Understanding classes section: it has three members, two of which are public and one private, plus a format method, which returns a string concatenated from the member variables. We then create an object of the News class, assign values to the public properties, and in the last line call the format method to print the result. Now let's look at the JavaScript transpiled by the TypeScript compiler for this class.

ES6 JavaScript

ES6, also known as ES 2015, is the latest version of JavaScript, which provides many new features on top of ES5. Classes are one such feature; JavaScript did not have classes prior to ES6. If you compare the generated code with the TypeScript code, you will notice only minor differences. This is because classes in TypeScript and JavaScript are similar, with types and access modifiers being the additions in TypeScript. In JavaScript, we do not have the concept of declaring public members. The author variable, which was defined as private and initialized at its declaration, is converted to a constructor initialization in JavaScript. If we had not initialized author, the produced JavaScript would not have added author to the constructor.
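The generated ES6 code is shown only as an image in the original article. Based on the description above (class syntax kept, types and modifiers dropped, the private initializer moved into the constructor), the output would look roughly like this; the channel member names are assumptions:

```javascript
class News {
  constructor() {
    // the private declaration-site initializer becomes a constructor assignment
    this.author = "Packt";
  }
  format() {
    // the template string survives in ES6 output
    return `Channel ${this.channelNumber} (${this.channelName}), by ${this.author}`;
  }
}
let espn = new News();
espn.channelNumber = 5;
espn.channelName = "ESPN";
console.log(espn.format());
```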
ES5 JavaScript

ES5 is the most widely supported JavaScript version in browsers, and if you are developing an application that has to support the majority of browser versions, you need to transpile your code to ES5. This version of JavaScript does not have classes, so the transpiled code converts classes to functions, and methods inside the classes are converted to methods defined on the function's prototype.

As discussed earlier, the basic difference is that the class is converted to a function. The interesting aspect of this conversion is that the News class is converted to an immediately invoked function expression (IIFE). An IIFE can be identified by the parentheses at the end of the function declaration. IIFEs cause the function to be executed immediately and help maintain the correct scope of the function, rather than declaring the function in the global scope. Another difference is how the format method is defined in the ES5 JavaScript: the prototype keyword is used to add the additional behavior to the function. A couple of other differences you may notice include the change of the let keyword to var, as let is not supported in ES5; all variables in ES5 are defined with the var keyword. Also, the format method no longer uses a template string, but standard string concatenation to produce the output.

TypeScript does a good job of transpiling the code to JavaScript while following recommended practices. This helps make sure we have robust and reusable code with minimal error cases. If you found this tutorial useful, make sure you check out the book TypeScript 2.x By Example for more hands-on tutorials on how to effectively leverage the power of TypeScript to develop and deploy state-of-the-art web applications.
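The ES5 output is likewise shown only as an image in the original article. Given the differences described above (IIFE, prototype method, var, string concatenation), it would look roughly like this; the channel member names are assumptions:

```javascript
// The class becomes an IIFE -- note the invoking parentheses at the end
var News = (function () {
  function News() {
    this.author = "Packt";
  }
  // methods move onto the prototype
  News.prototype.format = function () {
    // template string replaced by plain concatenation
    return "Channel " + this.channelNumber + " (" + this.channelName + "), by " + this.author;
  };
  return News;
}());
var espn = new News();
espn.channelNumber = 5;
espn.channelName = "ESPN";
console.log(espn.format());
```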
How to install and configure TypeScript
Understanding Patterns and Architectures in TypeScript
Writing SOLID JavaScript code with TypeScript
Regular expressions in AWK programming: What, Why, and How

Pavan Ramchandani
18 May 2018
8 min read
AWK is a pattern-matching language. It searches a file for a pattern and, upon finding a match, performs the associated action on the input line. The pattern may consist of fixed strings or variable text, which is generally described with the help of regular expressions. Hence, regular expressions form an important part of the AWK programming language. Today we will introduce you to regular expressions in AWK programming and get started with string-matching patterns and the basic constructs to use with AWK. This article is an excerpt from a book written by Shiwang Kalkhanda, titled Learning AWK Programming.

What is a regular expression?

A regular expression, or regexpr, is a set of characters used to describe a pattern. A regular expression is generally used to match lines in a file that contain a particular pattern. Many Unix utilities, such as grep, sed, and awk, operate on plain text files line by line. Regular expressions search for a pattern on a single line of a file; a regular expression doesn't search for a pattern that begins on one line and ends on another. Other programming languages may support this, notably Perl.

Why use regular expressions?

Generally, all editors have the ability to perform search-and-replace operations. Some editors can only search for patterns, others can also replace them, and others can also print the line containing a pattern. A regular expression goes many steps beyond this simple search, replace, and print functionality, and hence is more powerful and flexible. We can search for a word of a certain size, such as a word that has four characters or numbers. We can search for a word that ends with a particular character, let's say e. You can search for phone numbers, email IDs, and so on, and can also perform validation using regular expressions. They simplify complex pattern-matching tasks and hence form an important part of AWK programming.
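As a quick, runnable taste of what follows, here is a self-contained sketch using a made-up two-line file (not the book's employee database); the bare pattern and the ~ field-match operator shown here are both covered in detail below:

```shell
# A made-up two-line input file, purely for illustration
printf 'Billy runs quickly\nJane walks fast\n' > sample.txt

# Bare pattern: print every line containing the "ly" sub-string
awk '/ly/' sample.txt                      # matches "Billy runs quickly"

# Field-level match: print the first field of lines whose first field matches /ly/
awk '$1 ~ /ly/ { print $1 }' sample.txt    # prints "Billy"
```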
Other regular expression variations also exist, notably those of Perl.

Using regular expressions with AWK

There are mainly two types of regular expressions in Linux:

- Basic regular expressions, used by vi, sed, grep, and so on
- Extended regular expressions, used by awk, nawk, gawk, and egrep

Here, we will refer to extended regular expressions as regular expressions in the context of AWK. In AWK, regular expressions are enclosed in forward slashes, '/' (forming the AWK pattern), and match every input record whose text belongs to that set. The simplest regular expression is a string of letters, numbers, or both that matches itself. For example, here we use the ly regular expression string to print all lines that contain the ly pattern. We just need to enclose the regular expression in forward slashes in AWK:

$ awk '/ly/' emp.dat

The output on execution of this code is as follows:

Billy Chabra 9911664321 bily@yahoo.com M lgs 1900
Emily Kaur 8826175812 emily@gmail.com F Ops 2100

In this example, the /ly/ pattern matches when the current input line contains the ly sub-string, either as ly itself or as part of a bigger word, such as Billy or Emily, and prints the corresponding line.

Regular expressions as string-matching patterns with AWK

Regular expressions are used as string-matching patterns with AWK in the following three ways. We use the ~ and !~ match operators to perform regular expression comparisons:

/regexpr/: This matches when the current input line contains a sub-string matched by regexpr. It is the most basic regular expression, which matches itself as a string or sub-string. For example, /mail/ matches only when the current input line contains the mail string as a string, a sub-string, or both.
So, we will get lines with Gmail as well as Hotmail in the email ID field of the employee database, as follows:

$ awk '/mail/' emp.dat

The output on execution of this code is as follows:

Jack Singh 9857532312 jack@gmail.com M hr 2000
Jane Kaur 9837432312 jane@gmail.com F hr 1800
Eva Chabra 8827232115 eva@gmail.com F lgs 2100
Ana Khanna 9856422312 anak@hotmail.com F Ops 2700
Victor Sharma 8826567898 vics@hotmail.com M Ops 2500
John Kapur 9911556789 john@gmail.com M hr 2200
Sam khanna 8856345512 sam@hotmail.com F lgs 2300
Emily Kaur 8826175812 emily@gmail.com F Ops 2100
Amy Sharma 9857536898 amys@hotmail.com F Ops 2500

In this example, we do not specify any expression; hence, the pattern automatically matches against the whole line, equivalent to the following:

$ awk '$0 ~ /mail/' emp.dat

The output on execution of this code is the same as above.

expression ~ /regexpr/: This matches if the string value of the expression contains a sub-string matched by regexpr. Generally, the left-hand operand of the matching operator is a field.
For example, in the following command, we print all the lines in which the value in the second field contains the Singh string:

$ awk '$2 ~ /Singh/{ print }' emp.dat

We can also use the expression as follows:

$ awk '{ if($2 ~ /Singh/) print }' emp.dat

The output on execution of the preceding code is as follows:

Jack Singh 9857532312 jack@gmail.com M hr 2000
Hari Singh 8827255666 hari@yahoo.com M Ops 2350
Ginny Singh 9857123466 ginny@yahoo.com F hr 2250
Vina Singh 8811776612 vina@yahoo.com F lgs 2300

expression !~ /regexpr/: This matches if the string value of the expression does not contain a sub-string matched by regexpr. Generally, this expression is also a field variable. For example, in the following command, we print all the lines that don't contain the Singh sub-string in the second field:

$ awk '$2 !~ /Singh/{ print }' emp.dat

The output on execution of the preceding code is as follows:

Jane Kaur 9837432312 jane@gmail.com F hr 1800
Eva Chabra 8827232115 eva@gmail.com F lgs 2100
Amit Sharma 9911887766 amit@yahoo.com M lgs 2350
Julie Kapur 8826234556 julie@yahoo.com F Ops 2500
Ana Khanna 9856422312 anak@hotmail.com F Ops 2700
Victor Sharma 8826567898 vics@hotmail.com M Ops 2500
John Kapur 9911556789 john@gmail.com M hr 2200
Billy Chabra 9911664321 bily@yahoo.com M lgs 1900
Sam khanna 8856345512 sam@hotmail.com F lgs 2300
Emily Kaur 8826175812 emily@gmail.com F Ops 2100
Amy Sharma 9857536898 amys@hotmail.com F Ops 2500

Any expression may be used in place of /regexpr/ in the context of ~ and !~, and these comparisons may also appear inside if, while, for, and do statements.

Basic regular expression constructs

Regular expressions are made up of two types of characters: normal text characters, called literals, and special characters, such as the asterisk (*, +, ?, .), called metacharacters. There are times when you want to match a metacharacter as a literal character.
In such cases, we prefix that metacharacter with a backslash (\), which is called an escape sequence.

Here is the list of metacharacters, also known as special characters, that are used in building regular expressions:

^ $ . [ ] | ( ) * + ?

The following list covers the remaining elements that are used in building a basic regular expression, apart from the metacharacters mentioned before:

- Literal: A literal character (non-metacharacter), such as A, that matches itself.
- Escape sequence: An escape sequence that matches a special symbol: for example, \t matches a tab.
- Quoted metacharacter (\): A metacharacter prefixed with a backslash, such as \$, matches that metacharacter literally.
- Anchor (^): Matches the beginning of a string.
- Anchor ($): Matches the end of a string.
- Dot (.): Matches any single character.
- Character classes ([...]): A character class such as [ABC] matches any one of the A, B, or C characters. Character classes may include ranges, such as [A-Za-z], which matches any single letter.
- Complemented character classes ([^...]): A complemented character class such as [^0-9] matches any character except a digit.

These operators combine regular expressions into larger ones:

- Alternation (|): A|B matches A or B.
- Concatenation: AB matches A immediately followed by B.
- Closure (*): A* matches zero or more As.
- Positive closure (+): A+ matches one or more As.
- Zero or one (?): A? matches the null string or A.
- Parentheses (()): Used for grouping regular expressions and back-referencing: a group (r) can be referenced later as \n, where n is the group's number.

Do check out the book Learning AWK Programming to learn more about the intricacies of the AWK programming language for text processing.

Read More

What is the difference between functional and object-oriented programming?
What makes a programming language simple or complex?
GopherCon 2019: Go 2 update, open-source Go library for GUI, support for WebAssembly, TinyGo for microcontrollers and more

Fatema Patrawala
30 Jul 2019
9 min read
Last week, Go programmers had a gala time learning, networking, and programming at the Marriott Marquis San Diego Marina as the much-awaited GopherCon 2019 took place from 24th to 27th July. GopherCon this year hit the road in San Diego with some exceptional talks and many exciting announcements for more than 1,800 attendees from around the world.

One of the attendees, Andrea Santillana Fernández, says the Go community is growing and doing quite well. She wrote in her blog post on the Sourcegraph website that there are 1 million Go programmers around the world, and month on month its membership keeps increasing. Indeed, there is significant growth in the Go community. So what did this year's GopherCon 2019 have in store for programmers?

On the road to Go 2

The major milestones on the journey to Go 2 were presented by Russ Cox on Wednesday last week. He explained the main areas of focus for Go 2, which are as follows:

Error handling

Russ noted that writing a program correctly is hard, but writing a program correctly while accounting for errors and external dependencies is much more difficult. He walked through examples that led to the introduction of error-handling helpers such as the optional Unwrap interface, errors.Is, and errors.As in Go 1.13.

Generics

Russ spoke about generics and said that the team started exploring a new design last year. They are working with programming language theory experts on the problem to help refine the generics proposal for Go. In a separate session, Ian Lance Taylor introduced generic code in Go. He briefly explained the need for, implementation of, and benefits of generics for the Go language. Next, Taylor reviewed the Go contracts design draft, which adds optional type parameters to types and functions.
Taylor defined generics as "generic programming, which enables the representation of functions and data structures in a generic form, with types factored out." Generic code is written in terms of types that are specified later; an unspecified type is called a type parameter. A type parameter may only be used in ways permitted by its contract. Generic code provides a strong basis for sharing code and building programs. It can be compiled using an interface-based approach, which saves time because the package is compiled only once; if generic code is instead compiled per instantiation, it can carry a compile-time cost. Ian showed a few sample generic codes written in Go.

Dependency management

In Go 2, the team wants to focus on dependency management and on referring to dependencies explicitly, similar to Java. Russ explained this by giving some history. In 2011, they introduced GOPATH to separate the distribution from the actual dependencies, so that users could run multiple different distributions and separate the concerns of the distribution from external libraries. Then, in 2015, they introduced the go vendor spec to formalize the vendor directory and simplify dependency management implementations, but in practice it did not work well. In 2016, they formed the dependency working group, which started work on dep: a tool to reshape all the existing tools into one. The problem with dep and the vendor directory was that multiple distinct, incompatible versions of a dependency were represented by one import path; this is now addressed by what is called the "import compatibility rule". The team took what worked well and learned from vgo. vgo provides package uniqueness without breaking builds: it dictates different import paths for incompatible package versions. The team grouped similar packages and gave these groups a name: modules. The vgo system is now Go modules, and it integrates directly with the go command. The challenge going forward is mostly around updating everything to use modules.
Everything needs to be updated to work with the new conventions.

Tooling

Finally, as a result of all these changes, the team distilled and refined the Go toolchain. One example of this is gopls, or "Go Please". Gopls aims to provide a smoother, standard interface for integrating with all editors, IDEs, continuous integration, and more.

Simple, portable, and efficient graphical interfaces in Go

Elias Naur presented Gio, a new open-source Go library for writing immediate-mode GUI programs that run on all the major platforms: Android, iOS/tvOS, macOS, Linux, and Windows. The talk covered Gio's unusual design and how it achieves simplicity, portability, and performance. Elias said, "I wanted to be able to write a GUI program in Go that I could implement only once and have it work on every platform. This, to me, is the most interesting feature of Gio."

https://twitter.com/rakyll/status/1154450455214190593

Elias also presented Scatter, a Gio program for end-to-end encrypted messaging over email. Other features of Gio include:

- Immediate-mode design
- UI state owned by the program
- Depends only on the lowest-level platform libraries, with a minimal dependency tree to keep things as low-level as possible
- GPU-accelerated vector and text rendering
- Very efficient: no garbage generated in drawing or layout code
- Cross-platform (macOS, Linux, Windows, Android, iOS, tvOS, WebAssembly)
- Core is 100% Go, while OS-specific native interfaces are optional

Gopls, a new tool that serves as a backend for your Go editor

Rebecca Stambler mentioned in her presentation that the Go community has built many amazing tools to improve the Go developer experience. However, when a maintainer disappears or a new Go release wreaks havoc, the Go development experience becomes frustrating and complicated. To solve this issue, Rebecca revealed the details behind a new tool: gopls (pronounced "go please").
The tool is currently in development by the Go team and community, and it will ultimately serve as the backend for your Go editor. The following functionality is expected from gopls:

- Show me errors, like unused variables or typos
- Autocomplete would be nice
- Function signature help, because we often forget
- While we're at it, hover-accessible "tooltip" documentation in general
- Help me jump to the definition of a variable I need to see
- An outline of the package structure

Get started with WebAssembly in Go

WebAssembly in Go is here and ready to try. Although the landscape is evolving quickly, the opportunity is huge. The ability to deliver truly portable system binaries could potentially replace JavaScript in the browser, and WebAssembly has the potential to finally realize the goal of being platform-agnostic without having to rely on a JVM. In his session, Johan Brandhorst introduced the technology, showed how to get started with WebAssembly and Go, and discussed what is possible today and what will be possible tomorrow. As of Go 1.13, there is experimental support for WebAssembly using the JavaScript interface, but as it is only experimental, using it in production is not recommended. Support for the WASI interface is not currently available, but it has been planned and may arrive as early as Go 1.14.

Better x86 assembly generation from Go

Michael McLoughlin, in his presentation, made the case for code-generation techniques for writing x86 assembly from Go. Michael introduced assembly, assembly in Go, the use cases for when you would want to drop into assembly, and techniques for realizing speedups using assembly. He pointed out that pure Go will be enough for 97% of programs, but there are 3% of cases where assembly is warranted; the examples he brought up were crypto, syscalls, and scientific computing. Michael then introduced a package called avo, which makes high-performance Go assembly easier to write.
He said that writing your assembly in Go allows you to realize the benefits of a high-level language, such as code readability, the ability to create loops, variables, and functions, and parameterized code generation, all while still realizing the benefits of writing assembly. Michael concluded the talk with his ideas for the future of avo:

- Use avo in projects, specifically in large crypto implementations
- More architecture support
- Possibly make avo an assembler itself (these kinds of techniques are used in JIT compilers)
- avo-based libraries (avo/std/math/big, avo/std/crypto)

The audience appreciated this talk on Twitter.

https://twitter.com/darethas/status/1155336268076576768

The presentation slides are available on his blog.

A miniature version of Go: TinyGo for microcontrollers

Ron Evans, creator of GoCV and GoBot and a "technologist for hire", introduced TinyGo, which can run directly on microcontrollers like Arduino boards and more. TinyGo uses the LLVM compiler toolchain to create native code that can run directly even on the smallest of computing devices. Ron demonstrated how Go code can be run on embedded systems using TinyGo, a compiler intended for use on microcontrollers, in WebAssembly (WASM), and for command-line tools. Evans began his presentation by countering the idea that Go, while fast, produces executables too large to run on the smallest computers. While that may be true of the standard Go compiler, TinyGo produces much smaller outputs. For example:

- "Hello World" program compiled using Go 1.12 => 1.1 MB
- Same program compiled using TinyGo 0.7.0 => 12 KB

TinyGo currently lacks support for the full Go language and the Go standard library. For example, TinyGo does not support the net package, although contributors have created implementations of interfaces that work with the WiFi chip built into some Arduino boards. Support for goroutines is also limited, although simple programs usually work.
Evans demonstrated that, despite some limitations, the Go language can still be run on embedded systems thanks to TinyGo. Salvador Evans, son of Ron Evans, assisted him with the demonstration. At age 11, he has become the youngest GopherCon speaker so far.

https://twitter.com/erikstmartin/status/1155223328329625600

There were talks by other speakers on topics such as improvements in VS Code for Go, the first open-source Go interpreter with complete support for the language spec, the Athens project (a module proxy server written in Go), and how mobile development works in Go.

https://twitter.com/ramyanexus/status/1155238591120805888
https://twitter.com/containous/status/1155191121938649091
https://twitter.com/hajimehoshi/status/1155184796386988035

Apart from these, there were many other talks at GopherCon 2019. Attendees posted live blogs on various talks, and so far more than 25 blogs have been posted by attendees on the Sourcegraph website.

The Go team shares new proposals planned to be implemented in Go 1.13 and 1.14
Go introduces generic codes and a new contract draft design at GopherCon 2019
Is Golang truly community driven and does it really matter?
Creating a graph application with Python, Neo4j, Gephi & Linkurious.js

Greg Roberts
12 Oct 2015
13 min read
I love Python, and to celebrate Packt Python week, I’ve spent some time developing an app using some of my favorite tools. The app is a graph visualization of Python and related topics, as well as showing where all our content fits in. The topics are all StackOverflow tags, related by their co-occurrence in questions on the site. The app is available to view at http://gregroberts.github.io/, and in this blog, I’m going to discuss some of the techniques I used to construct the underlying dataset and how I turned it into an online application.

Graphs, not charts

Graphs are an incredibly powerful tool for analyzing and visualizing complex data. In recent years, many different graph database engines have been developed to make use of this novel manner of representing data. These databases offer many benefits over traditional, relational databases because of how the data is stored and accessed. Here at Packt, I use a Neo4j graph to store and analyze data about our business. Using the Cypher query language, it’s easy to express complicated relations between different nodes succinctly.

It’s not just the technical aspect of graphs which makes them appealing to work with. Seeing the connections between bits of data visualized explicitly, as in a graph, helps you to see the data in a different light and make connections that you might not have spotted otherwise. This graph has many uses at Packt, from customer segmentation to product recommendations. In the next section, I describe the process I use to generate recommendations from the database.

Make the connection

For product recommendations, I use what’s known as a hybrid filter. This considers both content-based filtering (products x and y are about the same topic) and collaborative filtering (people who bought x also bought y). Each of these methods has strengths and weaknesses, so combining them into one algorithm provides a more accurate signal. The collaborative aspect is straightforward to implement in Cypher.
For a particular product, we want to find out which other product is most frequently bought alongside it. We have all our products and customers stored as nodes, and purchases are stored as edges. Thus, the Cypher query we want looks like this:

MATCH (n:Product {title:'Learning Cypher'})-[r:purchased*2]-(m:Product)
WITH m.title AS suggestion, count(distinct r)/(n.purchased+m.purchased) AS alsoBought
WHERE m<>n
RETURN * ORDER BY alsoBought DESC

and it will very efficiently return the most commonly also-purchased product. When calculating the weight, we divide by the total units sold of both titles, so we get a proportion returned. We do this so we don’t just get the titles with the most units; we’re effectively calculating the size of the intersection of the two titles’ audiences relative to their overall audience size. The content side of the algorithm looks very similar:

MATCH (n:Product {title:'Learning Cypher'})-[r:is_about*2]-(m:Product)
WITH m.title AS suggestion, count(distinct r)/(length(n.topics)+length(m.topics)) AS alsoAbout
WHERE m<>n
RETURN * ORDER BY alsoAbout DESC

Implicit in this algorithm is the knowledge that a title is_about a topic of some kind. This could be done manually, but where’s the fun in that? In Packt’s domain there already exists a huge, well-moderated corpus of technology concepts and their usage: StackOverflow. The tagging system on StackOverflow not only tells us about all the topics developers across the world are using, it also tells us how those topics are related, by looking at the co-occurrence of tags in questions. So in our graph, StackOverflow tags are nodes in their own right, which represent topics.
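The article doesn’t show how the two signals are finally blended into one hybrid score, so here is a minimal, library-free sketch of one plausible combination; the function name `hybrid_scores`, the sample titles, and the 50/50 weighting are all my own assumptions for illustration, not Packt’s actual implementation.

```python
# A minimal, library-free sketch of blending the two signals. The 'alpha'
# weighting and all sample data are assumptions for illustration only.

def hybrid_scores(also_bought, also_about, alpha=0.5):
    """Blend collaborative (also_bought) and content (also_about) scores.

    Both inputs map a suggested title to a proportion in [0, 1];
    alpha is the weight given to the collaborative signal.
    """
    titles = set(also_bought) | set(also_about)
    return {
        t: alpha * also_bought.get(t, 0.0) + (1 - alpha) * also_about.get(t, 0.0)
        for t in titles
    }

also_bought = {"Neo4j Essentials": 0.30, "Learning Neo4j": 0.10}
also_about = {"Learning Neo4j": 0.40, "Python Data Analysis": 0.20}

ranked = sorted(hybrid_scores(also_bought, also_about).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])  # Learning Neo4j -- the strongest combined signal
```

In the real pipeline, both input dictionaries would come straight from the alsoBought and alsoAbout queries above.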
These nodes are connected via edges, which are weighted to reflect their co-occurrence on StackOverflow:

edge_weight(n,m) = (Number of questions tagged with both n & m) / (Number of questions tagged with n or m)

So, to find topics related to a given topic, we could execute a query like this:

MATCH (n:StackOverflowTag {name:'Matplotlib'})-[r:related_to]-(m:StackOverflowTag)
RETURN n.name, r.weight, m.name
ORDER BY r.weight DESC LIMIT 10

which would return the following:

   | n.name     | r.weight | m.name
---+------------+----------+--------------------
 1 | Matplotlib | 0.065699 | Plot
 2 | Matplotlib | 0.045678 | Numpy
 3 | Matplotlib | 0.029667 | Pandas
 4 | Matplotlib | 0.023623 | Python
 5 | Matplotlib | 0.023051 | Scipy
 6 | Matplotlib | 0.017413 | Histogram
 7 | Matplotlib | 0.015618 | Ipython
 8 | Matplotlib | 0.013761 | Matplotlib Basemap
 9 | Matplotlib | 0.013207 | Python 2.7
10 | Matplotlib | 0.012982 | Legend

There are many more complex relationships you can define between topics like this, too. You can infer directionality in the relationship by looking at the local network, or you could start constructing hypergraphs using the extensive StackExchange API. So we have our topics, but we still need to connect our content to topics. To do this, I’ve used a two-stage process.

Step 1 – Parsing out the topics

We take all the copy (words) pertaining to a particular product as a document representing that product. This includes the title, chapter headings, and all the copy on the website. We use this because it’s already been optimized for search, and should thus carry a fair representation of what the title is about. We then parse this document and keep all the words which match the topics we’ve previously imported.
#...code for fetching all the copy for all the products
key_re = r'\W(%s)\W' % '|'.join(re.escape(i) for i in topic_keywords)
for i in documents:
    tags = re.findall(key_re, i['copy'])
    i['tags'] = map(lambda x: tag_lookup[x], tags)

Having done this for each product, we have a bag of words representing each product, where each word is a recognized topic.

Step 2 – Finding the information

From each of these documents, we want to know the topics which are most important for that document. To do this, we use the tf-idf algorithm. tf-idf stands for term frequency, inverse document frequency. The algorithm takes the number of times a term appears in a particular document, and divides it by the proportion of the documents that word appears in. The term frequency factor boosts terms which appear often in a document, whilst the inverse document frequency factor gets rid of terms which are overly common across the entire corpus (for example, the term ‘programming’ is common in our product copy, and whilst most of the documents ARE about programming, this doesn’t provide much discriminating information about each document). To do all of this, I use Python (obviously) and the excellent scikit-learn library. Tf-idf is implemented in the class sklearn.feature_extraction.text.TfidfVectorizer. This class has lots of options you can fiddle with to get more informative results.

import sklearn.feature_extraction.text as skt
tagger = skt.TfidfVectorizer(input = 'content',
                             encoding = 'utf-8',
                             decode_error = 'replace',
                             strip_accents = None,
                             analyzer = lambda x: x,
                             ngram_range = (1,1),
                             max_df = 0.8,
                             min_df = 0.0,
                             norm = 'l2',
                             sublinear_tf = False)

It’s a good idea to use the min_df & max_df arguments of the constructor so as to cut out the most common/obtuse words, to get a more informative weighting. The ‘analyzer’ argument tells it how to get the words from each document; in our case, the documents are already lists of normalized words, so we don’t need anything additional done.
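As a quick illustration of the weighting described above, here is a from-scratch sketch of tf-idf. Note that TfidfVectorizer’s actual formula adds idf smoothing and l2 normalization, so its numbers will differ; the toy corpus here is invented purely for the example.

```python
# A from-scratch sketch of the tf-idf idea described above. scikit-learn's
# TfidfVectorizer smooths the idf and l2-normalizes the result, so its
# numbers differ -- this only shows the core weighting.
import math

def tf_idf(term, doc, corpus):
    # term frequency: how often the term appears in this document
    tf = doc.count(term) / len(doc)
    # inverse document frequency: penalize terms found in many documents
    containing = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / containing)
    return tf * idf

corpus = [
    ["python", "matplotlib", "plot", "python"],
    ["python", "django", "web"],
    ["python", "numpy", "matplotlib"],
]

doc = corpus[0]
# "python" appears in every document, so its idf (and thus tf-idf) is 0 --
# like 'programming' in the product copy, it carries no discriminating signal.
print(tf_idf("python", doc, corpus))      # 0.0
print(tf_idf("matplotlib", doc, corpus))  # positive: rarer across the corpus
```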
#create vectors of all the documents
vectors = tagger.fit_transform(map(lambda x: x['tags'], rows)).toarray()
#get back the topic names to map to the graph
t_map = tagger.get_feature_names()
jobs = []
for ind, vec in enumerate(vectors):
    features = filter(lambda x: x[1] > 0, zip(t_map, vec))
    doc = documents[ind]
    for topic, weight in features:
        job = '''MERGE (n:StackOverflowTag {name:'%s'})
MERGE (m:Product {id:'%s'})
CREATE UNIQUE (m)-[:is_about {source:'tf_idf', weight:%f}]-(n)
''' % (topic, doc['id'], weight)
        jobs.append(job)

We then execute all of the jobs using Py2neo’s Batch functionality. Having done all of this, we can now relate products to each other in terms of what topics they have in common:

MATCH (n:Product {isbn10:'1783988363'})-[r:is_about]-(a)-[q:is_about]-(m:Product {isbn10:'1783289007'})
WITH a.name AS topic, r.weight+q.weight AS weight
RETURN topic ORDER BY weight DESC LIMIT 6

which returns:

   | topic
---+------------------
 1 | Machine Learning
 2 | Image
 3 | Models
 4 | Algorithm
 5 | Data
 6 | Python

Huzzah! I now have a graph into which I can throw any piece of content about programming or software, and it will fit nicely into the network of topics we’ve developed.

Take a breath

So, that’s how the graph came to be. To communicate with Neo4j from Python, I use the excellent py2neo module, developed by Nigel Small. This module has all sorts of handy abstractions to allow you to work with nodes and edges as native Python objects, and then update your Neo instance with any changes you’ve made. The graph I’ve spoken about is used for many purposes across the business, and has grown in size and scope significantly over the last year. For this project, I’ve taken from this graph everything relevant to Python.
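One robustness note on the job-building step shown earlier: interpolating topic names into Cypher with % string formatting breaks as soon as a name contains a quote. Neo4j drivers generally accept parameterized statements instead. The sketch below only builds (statement, parameters) pairs and leaves the execute call out; the `{topic}`-style placeholders and the `build_jobs` helper are my own assumptions, and the exact parameter-passing API of the py2neo version used here would need checking against its documentation.

```python
# Hypothetical job builder using Cypher parameters instead of % formatting.
# Only the data structures are built here; no database call is made.

STATEMENT = """MERGE (n:StackOverflowTag {name: {topic}})
MERGE (m:Product {id: {product_id}})
CREATE UNIQUE (m)-[:is_about {source: 'tf_idf', weight: {weight}}]-(n)"""

def build_jobs(doc_id, features):
    """features: iterable of (topic, weight) pairs; zero weights are skipped."""
    return [
        (STATEMENT, {"topic": topic, "product_id": doc_id, "weight": weight})
        for topic, weight in features
        if weight > 0
    ]

jobs = build_jobs("1783988363", [("Machine Learning", 0.41), ("Python", 0.0)])
print(len(jobs))  # 1 -- the zero-weight topic is dropped
```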
I started by getting all of our content which is_about Python, or about a topic related to Python:

titles = [i.n for i in graph.cypher.execute('''MATCH (n)-[r:is_about]-(m:StackOverflowTag {name:'Python'})
                                               RETURN distinct n''')]
t2 = [i.n for i in graph.cypher.execute('''MATCH (n)-[r:is_about]-(m:StackOverflowTag)-[:related_to]-(o:StackOverflowTag {name:'Python'})
                                           WHERE has(n.name) RETURN distinct n''')]
titles.extend(t2)

then hydrated this further by going one or two hops down each path in various directions, to get a large set of topics and content related to Python.

Visualising the graph

Since I started working with graphs, two visualisation tools I’ve always used are Gephi and Sigma.js. Gephi is a great solution for analysing and exploring graphical data, allowing you to apply a plethora of different layout options, find out more about the statistics of the network, and filter and change how the graph is displayed. Sigma.js is a lightweight JavaScript library which allows you to publish beautiful graph visualizations in a browser, and it copes very well with even very large graphs. Gephi has a great plugin which allows you to export your graph straight into a web page which you can host, share and adapt. More recently, Linkurious have made it their mission to bring graph visualization to the masses. I highly advise trying the demo of their product. It really shows how much value it’s possible to get out of graph-based data. Imagine if your Customer Relations team were able to do a single query to view the entire history of a case or customer, laid out as a beautiful graph, full of glyphs and annotations. Linkurious have built their product on top of Sigma.js, and they’ve made available much of the work they’ve done as the open source Linkurious.js. This is essentially Sigma.js, with a few changes to the API, and an even greater variety of plugins. On GitHub, each plugin has an API page in the wiki and a downloadable demo.
It’s worth cloning the repository just to see the things it’s capable of!

Publish It!

So here’s the workflow I used to get the Python topic graph out of Neo4j and onto the web:

1. Use Py2neo to grab the subgraph of content and topics pertinent to Python, as described above.
2. Add to this some other topics linked to the same books to give a fuller picture of the Python “world”.
3. Add in topic-topic edges and product-product edges to show the full breadth of connections observed in the data.
4. Export all the nodes and edges to csv files.
5. Import the node and edge tables into Gephi. The reason I’m using Gephi as a middle step is so that I can fiddle with the visualisation in Gephi until it looks perfect. The layout plugin in Sigma is good, but this way the graph is presentable as soon as the page loads, the communities are much clearer, and I’m not putting undue strain on browsers across the world! The layout of the graph has been achieved using a number of plugins. Instead of using the pre-installed ForceAtlas layouts, I’ve used the OpenOrd layout, which I feel really shows off the communities of a large graph. There’s a really interesting and technical presentation about how this layout works here.
6. Export the graph into gexf format, having applied some partition and ranking functions to make it more clear and appealing.
7. Now it’s all down to Linkurious and its various plugins!
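For step 4 of the workflow above, the export can be as simple as writing two CSV tables in the shape Gephi’s spreadsheet importer expects. The column names below (Id/Label for nodes, Source/Target/Weight for edges) and the sample rows are illustrative assumptions, not the article’s actual export code.

```python
import csv

# Toy node and edge data standing in for the real Neo4j subgraph.
nodes = [("python", "Python"), ("matplotlib", "Matplotlib")]
edges = [("python", "matplotlib", 0.0236)]

with open("nodes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Id", "Label"])  # Gephi node table header
    writer.writerows(nodes)

with open("edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Weight"])  # Gephi edge table header
    writer.writerows(edges)
```

Both files can then be pulled into Gephi via its spreadsheet import before laying the graph out and exporting to gexf.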
You can explore the source code of the final page to see all the details, but here I’ll give an overview of the different plugins I’ve used for the different parts of the visualisation.

First, instantiate the graph object, pointing to a container (note the CSS of the container; without this, the graph won’t display properly):

<style type="text/css">
  #container {
    max-width: 1500px;
    height: 850px;
    margin: auto;
    background-color: #E5E5E5;
  }
</style>
…
<div id="container"></div>
…
<script>
s = new sigma({
  container: 'container',
  renderer: {
    container: document.getElementById('container'),
    type: 'canvas'
  },
  settings: {
    …
  }
});

sigma.parsers.gexf - used for (trivially!) importing a gexf file into a sigma instance:

sigma.parsers.gexf(
  'static/data/Graph1.gexf',
  s,
  function(s) {
    //callback executed once the data is loaded, use this to set up any
    //aspects of the app which depend on the data
  }
);

sigma.plugins.filter - adds the ability to very simply hide nodes/edges based on a callback function which returns a boolean. This powers the filtering widgets on the page:

<input class="form-control" id="min-degree" type="range" min="0" max="0" value="0">
…
function applyMinDegreeFilter(e) {
  var v = e.target.value;
  $('#min-degree-val').textContent = v;
  filter
    .undo('min-degree')
    .nodesBy(
      function(n, options) {
        return this.graph.degree(n.id) >= options.minDegreeVal;
      },
      { minDegreeVal: +v },
      'min-degree'
    )
    .apply();
};
$('#min-degree').change(applyMinDegreeFilter);

sigma.plugins.locate - adds the ability to zoom in on a single node or collection of nodes. Very useful if you’re filtering a very large initial graph:

function locateNode (nid) {
  if (nid == '') {
    locate.center(1);
  } else {
    locate.nodes(nid);
  }
};

sigma.renderers.glyphs - allows you to add custom glyphs to each node. Useful if you have many types of node.

Outro

This application has been a very fun little project to build.
The improvements to Sigma wrought by Linkurious have resulted in an incredibly powerful toolkit to rapidly generate graph-based applications with a great degree of flexibility and interaction potential. None of this would have been possible were it not for Python. Python is my right (left, I’m left-handed) hand, which I use for almost everything. Its versatility and expressiveness make it an incredibly robust Swiss army knife in any data analyst’s toolkit.

Is Golang truly community driven and does it really matter?

Sugandha Lahoti
24 May 2019
6 min read
Golang, also called Go, is a statically typed, compiled programming language designed by Google. Golang is going from strength to strength, as more engineers than ever are using it at work, according to the Go User Survey 2019. An opinion piece led the Hacker News community into a heated debate last week: “Go is Google's language, not the community's”. The thread was first started by Chris Siebenmann, who works at the Department of Computer Science, University of Toronto. His blog post reads, “Go has community contributions but it is not a community project. It is Google's project.”

Chris explicitly states that the community's voice doesn't matter very much for Go's development, and we have to live with that. He argues that Google is the gatekeeper for community contributions; it alone decides what is and isn't accepted into Go. If a developer wants some significant feature to be accepted into Golang, working to build consensus in the community is far less important than persuading the Golang core team. He then cites the example of how one member of Google's Go core team discarded the entire Go Modules system that the Go community had been working on and brought in a relatively radically different model.

Chris believes that the Golang team cares about the community and wants them to be involved, but only up to a certain point. He wants the Go core team to be bluntly honest about the situation, rather than pretend and implicitly lead people on. He further adds, “Only if Go core team members start leaving Google and try to remain active in determining Go's direction, can we [be] certain Golang is a community-driven language.”

He then compares Go with C++, calling the latter a genuine community-driven language. He says there are several major implementations of C++ which are genuine community projects, and the direction of C++ is set by an open standards committee with a relatively distributed membership.
https://twitter.com/thatcks/status/1131319904039309312

What is better - community-driven or corporate ownership?

There has been an opinion floating around developers about how some open source programming projects are just commercial projects driven mainly by a single company. If we look at the top open source projects, most of them have some kind of corporate backing (Apple’s Swift, Oracle’s Java, MySQL, Microsoft’s TypeScript, Google’s Kotlin, Golang, Android, MongoDB, Elasticsearch), to name a few. Which brings us to the question: what does corporate ownership of open source projects really mean?

A benevolent dictatorship can have two outcomes. If the community for a particular project suggests a change that is a bad idea, the corporate team can intervene and stop it. On the other hand, it can also stop good ideas from the community from being implemented, even if only a handful of members from the core team disagree.

Chris’s post has received a lot of attention from developers on Hacker News, who both sided with and disagreed with the opinion put forward. A comment reads, “It's important to have a community and to work with it, but, especially for a programming language, there has to be a clear concept of which features should be implemented and which not - just accepting community contributions for the sake of making the community feel good would be the wrong way.”

Another comment reads, “Many like Go because it is an opinionated language. I'm not sure that a 'community' run language will create something like that because there are too many opinions. Many claims to represent the community, but not the community that doesn't share their opinion. Without clear leaders, I fear technical direction and taste will be about politics which seems more uncertain/risky. I like that there is a tight cohesive group in control over Go and that they are largely the original designers.
I might be more interested in alternative government structures and Google having too much control only if those original authors all stepped down.”

Rather than splitting this into community versus corporate, a more accurate framing would be to ask how much market value depends on those projects. If a project is thriving, enterprises will usually make good decisions in handling it. However, another, equally valid and important question to ask is: ‘should open source projects be driven by their market value?’

Another common argument is that the core team’s full-time job is to take care of the language instead of taking errant decisions based on community backlash. Google (or Microsoft, or Apple, or Facebook for that matter) will not make or block a change in a way that kills an entire project. But this does not mean they should sit idly, ignoring the community response. Ideally, the more that a project genuinely belongs to its community, the more it will reflect what the community wants and needs.

Google also has a propensity to kill its own products. What happens when Google is not as interested in Golang anymore? The company could suddenly leave it to the community to figure out a governance model by pulling the original authors off to some other exciting new project. Or it may let the authors only work on Golang in their spare time at home or at the weekends. While Google's history shows that many of their dead products are actually an important step towards something better and more successful, why and how much of that logic would be directly relevant to an open source project is something worth thinking about. As a Hacker News user wrote, “Go is developed by Bell Labs people, the same people who bought us C, Unix and Plan 9 (Ken, Pike, RSC, et al). They took the time to think through all their decisions, the impacts of said decisions, along with keeping things as simple as possible.
Basically, doing things right the first time and not bolting on features simply because the community wants them.”

Another says, “The way how Golang team handles potentially tectonic changes in language is also exemplary – very well communicated ideas, means to provide feedback and clear explanation of how the process works.”

Rest assured, if any major change is made to Go, even a drastic one such as killing it, it will not be done without consulting the community and taking their feedback.

Go User Survey 2018 results: Golang goes from strength to strength, as more engineers than ever are using it at work
GitHub releases Vulcanizer, a new Golang Library for operating Elasticsearch
State of Go February 2019 – Golang developments report for this month released

Understanding Go Internals: defer, panic() and recover() functions [Tutorial]

Packt Editorial Staff
09 Jul 2018
8 min read
The Go programming language, often referred to as Golang, is making strides with masterclass developments and architecture by the greatest programming minds. The Go features are extremely handy, and you can use them all the time. However, there is nothing more rewarding than being able to see and understand what is going on in the background and how Go operates behind the scenes. In this article, we will learn to use the defer keyword and the panic() and recover() functions in Go. This article is extracted from the First Edition of Mastering Go, written by Mihalis Tsoukalos. The concepts discussed in this article (and more) have been updated or improved in the third edition of Mastering Go.

The defer keyword

The defer keyword postpones the execution of a function until the surrounding function returns. It is widely used in file input and output operations because it saves you from having to remember when to close an opened file: the defer keyword allows you to put the function call that closes an opened file near to the function call that opened it. You will also see defer in action in the section that talks about the panic() and recover() built-in Go functions.

It is very important to remember that deferred functions are executed in Last In First Out (LIFO) order after the return of the surrounding function. Put simply, this means that if you defer function f1() first, function f2() second, and function f3() third in the same surrounding function, then when the surrounding function is about to return, function f3() will be executed first, function f2() second, and function f1() last. As this definition of defer is a little unclear, I think that you will understand the use of defer a little better by looking at the Go code and the output of the defer.go program, which will be presented in three parts.
The first part of the program follows:

package main

import (
    "fmt"
)

func d1() {
    for i := 3; i > 0; i-- {
        defer fmt.Print(i, " ")
    }
}

Apart from the import block, the preceding Go code implements a function named d1() with a for loop and a defer statement that will be executed three times. The second part of defer.go contains the following Go code:

func d2() {
    for i := 3; i > 0; i-- {
        defer func() {
            fmt.Print(i, " ")
        }()
    }
    fmt.Println()
}

In this part of the code, you can see the implementation of another function, named d2(). The d2() function also contains a for loop and a defer statement that will also be executed three times. However, this time the defer keyword is applied to an anonymous function instead of a single fmt.Print() statement. Additionally, the anonymous function takes no parameters. The last part of the Go code follows:

func d3() {
    for i := 3; i > 0; i-- {
        defer func(n int) {
            fmt.Print(n, " ")
        }(i)
    }
}

func main() {
    d1()
    d2()
    fmt.Println()
    d3()
    fmt.Println()
}

Apart from the main() function that calls the d1(), d2(), and d3() functions, you can also see the implementation of the d3() function, which has a for loop that uses the defer keyword on an anonymous function. However, this time the anonymous function requires one integer parameter named n. The Go code tells us that the n parameter takes its value from the i variable used in the for loop. Executing defer.go will create the following output:

$ go run defer.go
1 2 3
0 0 0
1 2 3

You will most likely find the generated output complicated and challenging to understand. This underscores the fact that the operation and the results of the use of defer can be tricky if your code is not clear and unambiguous. Let's examine the results in order to get a better idea of how tricky defer can be if you do not pay close attention to your code. We will start with the first line of the output (1 2 3), which is generated by the d1() function. The values of i in d1() are 3, 2, and 1 in that order.
The function that is deferred in d1() is the fmt.Print() statement. As a result, when the d1() function is about to return, you get the three values of the i variable of the for loop in reverse order, because deferred functions are executed in LIFO order.

Now, let us inspect the second line of the output that is produced by the d2() function. It is really strange that we got three zeros instead of 1 2 3 in the output. The reason for this, however, is relatively simple. After the for loop has ended, the value of i is 0, because it is that value of i that made the for loop terminate. However, the tricky part here is that the deferred anonymous function is evaluated after the for loop ends, because it has no parameters. This means that it is evaluated three times for an i value of 0, hence the generated output! This kind of confusing code is what might lead to the creation of nasty bugs in your projects, so try to avoid it!

Last, we will talk about the third line of the output, which is generated by the d3() function. Due to the parameter of the anonymous function, each time the anonymous function is deferred, it gets and uses the current value of i. As a result, each execution of the anonymous function has a different value to process, thus the generated output. After this, it should be clear that the best approach to the use of defer is the third one, which is exhibited in the d3() function, because you intentionally pass the desired variable to the anonymous function in an easy-to-understand way.

Panic and Recover

This technique involves the use of the panic() and recover() functions, and it will be presented in panicRecover.go, which you will review in three parts. Strictly speaking, panic() is a built-in Go function that terminates the current flow of a Go program and starts panicking! On the other hand, the recover() function, which is also a built-in Go function, allows you to take back control of a goroutine that just panicked using panic().
The first part of the program follows:

package main

import (
    "fmt"
)

func a() {
    fmt.Println("Inside a()")
    defer func() {
        if c := recover(); c != nil {
            fmt.Println("Recover inside a()!")
        }
    }()
    fmt.Println("About to call b()")
    b()
    fmt.Println("b() exited!")
    fmt.Println("Exiting a()")
}

Apart from the import block, this part includes the implementation of the a() function. The most important part of the a() function is the defer block of code, which implements an anonymous function that will be called when there is a call to panic(). The second code segment of panicRecover.go follows next:

func b() {
    fmt.Println("Inside b()")
    panic("Panic in b()!")
    fmt.Println("Exiting b()")
}

The last part of the program, which illustrates the panic() and recover() functions, is as follows:

func main() {
    a()
    fmt.Println("main() ended!")
}

Executing panicRecover.go will create the following output:

$ go run panicRecover.go
Inside a()
About to call b()
Inside b()
Recover inside a()!
main() ended!

What just happened here is really impressive! However, as you can see from the output, the a() function did not end normally, because its last two statements did not get executed:

fmt.Println("b() exited!")
fmt.Println("Exiting a()")

Nevertheless, the good thing is that panicRecover.go ended according to our will without panicking, because the anonymous function used in defer took control of the situation! Also note that function b() knows nothing about function a(). However, function a() contains Go code that handles the panic condition of function b()!

Using the panic function on its own

You can also use the panic() function on its own without any attempt to recover, and this subsection will show you its results using the Go code of justPanic.go, which will be presented in two parts. The first part of justPanic.go follows next:

package main

import (
    "fmt"
    "os"
)

As you can see, the use of panic() does not require any extra Go packages.
The second part of justPanic.go is shown in the following Go code:

func main() {
    if len(os.Args) == 1 {
        panic("Not enough arguments!")
    }
    fmt.Println("Thanks for the argument(s)!")
}

If your Go program does not have at least one command line argument, it will call the panic() function. The panic() function takes one parameter, which is the error message that you want to print on the screen. Executing justPanic.go on a macOS High Sierra machine will create the following output:

$ go run justPanic.go
panic: Not enough arguments!

goroutine 1 [running]:
main.main()
        /Users/mtsouk/ch2/code/justPanic.go:10 +0x9e
exit status 2

Thus, using the panic() function on its own will terminate the Go program without giving you the opportunity to recover! Therefore, use of the panic() and recover() pair is much more practical and professional than just using panic() alone.

To summarize, we covered some interesting Go topics: the defer keyword, and the panic() and recover() functions. To explore other major features and packages in Go, get our latest edition in Go programming, Mastering Go, written by Mihalis Tsoukalos.

Implementing memory management with Golang’s garbage collector
Why is Go the go-to language for cloud native development? – An interview with Mina Andrawos
How to build a basic server side chatbot using Go
How Concurrency and Parallelism works in Golang [Tutorial]
Microsoft mulls replacing C and C++ code with Rust calling it a "modern safer system programming language" with great memory safety features

Vincy Davis
18 Jul 2019
3 min read
Here's another reason why Rust is the present and the future of programming. A few days ago, Microsoft announced that it is going to start exploring Rust as an alternative to its own C and C++ code. This announcement was made by Gavin Thomas, the Principal Security Engineering Manager of the Microsoft Security Response Centre (MSRC). Thomas states that ~70% of the vulnerabilities to which Microsoft assigns a CVE each year are caused by developers who accidentally insert memory corruption bugs into their C and C++ code. He adds, "As Microsoft increases its code base and uses more Open Source Software in its code, this problem isn’t getting better, it's getting worse. And Microsoft isn’t the only one exposed to memory corruption bugs—those are just the ones that come to MSRC."

Image Source: Microsoft blog

He highlights the fact that even with so many security mechanisms (like static analysis tools, fuzzing at scale, taint analysis, many encyclopaedias of coding guidelines, threat modelling guidance, etc.) to make code secure, developers have to invest a lot of time in training on more tools and in vulnerability fixes. Thomas states that though C++ has many qualities (it is fast and mature, with a small memory and disk footprint), it does not have the memory security guarantee of languages like .NET C#. He believes that Rust is one language which can provide both. Thomas strongly advocates that the software security industry should focus on providing a secure environment for developers to work in, rather than turning a deaf ear to the importance of security and relying on outdated methods and approaches. He thus concludes by hinting that Microsoft is going to adopt the Rust programming language, as he says: "Perhaps it's time to scrap unsafe legacy languages and move on to a modern safer system programming language?" Microsoft exploring Rust is not surprising, as Rust has been popular with many developers for its simpler syntax, fewer bugs, memory safety and thread safety.
It has also been voted the most loved programming language in the 2019 StackOverflow survey, the biggest developer survey on the internet. It allows developers to focus on their applications, rather than worrying about security and maintenance. Recently, many applications have been written in Rust, like Vector, the Brave ad-blocker, PyOxidizer and more. Developers couldn't agree more with this post, and many have expressed their love for Rust.

https://twitter.com/alilleybrinker/status/1151495738158977024
https://twitter.com/karanganesan/status/1151485485644054528
https://twitter.com/shah_sheikh/status/1151457054004875264

A Redditor says, "While this first post is very positive about memory-safe system programming languages in general and Rust in particular, I would not call this an endorsement. Still, great news!"

Visit the Microsoft blog for more details.

Introducing Ballista, a distributed compute platform based on Kubernetes and Rust
EU Commission opens an antitrust case against Amazon on grounds of violating EU competition rules
Fastly CTO Tyler McMullen on Lucet and the future of WebAssembly and Rust [Interview]

Aaron Lazar
27 Jun 2018
8 min read

How to implement immutability functions in Kotlin [Tutorial]

Unlike Clojure, Haskell, F#, and the like, Kotlin is not a pure functional programming language where immutability is forced; rather, we may refer to Kotlin as a perfect blend of functional programming and OOP languages. It contains the major benefits of both worlds. So, instead of forcing immutability like pure functional programming languages, Kotlin encourages immutability, giving it automatic preference wherever possible. In this article, we'll understand the various methods of implementing immutability in Kotlin. This article has been taken from the book, Functional Kotlin, by Mario Arias and Rivu Chakraborty.

In other words, Kotlin has immutable variables (val), but no language mechanisms that would guarantee true deep immutability of the state. If a val variable references a mutable object, its contents can still be modified. We will have a more elaborate discussion and a deeper dive on this topic, but first let us have a look at how we can get referential immutability in Kotlin and the differences between var, val, and const val.

By true deep immutability of the state, we mean that a property will always return the same value whenever it is called and that the property never changes its value; we can easily break this if we have a val property that has a custom getter. You can find more details at the following link: https://artemzin.com/blog/kotlin-val-does-not-mean-immutable-it-just-means-readonly-yeah/

The difference between var and val

So, in order to encourage immutability but still let the developers have the choice, Kotlin introduced two types of variables. The first one is var, which is just a simple variable, as in any imperative language. On the other hand, val brings us a bit closer to immutability; again, it doesn't guarantee immutability. So, what exactly does the val variable provide us? It enforces read-only behavior: you cannot write into a val variable after initialization.
So, if you use a val variable without a custom getter, you can achieve referential immutability. Let's have a look; the following program will not compile:

```kotlin
fun main(args: Array<String>) {
    val x: String = "Kotlin"
    x += "Immutable" //(1)
}
```

As I mentioned earlier, the preceding program will not compile; it will give an error on comment (1). As we've declared variable x as val, x will be read-only and, once we initialize x, we cannot modify it afterward. So, now you're probably asking why we cannot guarantee immutability with val. Let's inspect this with the following example:

```kotlin
object MutableVal {
    var count = 0
    val myString: String = "Mutable"
        get() { //(1)
            return "$field ${++count}" //(2)
        }
}

fun main(args: Array<String>) {
    println("Calling 1st time ${MutableVal.myString}")
    println("Calling 2nd time ${MutableVal.myString}")
    println("Calling 3rd time ${MutableVal.myString}") //(3)
}
```

In this program, we declared myString as a val property, but implemented a custom get function, where we tweaked the value of myString before returning it. Have a look at the output first, then we will look further into the program:

As you can see, the myString property, despite being val, returned a different value every time we accessed it. So, now, let us look into the code to understand this behavior. On comment (1), we declared a custom getter for the val property myString. On comment (2), we pre-incremented the value of count, appended it to the field value of myString, and returned the result from the getter. So, whenever we requested the myString property, count got incremented and, on the next request, we got a different value. As a result, we broke the immutable behavior of the val property.

Compile time constants

So, how can we overcome this? How can we enforce immutability? The const val properties are here to help us. Just modify val myString to const val myString and you cannot implement the custom getter.
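Kotlin's val versus const val distinction has a rough parallel in Rust, where a let binding is read-only by default and a const item is a compile-time constant. The snippet below is an illustrative aside in Rust, not part of this Kotlin tutorial's own code:

```rust
// Compile-time constant, loosely comparable to Kotlin's `const val`:
// the value must be known at compile time.
const GREETING: &str = "Kotlin";

fn main() {
    // A plain `let` binding is read-only, loosely like Kotlin's `val`.
    let x = String::from(GREETING);
    // x += "Immutable"; // does not compile: `x` is not mutable
    println!("{}", x);

    // A `let mut` binding behaves like Kotlin's `var`.
    let mut y = String::from(GREETING);
    y += " Immutable";
    println!("{}", y); // prints "Kotlin Immutable"
}
```

Note that Rust has no direct equivalent of a val with a custom getter: a let binding cannot change its value between reads at all.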
While val properties are read-only variables, const val properties, on the other hand, are compile time constants. You cannot assign the outcome (result) of a function to a const val. Let's discuss some of the differences between val and const val:

- The val properties are read-only variables, while const val properties are compile time constants
- The val properties can have custom getters, but const val properties cannot
- We can have val properties anywhere in our Kotlin code, inside functions, as a class member, anywhere, but a const val has to be a top-level member of a class/object
- You cannot write delegates for const val properties
- We can have a val property of any type, be it our custom class or any primitive data type, but only primitive data types and String are allowed with a const val property
- We cannot have nullable data types with const val properties; as a result, we cannot have null values for const val properties either

As a result, const val properties guarantee immutability of value but have less flexibility, and you are bound to use only primitive data types with const val, which cannot always serve our purposes.

Now that I've used the term referential immutability quite a few times, let us inspect what it means and how many types of immutability there are.

Types of immutability

There are basically the following two types of immutability:

- Referential immutability
- Immutable values

Immutable reference (referential immutability)

Referential immutability enforces that, once a reference is assigned, it can't be assigned to something else. Think of having it as a val property of a custom class, or even a MutableList or MutableMap; after you initialize the property, you cannot reference something else from that property, except the underlying value from the object.
For example, take the following program:

```kotlin
class MutableObj {
    var value = ""

    override fun toString(): String {
        return "MutableObj(value='$value')"
    }
}

fun main(args: Array<String>) {
    val mutableObj: MutableObj = MutableObj() //(1)
    println("MutableObj $mutableObj")
    mutableObj.value = "Changed" //(2)
    println("MutableObj $mutableObj")

    val list = mutableListOf("a", "b", "c", "d", "e") //(3)
    println(list)
    list.add("f") //(4)
    println(list)
}
```

Have a look at the output before we proceed with explaining the program:

So, in this program we have two val properties: list and mutableObj. We initialized mutableObj with the default constructor of MutableObj; since it's a val property, it'll always refer to that specific object. But, if you concentrate on comment (2), we changed the value property of mutableObj, as the value property of the MutableObj class is mutable (var). It's the same with the list property: we can add items to the list after initialization, changing its underlying value. Both list and mutableObj are perfect examples of immutable references; once initialized, the properties can't be assigned to something else, but their underlying values can be changed (you can refer to the output). The reason behind this is the data types we used for those properties. Both the MutableObj class and the MutableList<String> data structure are mutable themselves, so we cannot restrict value changes for their instances.

Immutable values

The immutable values, on the other hand, enforce no change on values as well; this is really complex to maintain. In Kotlin, the const val properties enforce immutability of value, but they lack flexibility (we already discussed them) and you're bound to use only primitive types, which can be troublesome in real-life scenarios.

Immutable collections

Kotlin gives preference to immutability wherever possible, but leaves the choice to the developer whether or when to use it. This power of choice makes the language even more powerful.
Unlike most languages, which have either only mutable collections (like Java, C#, and so on) or only immutable collections (like F#, Haskell, Clojure, and so on), Kotlin has both and distinguishes between them, leaving the developer with the freedom to choose whether to use an immutable or a mutable one.

Kotlin has two interfaces for collection objects: Collection<out E> and MutableCollection<out E>; all the collection classes (for example, List, Set, or Map) implement either of them. As the names suggest, the two interfaces are designed to serve immutable and mutable collections respectively. Let us have an example:

```kotlin
fun main(args: Array<String>) {
    val immutableList = listOf(1, 2, 3, 4, 5, 6, 7) //(1)
    println("Immutable List $immutableList")

    val mutableList: MutableList<Int> = immutableList.toMutableList() //(2)
    println("Mutable List $mutableList")

    mutableList.add(8) //(3)
    println("Mutable List after add $mutableList")
    println("Mutable List after add $immutableList")
}
```

The output is as follows:

So, in this program, we created an immutable list with the help of the listOf method of Kotlin, on comment (1). The listOf method creates an immutable list with the elements (varargs) passed to it. This method also has a generic type parameter, which can be skipped if the elements array is not empty. The listOf method also has a mutable version, mutableListOf(), which is identical except that it returns a MutableList instead. We can convert an immutable list to a mutable one with the help of the toMutableList() extension function; we did the same on comment (2), to add an element to it on comment (3). However, if you check the output, the original immutable list remains the same without any changes; the item is, however, added to the newly created MutableList instead.

So now you know how to implement immutability in Kotlin. If you found this tutorial helpful, and would like to learn more, head on over to purchase the full book, Functional Kotlin, by Mario Arias and Rivu Chakraborty.
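The same copy-then-mutate pattern can be expressed in Rust, shown here as an illustrative aside rather than Kotlin code: an immutable binding to a vector, a mutable clone of it, and a check that the original is untouched.

```rust
fn main() {
    // Read-only binding, comparable to Kotlin's listOf(...)
    let immutable_list = vec![1, 2, 3, 4, 5, 6, 7];
    println!("Immutable List {:?}", immutable_list);

    // Clone into a mutable binding, comparable to toMutableList()
    let mut mutable_list = immutable_list.clone();
    mutable_list.push(8); // comparable to mutableList.add(8)

    println!("Mutable List after add {:?}", mutable_list);
    // The original binding is unchanged:
    assert_eq!(immutable_list, vec![1, 2, 3, 4, 5, 6, 7]);
}
```

One design difference worth noting: in Rust the read-only binding makes even the vector's contents unmodifiable, whereas a Kotlin val only fixes the reference, not the underlying value.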
Extension functions in Kotlin: everything you need to know
Building RESTful web services with Kotlin
Building chat application with Kotlin using Node.js, the powerful Server-side JavaScript platform

Richa Tripathi
03 Oct 2018
8 min read

9 reasons why Rust programmers love Rust

The 2018 RedMonk Programming Language Rankings marked the entry of a new programming language into their top 25 list. It has been an incredibly successful year for the Rust programming language in terms of its popularity. It also jumped from being the 46th most popular language on GitHub to the 18th position.

The Stack Overflow developer survey of 2018 is another indicator of the rise of the Rust programming language. Almost 78% of the developers who are working with Rust loved working with it. It topped the list of the most loved programming languages among the developers who took the survey for the third year in a row. Not only that, but it ranked 8th among the most wanted programming languages in the survey, meaning that many respondents who have not used it yet would like to learn it.

Although Rust was designed as a low-level language, best suited for systems, embedded, and other performance critical code, it is gaining a lot of traction and presents a great opportunity for web developers and game developers. Rust is also empowering novice developers with the tools to start shipping code fast.

So, why is Rust so tempting? Let's explore the high points of this incredible language and understand the variety of features that make it interesting to learn.

Automatic Garbage Collection

Garbage collection and non-memory resources often create problems with some systems languages. But Rust pays no heed to garbage collection and removes the possibility of failures caused by it. In Rust, resource cleanup is completely taken care of by RAII (Resource Acquisition Is Initialization).

Better support for Concurrency

Concurrency and parallelism are incredibly imperative topics in computer science and are also a hot topic in the industry today. Computers are gaining more and more cores, yet many programmers aren't prepared to fully utilize their power. Handling concurrent programming safely and efficiently is another major goal of the Rust language.
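As a minimal sketch of what this support looks like in practice, using only the standard library's std::thread (the workload split below is made up for illustration):

```rust
use std::thread;

fn main() {
    // Spawn two threads; each computes a partial sum of its own range.
    // `move` transfers ownership of captured data into the thread,
    // which is how Rust keeps the sharing safe.
    let handles: Vec<_> = (0..2)
        .map(|i| thread::spawn(move || (i * 10..i * 10 + 10).sum::<i32>()))
        .collect();

    // Joining collects the results from both threads.
    let total: i32 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("total = {}", total); // 0..20 sums to 190
}
```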
Concurrency is difficult to reason about. In Rust, there is a strong, static type system that helps you reason about your code. Rust also gives you two traits, Send and Sync, to help you make sense of code that can possibly be concurrent. Rust's standard library also provides a library for threads, which enables you to run Rust code in parallel. You can also use Rust's threads as a simple isolation mechanism.

Error Handling in Rust is beautiful

A programmer is bound to make errors, irrespective of the programming language they use. Making errors while programming is normal, but it's the error handling mechanism of the programming language that enhances the experience of writing the code. In Rust, errors are divided into two types: unrecoverable errors and recoverable errors.

Unrecoverable errors

An error is classified as 'unrecoverable' when there is no option other than to abort the program. The panic! macro in Rust is very helpful in these cases, especially when a bug has been detected in the code but the programmer is not clear how to handle the error. The panic! macro generates a failure message that helps the user to debug the problem. It also helps to stop the execution before more catastrophic events occur.

Recoverable errors

The errors which can be handled easily, or which do not have a serious impact on the execution of the program, are known as recoverable errors. They are represented by Result<T, E>. The Result<T, E> is an enum that consists of two variants, Ok(T) and Err(E), and it describes the possible error in the program.

- Ok(T): The 'T' is the type of value returned in the Ok variant in the success case. It is an expected outcome.
- Err(E): The 'E' is the type of error returned in the Err variant in the failure case. It is an unexpected outcome.

Resource Management

The one attribute that makes Rust stand out (and completely overpowers Google's Go for that matter) is the approach used for resource management.
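The Result<T, E> handling described above can be sketched with a minimal example (parse_number is an invented helper for illustration, built on the standard library's parse):

```rust
use std::num::ParseIntError;

// A fallible operation returns Result: Ok on success, Err on failure.
fn parse_number(s: &str) -> Result<i32, ParseIntError> {
    s.parse::<i32>()
}

fn main() {
    // Recoverable error: match on the Result instead of aborting.
    match parse_number("42") {
        Ok(n) => println!("parsed {}", n),
        Err(e) => println!("could not parse: {}", e),
    }

    // For an unrecoverable condition, panic! aborts with a failure message:
    // panic!("a bug was detected and there is no way to handle it");
    assert!(parse_number("not-a-number").is_err());
}
```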
Rust follows the C++ lead, with concepts like borrowing and mutable borrowing on the plate, and thus resource management becomes an elegant process. Furthermore, Rust didn't need a second chance to learn that resource management is not just about memory usage; the fact that they did it right the first time makes them a standout performer on this point. Although the Rust documentation does a good job of explaining the technical details, the article by Tim explains the concept in a much friendlier and easier to understand language. As such, I thought it would be good to list his points here as well. The following excerpt is taken from the article written by M. Tim Jones.

Reusable code via modules

Rust allows you to organize code in a way that promotes its reuse. You attain this reusability by using modules, which are nothing but organized code packages that other programmers can use. These modules contain functions, structures, and even other modules that you can either make public, so that they can be accessed by the users of the module, or make private, so that they can be used only within the module and not by the module's users. There are three keywords to create modules, use modules, and modify the visibility of elements in modules:

- The mod keyword creates a new module
- The use keyword allows you to use the module (expose the definitions into the scope to use them)
- The pub keyword makes elements of the module public (otherwise, they're private)

Cleaner code with better safety checks

In Rust, the compiler enforces memory safety and other checks that make the programming language safe. Here, you will never have to worry about dangling pointers or bother about using an object after it has been freed. These things are part of the core Rust language that allow you to write clean code. Also, Rust includes an unsafe keyword with which you can disable checks that would typically result in a compilation error.
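The three module keywords described above can be sketched in a single file; the geometry module and area function are invented for illustration:

```rust
// `mod` creates a new module; `pub` makes the element public.
mod geometry {
    pub fn area(w: u32, h: u32) -> u32 {
        w * h
    }

    // Without `pub`, this helper is private to the module.
    fn _internal_helper() {}
}

// `use` exposes the definition into the current scope.
use crate::geometry::area;

fn main() {
    println!("area = {}", area(3, 4)); // prints "area = 12"
}
```

Calling geometry::_internal_helper() from main would fail to compile, which is the visibility rule the pub keyword controls.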
Data types and Collections in Rust

Rust is a statically typed programming language, which means that every value in Rust must have a specified data type. The biggest advantage of static typing is that a large class of errors is identified earlier in the development process. These data types can be broadly classified into two kinds: scalar and compound. Scalar data types represent a single value, like an integer, floating-point number, or character, and are commonly present in other programming languages as well. But Rust also provides compound data types, which allow programmers to group multiple values into one type, such as tuples and arrays.

The Rust standard library provides a number of data structures which are also called collections. Collections contain multiple values, but they are different from the standard compound data types like tuples and arrays which we discussed above. The biggest advantage of using collections is the ability to avoid specifying the amount of data at compile time, which allows the structure to grow and shrink as the program runs. Vectors, strings, and hash maps are the three most commonly used collections in Rust.

The friendly Rust community

Rust owes its success to the breadth and depth of engagement of its vibrant community, which supports a highly collaborative process for helping the language evolve in a truly open-source way. Rust is built from the bottom up, rather than any one individual or organization controlling the fate of the technology.

Reliable, robust release cycles of Rust

What is common between Java, Spring, and Angular? They never release their updates when they promise to. The release cycle of the Rust community, by contrast, works with clockwork precision and is very reliable. Here's an overview of the dates and versions: in mid-September 2018, the Rust team released the Rust 2018 RC1 version. Rust 2018 is the first major new edition of Rust (after Rust 1.0, released in 2015).
This new release marks the culmination of the last three years of Rust's development from the core team, and brings the language together in one neat package. This version includes plenty of new features like raw identifiers, better path clarity, new optimizations, and other additions. You can learn more about the Rust language and its evolution at the Rust blog and download it from the Rust language website.

Note: the headline was edited 09.06.2018 to make it clear that Rust was found to be the most loved language among developers using it.

Rust 2018 RC1 now released with Raw identifiers, better path clarity, and other changes
Rust as a Game Programming Language: Is it any good?
Rust Language Server, RLS 1.0 releases with code intelligence, syntax highlighting and more

Pavan Ramchandani
19 Jun 2018
31 min read

Delphi: memory management techniques for parallel programming

Memory management is part of practically every computing system. Multiple programs must coexist inside a limited memory space, and that can only be possible if the operating system takes care of it. When a program needs some memory, for example, to create an object, it can ask the operating system, which will give it a slice of shared memory. When an object is not needed anymore, that memory can be returned to the loving care of the operating system. In this tutorial, we will touch upon memory management techniques, one of the most important factors in parallel programming. The article is an excerpt from a book written by Primož Gabrijelčič, titled Delphi High Performance.

Slicing and dicing memory straight from the operating system is a relatively slow operation. In lots of cases, a memory system also doesn't know how to return small chunks of memory. For example, if you call Windows' VirtualAlloc function to get 20 bytes of memory, it will actually reserve 4 KB (or 4,096 bytes) for you. In other words, 4,076 bytes would be wasted.

To fix these and other problems, programming languages typically implement their own internal memory management algorithms. When you request 20 bytes of memory, the request goes to that internal memory manager. It still requests memory from the operating system but then splits it internally into multiple parts. In a hypothetical scenario, the internal memory manager would request 4,096 bytes from the operating system and give 20 bytes of that to the application. The next time the application requests some memory (30 bytes, for example), the internal memory manager would get that memory from the same 4,096-byte block.

To move from hypothetical to specific, Delphi also includes such a memory manager. Since Delphi 2006, this memory manager has been called FastMM. It was written as an open source memory manager by Pierre le Riche with help from other Delphi programmers and was later licensed by Borland.
FastMM was a great improvement over the previous Delphi memory manager and, although it does not perform perfectly in the parallel programming world, it still functions very well after more than ten years.

Optimizing strings and array allocations

When you create a string, the code allocates memory for its content, copies the content into that memory, and stores the address of this memory in the string variable. If you append a character to this string, it must be stored somewhere in that memory. However, there is no room for it: the original memory block was just big enough to store the original content. The code must, therefore, enlarge that memory block, and only then can the appended character be stored in the newly acquired space.

A very similar scenario plays out when you extend a dynamic array. Memory that contains the array data can sometimes be extended in place (without moving), but often this cannot be done. If you do a lot of appending, these constant reallocations will start to slow down the code. The Reallocation demo shows a few examples of such behavior and possible workarounds.

The first example, activated by the Append String button, simply appends the '*' character to a string 10 million times. The code looks simple, but the s := s + '*' assignment hides a potentially slow string reallocation:

```pascal
procedure TfrmReallocation.btnAppendStringClick(Sender: TObject);
var
  s: String;
  i: Integer;
begin
  s := '';
  for i := 1 to CNumChars do
    s := s + '*';
end;
```

By now, you probably know that I don't like to present problems that I don't have solutions for, and this is not an exception. In this case, the solution is called SetLength. This function sets a string to a specified size. You can make it shorter, or you can make it longer. You can even set it to the same length as before. In case you are enlarging the string, you have to keep in mind that SetLength will allocate enough memory to store the new string, but it will not initialize it.
In other words, the newly allocated string space will contain random data.

A click on the SetLength String button activates the optimized version of the string appending code. As we know that the resulting string will be CNumChars long, the code can call SetLength(s, CNumChars) to preallocate all the memory in one step. After that, we should not append characters to the string, as that would add new characters at the end of the preallocated string. Rather, we have to store characters directly into the string by writing to s[i]:

```pascal
procedure TfrmReallocation.btnSetLengthClick(Sender: TObject);
var
  s: String;
  i: Integer;
begin
  SetLength(s, CNumChars);
  for i := 1 to CNumChars do
    s[i] := '*';
end;
```

Comparing the speed shows that the second approach is significantly faster. It runs in 33 ms instead of the original 142 ms.

A similar situation happens when you are extending a dynamic array. The code triggered by the Append array button shows how an array may be extended by one element at a time in a loop. Admittedly, the code looks very weird, as nobody in their right mind would write a loop like this. In reality, however, similar code would be split into multiple longer functions and may be hard to spot:

```pascal
procedure TfrmReallocation.btnAppendArrayClick(Sender: TObject);
var
  arr: TArray<char>;
  i: Integer;
begin
  SetLength(arr, 0);
  for i := 1 to CNumChars do begin
    SetLength(arr, Length(arr) + 1);
    arr[High(arr)] := '*';
  end;
end;
```

The solution is similar to the string case. We can preallocate the whole array by calling the SetLength function and then write the data into the array elements. We just have to keep in mind that the first array element always has index 0:

```pascal
procedure TfrmReallocation.btnSetLengthArrayClick(Sender: TObject);
var
  arr: TArray<char>;
  i: Integer;
begin
  SetLength(arr, CNumChars);
  for i := 1 to CNumChars do
    arr[i-1] := '*';
end;
```

Improvements in speed are similar to the string demo.
The original code needs 230 ms to append ten million elements, while the improved code executes in 26 ms.

The third case where you may want to preallocate storage space is when you are appending to a list. As an example, I'll look into the TList<T> class. Internally, it stores the data in a TArray<T>, so it again suffers from constant memory reallocation when you are adding data to the list. The short demo code appends 10 million elements to a list. As opposed to the previous array demo, this is completely normal looking code, found many times in many applications:

```pascal
procedure TfrmReallocation.btnAppendTListClick(Sender: TObject);
var
  list: TList<Char>;
  i: Integer;
begin
  list := TList<Char>.Create;
  try
    for i := 1 to CNumChars do
      list.Add('*');
  finally
    FreeAndNil(list);
  end;
end;
```

To preallocate memory inside a list, you can set the Capacity property to the expected number of elements in the list. This doesn't prevent the list from growing at a later time; it just creates an initial estimate. You can also use Capacity to reduce the memory space used for the list after deleting lots of elements from it. The difference between a list and a string or an array is that, after setting Capacity, you still cannot access list[i] elements directly. First you have to Add them, just as if Capacity was not assigned:

```pascal
procedure TfrmReallocation.btnSetCapacityTListClick(Sender: TObject);
var
  list: TList<Char>;
  i: Integer;
begin
  list := TList<Char>.Create;
  try
    list.Capacity := CNumChars;
    for i := 1 to CNumChars do
      list.Add('*');
  finally
    FreeAndNil(list);
  end;
end;
```

Comparing the execution speed shows only a small improvement. The original code executed in 167 ms, while the new version needed 145 ms. The reason for that relatively small change is that TList<T> already manages its storage array. When it runs out of space, it will always at least double the previous size. Internal storage therefore grows from 1 to 2, 4, 8, 16, 32, 64, ... elements. This can, however, waste a lot of memory.
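As an aside, this preallocation pattern is not specific to Delphi. A rough equivalent in Rust uses with_capacity from the standard library in place of SetLength/Capacity (this is a comparison sketch, not part of the book's demos):

```rust
fn main() {
    const NUM_CHARS: usize = 10_000_000;

    // Naive append: the buffer reallocates repeatedly as it grows,
    // like the Append String demo.
    let mut s1 = String::new();
    for _ in 0..NUM_CHARS {
        s1.push('*');
    }

    // Preallocated: one reservation up front, like SetLength/Capacity.
    let mut s2 = String::with_capacity(NUM_CHARS);
    for _ in 0..NUM_CHARS {
        s2.push('*');
    }

    assert_eq!(s1, s2);
    // The reserved capacity was never exceeded, so no reallocation happened.
    assert!(s2.capacity() >= NUM_CHARS);
}
```

Vec::with_capacity plays the same role for dynamic arrays and lists.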
In our example, the final size of the internal array is 16,777,216 elements, which is almost 68% more elements than needed. By setting the capacity to the exact required size, we have therefore saved 6,777,216 * SizeOf(Char) bytes, or almost 13 megabytes.

Other data structures also support the Capacity property. We can find it in TList, TObjectList, TInterfaceList, TStrings, TStringList, TDictionary, TObjectDictionary, and others.

Memory management functions

Besides the various internal functions that the Delphi runtime library (RTL) uses to manage strings, arrays, and other built-in data types, the RTL also implements various functions that you can use in your program to allocate and release memory blocks. In the next few paragraphs, I'll tell you a little bit about them.

Memory management functions can be best described if we split them into a few groups, each including functions that were designed to work together.

The first group includes GetMem, AllocMem, ReallocMem, and FreeMem. The procedure GetMem(var P: Pointer; Size: Integer) allocates a memory block of size Size and stores the address of this block in the pointer variable P. This pointer variable is not limited to the Pointer type; it can be of any pointer type (for example, PByte). The new memory block is not initialized and will contain whatever was stored in the memory at that time. Alternatively, you can allocate a memory block with a call to the function AllocMem(Size: Integer): Pointer, which allocates a memory block, fills it with zeroes, and then returns its address.

To change the size of a memory block, call the procedure ReallocMem(var P: Pointer; Size: Integer). The variable P must contain a pointer to a memory block, and Size can be either smaller or larger than the original block size. FastMM will try to resize the block in place. If that fails, it will allocate a new memory block, copy the original data into the new block, and return the address of the new block in P.
Just as with GetMem, the newly allocated bytes will not be initialized. To release memory allocated in this way, call the FreeMem(var P: Pointer) procedure.

The second group includes GetMemory, ReallocMemory, and FreeMemory. These three work just the same as the functions from the first group, except that they can be used from C++Builder.

The third group contains just two functions, New and Dispose. These two functions can be used to dynamically create and destroy variables of any type. To allocate such a variable, call New(var X: Pointer), where X is again of any pointer type. The compiler will automatically provide the correct size for the memory block, and it will also initialize all managed fields to zero. Unmanaged fields will not be initialized. To release such variables, don't use FreeMem but Dispose(var X: Pointer). In the next section, I'll give a short example of using New and Dispose to dynamically create and destroy variables of a record type.

You must never use Dispose to release memory allocated with GetMem or AllocMem. You must also never use FreeMem to release memory allocated with New.

The fourth and last group also contains just two functions, Initialize and Finalize. Strictly speaking, they are not memory management functions. If you create a variable containing managed fields (for example, a record) with a function other than New or AllocMem, it will not be correctly initialized. Managed fields will contain random data, and that will completely break the execution of the program. To fix that, you should call Initialize(var V), passing in the variable (and not a pointer to this variable!). An example in the next section will clarify that. Before you return such a variable to the memory manager, you should clean up all references to managed fields by calling Finalize(var V). It is better to use Dispose, which will do that automatically, but sometimes that is not an option and you have to do it manually.
Both functions also exist in a form that accepts the number of variables to initialize. This form can be used to initialize or finalize an array of data:

```pascal
procedure Initialize(var V; Count: NativeUInt);
procedure Finalize(var V; Count: NativeUInt);
```

In the next section, I'll dig deeper into the dynamic allocation of record variables. I'll also show how most of the memory allocation functions are used in practice.

Dynamic record allocation

While it is very simple to dynamically create new objects (you just call the Create constructor), dynamic allocation of records and other data types (arrays, strings ...) is a bit more complicated. In the previous section, we saw that the preferred way of allocating such variables is with the New method. The InitializeFinalize demo shows how this is done in practice. The code will dynamically allocate a variable of type TRecord. To do that, we need a pointer variable pointing to TRecord. The cleanest way to do that is to declare a new type, PRecord = ^TRecord:

```pascal
type
  TRecord = record
    s1, s2, s3, s4: string;
  end;
  PRecord = ^TRecord;
```

Now, we can just declare a variable of type PRecord and call New on that variable. After that, we can use the rec variable as if it were a normal record and not a pointer. Technically, we would have to always write rec^.s1, rec^.s4 and so on, but the Delphi compiler is friendly enough and allows us to drop the ^ character:

```pascal
procedure TfrmInitFin.btnNewDispClick(Sender: TObject);
var
  rec: PRecord;
begin
  New(rec);
  try
    rec.s1 := '4';
    rec.s2 := '2';
    rec.s4 := rec.s1 + rec.s2 + rec.s4;
    ListBox1.Items.Add('New: ' + rec.s4);
  finally
    Dispose(rec);
  end;
end;
```

Technically, you could just use rec: ^TRecord instead of rec: PRecord, but it is customary to use explicitly declared pointer types, such as PRecord.

Another option is to use GetMem instead of New, and FreeMem instead of Dispose. In this case, however, we have to manually prepare the allocated memory for use with a call to Initialize.
We must also prepare it to be released with a call to Finalize before we call FreeMem. If we use GetMem for allocation, we must manually provide the correct size of the allocated block. In this case, we can simply use SizeOf(TRecord). We must also be careful with the parameters passed to GetMem and Initialize. You pass a pointer (rec) to GetMem and FreeMem, and the actual record data (rec^) to Initialize and Finalize:

procedure TfrmInitFin.btnInitFinClick(Sender: TObject);
var
  rec: PRecord;
begin
  GetMem(rec, SizeOf(TRecord));
  try
    Initialize(rec^);
    rec.s1 := '4';
    rec.s2 := '2';
    rec.s4 := rec.s1 + rec.s2 + rec.s4;
    ListBox1.Items.Add('GetMem+Initialize: ' + rec.s4);
  finally
    Finalize(rec^);
    FreeMem(rec);
  end;
end;

This demo also shows how the code doesn't work correctly if you allocate a record with GetMem but don't call Initialize. To test this, click the third button (GetMem). While in actual code the program may sometimes work and sometimes not, I have taken some care so that GetMem will always return a memory block which is not initialized to zero, and the program will certainly fail.

It is certainly possible to create records dynamically and use them instead of classes, but one question still remains—why? Why would we want to use records instead of objects when working with objects is simpler? The answer, in one word, is speed. The demo program, Allocate, shows the difference in execution speed. A click on the Allocate objects button will create ten million objects of type TNodeObj, which is a typical object that you would find in an implementation of a binary tree.
Of course, the code then cleans up after itself by destroying all those objects:

type
  TNodeObj = class
    Left, Right: TNodeObj;
    Data: NativeUInt;
  end;

procedure TfrmAllocate.btnAllocClassClick(Sender: TObject);
var
  i: Integer;
  nodes: TArray<TNodeObj>;
begin
  SetLength(nodes, CNumNodes);
  for i := 0 to CNumNodes-1 do
    nodes[i] := TNodeObj.Create;
  for i := 0 to CNumNodes-1 do
    nodes[i].Free;
end;

Similar code, activated by the Allocate records button, creates ten million records of type TNodeRec, which contains the same fields as TNodeObj:

type
  PNodeRec = ^TNodeRec;
  TNodeRec = record
    Left, Right: PNodeRec;
    Data: NativeUInt;
  end;

procedure TfrmAllocate.btnAllocRecordClick(Sender: TObject);
var
  i: Integer;
  nodes: TArray<PNodeRec>;
begin
  SetLength(nodes, CNumNodes);
  for i := 0 to CNumNodes-1 do
    New(nodes[i]);
  for i := 0 to CNumNodes-1 do
    Dispose(nodes[i]);
end;

Running both methods shows a big difference. While the class-based approach needs 366 ms to initialize objects and 76 ms to free them, the record-based approach needs only 76 ms to initialize records and 56 ms to free them. Where does that big difference come from? When you create an object of a class, lots of things happen. Firstly, TObject.NewInstance is called to allocate the object. That method calls TObject.InstanceSize to get the size of the object, then GetMem to allocate the memory and, in the end, InitInstance, which fills the allocated memory with zeros. Secondly, a chain of constructors is called. After all that, a chain of AfterConstruction methods is called (if such methods exist). All in all, that is quite a process which takes some time. Much less is going on when you create a record. If it contains only unmanaged fields, as in our example, a GetMem is called and that's all. If the record contains managed fields, this GetMem is followed by a call to the _Initialize method in the System unit, which initializes the managed fields. The problem with records is that we cannot declare generic pointers.
When we are building trees, for example, we would like to store some data of type T in each node. The initial attempt at that, however, fails. The following code does not compile with the current Delphi compiler:

type
  PNodeRec<T> = ^TNodeRec<T>;
  TNodeRec<T> = record
    Left, Right: PNodeRec<T>;
    Data: T;
  end;

We can circumvent this by moving the TNodeRec<T> declaration inside the generic class that implements the tree. The following code from the Allocate demo shows how we could declare such an internal type as a generic object and as a generic record:

type
  TTree<T> = class
  strict private type
    TNodeObj<T1> = class
      Left, Right: TNodeObj<T1>;
      Data: T1;
    end;
    PNodeRec = ^TNodeRec;
    TNodeRec<T1> = record
      Left, Right: PNodeRec;
      Data: T1;
    end;
    TNodeRec = TNodeRec<T>;
  end;

If you click the Allocate node<string> button, the code will create a TTree<string> object and then create 10 million class-based nodes and the same amount of record-based nodes. This time, New must initialize the managed field Data: string, but the difference in speed is still big. The code needs 669 ms to create and destroy class-based nodes and 133 ms to create and destroy record-based nodes.

Another big difference between classes and records is that each object contains two hidden pointer-sized fields. Because of that, each object is 8 bytes larger than you would expect (16 bytes in 64-bit mode). That amounts to 8 * 10,000,000 bytes, or a bit over 76 megabytes. Records are therefore not only faster but also save space!

FastMM internals

To get full speed out of anything, you have to understand how it works, and memory managers are no exception to this rule. To write very fast Delphi applications, you should, therefore, understand how Delphi's default memory manager works. FastMM is not just a memory manager—it is three memory managers in one! It contains three significantly different subsystems—a small block allocator, a medium block allocator, and a large block allocator.
The first one, the allocator for small blocks, handles all memory blocks smaller than 2.5 KB. This boundary was determined by observing existing applications. As it turned out, in most Delphi applications this covers 99% of all memory allocations. This is not surprising, as in most Delphi applications most memory is allocated when an application creates and destroys objects and works with arrays and strings, and those are rarely larger than a few hundred characters. Next comes the allocator for medium blocks, which are memory blocks with a size between 2.5 KB and 160 KB. The last one, the allocator for large blocks, handles all other requests.

The difference between the allocators lies not just in the size of memory that they serve, but in the strategy they use to manage memory. The large block allocator implements the simplest strategy. Whenever it needs some memory, it gets it directly from Windows by calling VirtualAlloc. This function allocates memory in 4 KB blocks, so this allocator could waste up to 4,095 bytes per request. As it is used only for blocks larger than 160 KB, though, this wasted memory doesn't significantly affect the program.

The medium block allocator gets its memory from the large block allocator. It then carves this larger block into smaller blocks, as they are requested by the application. It also keeps all unused parts of the memory in a linked list so that it can quickly find a memory block that is still free.

The small block allocator is where the real smarts of FastMM lie. There are actually 56 small memory allocators, each serving only one size of memory block. The first one serves 8-byte blocks, the next one 16-byte blocks, followed by allocators for 24, 32, 40, ... 256, 272, 288, ... 960, 1056, ... 2384, and 2608-byte blocks. They all get memory from the medium block allocator. If you want to see the block sizes for all 56 allocators, open FastMM4.pas and search for SmallBlockTypes.
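As a language-agnostic illustration (this is not FastMM's actual code), the round-up behaviour of these size classes can be sketched in a few lines of Python; the truncated table below lists only the first few of the 56 sizes:

```python
# First few small-block size classes from the list above (the real table
# in FastMM4.pas has 56 entries, ending with 2608 bytes).
SIZE_CLASSES = [8, 16, 24, 32, 40, 48, 56, 64]

def small_block_size(requested: int) -> int:
    """Round a request up to the first size class that can hold it."""
    for size in SIZE_CLASSES:
        if requested <= size:
            return size
    raise ValueError("request exceeds this truncated table")

# A 50-byte request is served by the 56-byte allocator; 6 bytes go unused.
print(small_block_size(50))  # → 56
```

Each request is simply bumped up to the next class, which is what makes a small allocation a constant-time operation.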
What that actually means is that each memory allocation request will waste some memory. If you allocate 28 bytes, they'll be allocated from the 32-byte allocator, so 4 bytes will be wasted. If you allocate 250 bytes, they'll come from the 256-byte allocator, and so on. The sizes of the memory allocators were carefully chosen so that the amount of wasted memory is typically below 10%, so this doesn't represent a big problem in most applications.

Each allocator is basically just an array of equally sized elements (memory blocks). When you allocate a small amount of memory, you'll get back one element of an array. All unused elements are connected into a linked list so that the memory manager can quickly find a free element of the array when it needs one. (In a simplified picture of FastMM allocators, each allocator is a row of equally sized boxes: some boxes hold allocated memory, the rest are unused, and the free boxes are connected into a linked list. Block sizes differ from allocator to allocator.)

FastMM implements a neat trick which helps a lot when you resize strings or arrays by a small amount. Well, truth be told, I had to append lots and lots of characters—ten million of them—for this difference to show. If I were appending only a few characters, both versions would run at nearly the same speed. If you can, on the other hand, get your hands on a pre-2006 Delphi and run the demo program there, you'll see that the one-by-one approach runs terribly slowly. The difference in speed would be a few more orders of magnitude larger than in my example. The trick I'm talking about assumes that if you have resized memory once, you'll probably want to do it again soon. If you are enlarging the memory, it will limit the smallest size of the new memory block to be at least twice the size of the original block plus 32 bytes.
Next time you want to resize, FastMM will (hopefully) just update the internal information about the allocated memory and return the same block, knowing that there's enough space at the end. All that trickery is hard to understand without an example, so here's one.

Let's say we have a string of 5 characters which neatly fits into a 24-byte block. Sorry, what am I hearing? "What? Why!? 5 Unicode characters need only 10 bytes!" Oh, yes, strings are more complicated than I told you before. In reality, each Delphi UnicodeString and AnsiString contains some additional data besides the actual characters that make up the string. Parts of the string are also: a 4-byte length of the string, a 4-byte reference count, a 2-byte field storing the size of each string character (either 1 for AnsiString or 2 for UnicodeString), and a 2-byte field storing the character code page. In addition to that, each string includes a terminating Chr(0) character. For a 5-character string this gives us 4 (length) + 4 (reference count) + 2 (character size) + 2 (code page) + 5 (characters) * 2 (size of a character) + 2 (terminating Chr(0)) = 24 bytes.

When you add one character to this string, the code will ask the memory manager to enlarge a 24-byte block to 26 bytes. Instead of returning a 26-byte block, FastMM will round that up to 2 * 24 + 32 = 80 bytes. Then it will look for an appropriate allocator, find one that serves 80-byte blocks (great, no memory loss!) and return a block from that allocator. It will, of course, also have to copy the data from the original block to the new block. This formula, 2 * size + 32, is used only in the small block allocators. A medium block allocator only overallocates by 25%, and a large block allocator doesn't implement this behavior at all. Next time you add one character to this string, FastMM will just look at the memory block, determine that there's still enough space inside this 80-byte memory block, and return the same memory.
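The growth rule just described can be modelled with a short Python sketch. This is only an illustration of the 2 * size + 32 formula, not FastMM's actual code, and it ignores the final rounding to an allocator size class:

```python
def string_block_size(chars: int) -> int:
    # 4 (length) + 4 (reference count) + 2 (char size) + 2 (code page)
    # + 2 * chars (UTF-16 data) + 2 (terminating Chr(0))
    return 12 + 2 * chars + 2

def append_chars(appends: int):
    """Append characters one by one to a 5-character string and count
    how often the memory block actually has to grow."""
    chars, capacity, reallocations = 5, string_block_size(5), 0
    for _ in range(appends):
        chars += 1
        needed = string_block_size(chars)
        if needed > capacity:
            # FastMM grows a small block to at least 2 * capacity + 32
            capacity = max(needed, 2 * capacity + 32)
            reallocations += 1
    return capacity, reallocations

print(append_chars(1))   # (80, 1): the 24-byte block grows to 80 bytes
print(append_chars(29))  # (192, 2): 29 appends cause only one more move
```

Because the capacity at least doubles on every real reallocation, even millions of one-character appends trigger only a logarithmic number of block moves and copies.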
This will continue for quite some time while the string grows, in two-byte increments, to fill the 80-byte block. After that, the block will be resized to 2 * 80 + 32 = 192 bytes (yes, there is an allocator for this size), the data will be copied, and the game will continue. This behavior indeed wastes some memory but, under most circumstances, it significantly boosts the speed of code that was not written with speed in mind.

Memory allocation in a parallel world

We've seen how FastMM boosts the reallocation speed. The life of a memory manager is simple when there is only one thread of execution inside a program. When the memory manager is dealing out memory, it can be perfectly safe in the knowledge that nothing can interrupt it in this work. When we deal with parallel processing, however, multiple paths of execution simultaneously execute the same program and work on the same data. Because of that, life from the memory manager's perspective suddenly becomes very dangerous.

For example, let's assume that one thread wants some memory. The memory manager finds a free memory block on a free list and prepares to return it. At that moment, however, another thread also needs some memory from the same allocator. This second execution thread (running in parallel with the first one) would also find a free memory block on the free list. If the first thread hasn't yet updated the free list, that may even be the same memory block! That can only result in one thing—complete confusion and crashing programs.

It is extremely hard to write code that manipulates data structures (such as a free list) in a manner that functions correctly in a multithreaded world. So hard that FastMM doesn't even try. Instead, it regulates access to each allocator with a lock. Each of the 56 small block allocators gets its own lock, as do the medium and large block allocators.
When a program needs some memory from, say, the 16-byte allocator, FastMM will lock this allocator until the memory is returned to the program. If, during this time, another thread requests memory from the same 16-byte allocator, it will have to wait until the first thread finishes. This indeed fixes all problems but introduces a bottleneck—a part of the code where threads must wait to be processed in a serial fashion. If threads do lots of memory allocation, this serialization will completely negate the speed-up that we expected to get from the parallel approach. Such a memory manager would be useless in a parallel world.

To fix that, FastMM introduces a memory allocation optimization which only affects small blocks. When accessing a small block allocator, FastMM will try to lock it. If that fails, it will not wait for the allocator to become unlocked but will try to lock the allocator for the next block size. If that succeeds, it will return memory from the second allocator. That will indeed waste more memory but will help with the execution speed. If the second allocator also cannot be locked, FastMM will try to lock the allocator for yet the next block size. If the third allocator can be locked, you'll get back memory from it. Otherwise, FastMM will repeat the process from the beginning. This process can be roughly described with the following pseudo-code:

allocIdx := find best allocator for the memory block
repeat
  if can lock allocIdx then
    break;
  Inc(allocIdx);
  if can lock allocIdx then
    break;
  Inc(allocIdx);
  if can lock allocIdx then
    break;
  Dec(allocIdx, 2)
until false
allocate memory from allocIdx allocator
unlock allocIdx

A careful reader will notice that this code fails when the first line finds the last allocator in the table, or the one before that. Instead of adding some conditional code to work around the problem, FastMM rather repeats the last allocator in the list three times.
The table of small allocators actually ends with the following sizes: 1,984; 2,176; 2,384; 2,608; 2,608; 2,608. When requesting a block size above 2,384 bytes, the first line in the pseudo-code above will always find the first 2,608 allocator, so there will always be two more after it.

This approach works great when memory is allocated, but it hides another problem. And how can I better explain a problem than with a demonstration ...? An example of this problem can be found in the program ParallelAllocations. If you run it and click the Run button, the code will compare the serial version of some algorithm with a parallel one. I'm aware that I did not explain parallel programming at all, but the code is so simple that even somebody without any understanding of the topic will guess what it does. The core of the test runs a loop, calling the Execute method on all objects in a list. If the parallelTest flag is set, the loop is executed in parallel, otherwise it is executed serially. The only mysterious part in the code, TParallel.For, does exactly what it says—it executes a for loop in parallel:

if parallelTest then
  TParallel.For(0, fList.Count - 1,
    procedure(i: integer)
    begin
      fList[i].Execute;
    end)
else
  for i := 0 to fList.Count - 1 do
    fList[i].Execute;

If you'll be running the program, make sure that you execute it without the debugger (Ctrl + Shift + F9 will do that). Running with the debugger slows down parallel execution and can skew the measurements. On my test machine, parallelizing the program made it almost 4 times faster. Great result! Well, no. Not a great result. You see, the machine I was testing on has 12 cores. If all of them were working in parallel, I would expect an almost 12x speed-up, not a mere 4-times improvement! If you take a look at the code, you'll see that each Execute allocates a ton of objects. It is obvious that the problem lies in the memory manager.
The question remains, though: where exactly does this problem lie, and how can we find it? I ran into exactly the same problem a few years ago. A highly parallel application which processed gigabytes and gigabytes of data was not running fast enough. There were no obvious problematic points and I suspected that the culprit was FastMM. I tried swapping the memory manager for a more multithreading-friendly one and, indeed, the problem was somewhat reduced, but I still wanted to know where the original sin lay in my code. I also wanted to continue using FastMM, as it offers great debugging tools.

In the end, I found no other solution than to dig into the FastMM internals, find out how it works, and add some logging there. More specifically, I wanted to know when a thread is waiting for the memory manager to become unlocked. I also wanted to know at which locations in my program this happens the most. To cut a (very) long story short, I extended FastMM with support for this kind of logging. This extension was later integrated into the main FastMM branch.

As these changes are not included in Delphi, you have to take some steps to use this code. Firstly, you have to download FastMM from the official repository at https://github.com/pleriche/FastMM4. Then you have to unpack it somewhere on the disk and add FastMM4 as the first unit in the project file (.dpr). For example, the ParallelAllocation program starts like this:

program ParallelAllocation;

uses
  FastMM4 in 'FastMM\FastMM4.pas',
  Vcl.Forms,
  ParallelAllocationMain in 'ParallelAllocationMain.pas' {frmParallelAllocation};

When you have done that, you should firstly rebuild your program and test if everything is still working. (It should, but you never know ...) To enable the memory manager logging, you have to define the conditional symbol LogLockContention, rebuild (as FastMM4 has to be recompiled) and, of course, run the program without the debugger. If you do that, you'll see that the program runs quite a bit slower than before.
On my test machine, the parallel version was only 1.6x faster than the serial one. The logging takes its toll, but that is not important. The important part appears when you close the program. At that point, the logger collects all the results and sorts them by frequency. The 10 most frequent sources of locking in the program will be saved to a file called <programname>_MemoryManager_EventLog.txt. You will find it in the folder with the <programname>.exe. The three most frequent sources of locking will also be displayed on the screen. A cropped version of this log reveals several important details.

For starters, we can see that at this location the program waited 19,020 times for the memory manager to become unlocked. Next, we can see that the memory function that caused the problem was FreeMem. Furthermore, we can see that somebody tried to delete from a list (InternalDoDelete) and that this deletion was called from TSpeedTest.Execute, line 130. FreeMem was called because the list in question is actually a TObjectList and deleting elements from the list caused them to be destroyed. The most important part here is the memory function causing the problem—FreeMem. Of course! Allocations are optimized. If an allocator is locked, the next one will be used, and so on. Releasing memory, however, is not optimized! When we release a memory block, it must be returned to the same allocator that it came from. If two threads want to release memory to the same allocator at the same time, one will have to wait.

I had an idea of how to improve this situation: add a small stack (called the release stack) to each allocator. When FreeMem is called and cannot lock the allocator, the address of the memory block that is to be released is stored on that stack, and FreeMem then quickly exits. When a FreeMem call successfully locks an allocator, it firstly releases its own memory block.
Then it checks if anything is waiting on the release stack and releases those memory blocks too (if there are any). This change is also included in the main FastMM branch, but it is not activated by default as it increases the overall memory consumption of the program. However, in some situations it can do miracles, and if you are developing multithreaded programs, you should certainly test it out. To enable release stacks, open the project settings for the program, remove the conditional define LogLockContention (as that slows the program down), and add the conditional define UseReleaseStack. Rebuild, as FastMM4.pas has to be recompiled. On my test machine, I got much better results with this option enabled. Instead of a 3.9x speed-up, the parallel version was 6.3x faster than the serial one. The factor is still not close to 12x, as the threads do too much fighting over the memory, but the improvement is still significant. That is as far as FastMM will take us. For faster execution, we need a more multithreading-friendly memory manager.

To summarize, this article covered memory management techniques offered by Delphi. We looked into optimization, allocation, and the internals of storage for efficient parallel programming. If you found this post useful, do check out the book Delphi High Performance to learn more about the intricacies of high-performance programming with Delphi.

Read More:
Exploring the Usages of Delphi
Network programming 101 with GAWK (GNU AWK)
A really basic guide to batch file programming
Node.js 13 releases with an upgraded V8, full ICU support, stable Worker Threads API and more
Fatema Patrawala
23 Oct 2019
4 min read
Yesterday was a super exciting day for Node.js developers, as the Node.js foundation announced the transition of Node.js 12 to Long Term Support (LTS) along with the release of Node.js 13. As per the team, Node.js 12 becomes the newest LTS release, joining versions 10 and 8. This release marks the transition of Node.js 12.x into LTS with the codename 'Erbium'. The 12.x release line now moves into "Active LTS" and will remain so until October 2020. Then it will move into "Maintenance" until its end of life in April 2022.

The new Node.js 13 release delivers faster startup and better default heap limits. It includes updates to V8, TLS, and llhttp, and new features like a diagnostic report, bundled heap dump capability, and updates to Worker Threads, N-API, and more.

Key features in Node.js 13

Let us take a look at the key features included in Node.js 13.

V8 gets an upgrade to V8 7.8

This release is compatible with the new version V8 7.8. This new version of the V8 JavaScript engine brings performance tweaks and improvements to keep Node.js up with the ongoing improvements in the language and runtime.

Full ICU enabled by default in Node.js 13

As of Node.js 13, full-icu is now the default, which means hundreds of other locales are now supported out of the box. This will simplify the development and deployment of applications for non-English deployments.

Stable workers API

The Worker Threads API is now a stable feature in both Node.js 12 and Node.js 13. While Node.js already performs well with the single-threaded event loop, there are some use cases where additional threads can be leveraged for better results.

New compiler and platform support

Node.js and V8 continue to embrace newer C++ features and take advantage of newer compiler optimizations and security enhancements. With the release of Node.js 13, the codebase now requires a minimum of version 10 for the OS X development tools and version 7.2 of the AIX operating system.
In addition to this, there has been progress on supporting Python 3 for building Node.js applications. Systems that have both Python 2 and Python 3 installed will still be able to use Python 2; however, systems with only Python 3 should now be able to build using Python 3.

Developers discuss pain points in Node.js 13

On Hacker News, users discussed various pain points in Node.js 13 and some of the functionality missing from this release. One of the users commented, "To save you the clicks: Node.js 13 doesn't support top-level await. Node includes V8 7.8, released Sep 27. Top-level await merged into V8 on Sep 24, but didn't make it in time for the 7.8 release." A response to this comment came from the V8 team: "TLA is only in modules. Once node supports modules, it will also have TLA. We're also pushing out a version with 7.9 fairly soonish."

Other users discussed how Node.js performs with TypeScript: "I've been using node with typescript and it's amazing. VERY productive. The key thing is you can do a large refactoring without breaking anything. The biggest challenge I have right now is actually the tooling. Intellij tends to break sometimes. I'm using lerna for a monorepo with sub-modules and it's buggy with regular npm. For example 'npm audit' doesn't work. I might have to migrate to yarn…"

If you are interested to know more about this release, check out the official Node.js blog post as well as the GitHub page for the release notes.

The OpenJS Foundation accepts NVM as its first new incubating project since the Node.js Foundation and JSF merger
12 Visual Studio Code extensions that Node.js developers will love [Sponsored by Microsoft]
5 reasons Node.js developers might actually love using Azure [Sponsored by Microsoft]
Introducing Node.js 12 with V8 JavaScript engine, improved worker threads, and much more
Google is planning to bring Node.js support to Fuchsia
Python 3.8 new features: the walrus operator, positional-only parameters, and much more
Bhagyashree R
18 Jul 2019
5 min read
Earlier this month, the team behind Python announced the release of Python 3.8b2, the second of four planned beta releases. Ahead of the third beta release, which is scheduled for 29th July, we look at some of the key features coming to Python 3.8.

The "incredibly controversial" walrus operator

The walrus operator was proposed in PEP 572 (Assignment Expressions) by Chris Angelico, Tim Peters, and Guido van Rossum last year. Since then it has been heavily discussed in the Python community, with many questioning whether it is a needed improvement. Others were excited, as the operator does make the code a tiny bit more readable. The PEP discussion ended with Guido van Rossum stepping down as BDFL (benevolent dictator for life) and the creation of a new governance model. In an interview with InfoWorld, Guido shared, "The straw that broke the camel's back was a very contentious Python enhancement proposal, where after I had accepted it, people went to social media like Twitter and said things that really hurt me personally. And some of the people who said hurtful things were actually core Python developers, so I felt that I didn't quite have the trust of the Python core developer team anymore."

According to PEP 572, the assignment expression is a syntactical operator that allows you to assign values to a variable as part of an expression. Its aim is to simplify things like multiple-pattern matches and the so-called loop and a half.
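For example, a "loop and a half" that reads chunks until one comes back empty can fold the assignment into the loop condition. In this sketch, read_chunk and make_reader are made-up stand-ins for any data source:

```python
def make_reader(chunks):
    """Return a function that yields one chunk per call, then ''."""
    it = iter(chunks)
    return lambda: next(it, '')

read_chunk = make_reader(['ab', 'cd', ''])

# Without the walrus operator, the call is either duplicated or the loop
# is split in two. With it, assignment and test happen in one expression:
received = []
while (chunk := read_chunk()) != '':
    received.append(chunk)

print(received)  # ['ab', 'cd']
```

The assignment target chunk stays usable inside the loop body, which is exactly the pattern the PEP aims to simplify.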
At PyCon 2019, Dustin Ingram, a PyPI maintainer, gave a few examples of where you can use this syntax:

Balancing lines of code and complexity
Avoiding inefficient comprehensions
Avoiding unnecessary variables in scope

You can watch the full talk on YouTube: https://www.youtube.com/watch?v=6uAvHOKofws

The feature was implemented by Emily Morehouse, Python core developer and Founder, Director of Engineering at Cuttlesoft, and was merged earlier this year: https://twitter.com/emilyemorehouse/status/1088593522142339072

Explaining the other improvements this feature brings, Jake Edge, a contributor on LWN.net, wrote, "These and other uses (e.g. in list and dict comprehensions) help make the intent of the programmer clearer. It is a feature that many other languages have, but Python has, of course, gone without it for nearly 30 years at this point. In the end, it is actually a fairly small change for all of the uproars it caused."

Positional-only parameters

Proposed in PEP 570, this introduces a new syntax (/) to specify positional-only parameters in Python function definitions. This is similar to how * indicates that the arguments to its right are keyword-only. This syntax is already used by many CPython built-in and standard library functions, for instance, the pow() function:

pow(x, y, z=None, /)

This syntax gives library authors more control over better expressing the intended usage of an API and allows the API to "evolve in a safe, backward-compatible way." It gives library authors the flexibility to change the name of positional-only parameters without breaking callers. Additionally, this also ensures consistency of the Python language with the existing documentation and the behavior of various "builtin" and standard library functions. As with PEP 572, this proposal also got mixed reactions from Python developers. In support, one developer said, "Position-only parameters already exist in cpython builtins like range and min.
Making their support at the language level would make their existence less confusing and documented." Others, meanwhile, think that this will allow authors to "dictate" how their methods can be used: "Not the biggest fan of this one because it allows library authors to overly dictate how their functions can be used, as in, mark an argument as positional merely because they want to. But cool all the same," a Redditor commented.

Debug support for f-strings

Formatted strings (f-strings) were introduced in Python 3.6 with PEP 498. They enable you to evaluate an expression as part of the string, along with inserting the result of function calls and so on. In Python 3.8, some additional syntax changes have been made by adding an = specifier and a !d conversion for ease of debugging. You can use this feature like this:

print(f'{foo=} {bar=}')

This provides developers a better way of doing "print-style debugging", especially for those who have a background in languages that already have such a feature, such as Perl, Ruby, JavaScript, etc. One developer expressed his delight on Hacker News: "F strings are pretty awesome. I'm coming from JavaScript and partially java background. JavaScript's String concatenation can become too complex and I have difficulty with large strings."

Python Initialization Configuration

Though Python is highly configurable, its configuration is scattered all around the code. PEP 587 introduces a new C API to configure the Python initialization, giving developers finer control over the configuration and better error reporting. Among the improvements this API will bring are the ability to read and modify the configuration before it is applied, and the ability to override how Python computes the module search paths (sys.path).
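Two of the 3.8 features above, positional-only parameters and the f-string = specifier, can be tried out in a few lines. Here, clamp is a made-up function used purely for illustration:

```python
def clamp(value, lo, hi, /):
    """The / makes value, lo, and hi positional-only parameters."""
    return max(lo, min(hi, value))

print(clamp(15, 0, 10))         # 10 — positional call is fine
try:
    clamp(15, lo=0, hi=10)      # keyword use is now a TypeError
except TypeError as exc:
    print('rejected:', exc)

# The new = specifier echoes both the expression text and its value:
x, y = 3, 4
print(f'{x=} {y=} {x * y=}')    # x=3 y=4 x * y=12
```

Renaming value, lo, or hi later would not break any caller, which is precisely the API-evolution freedom PEP 570 describes.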
Along with these, there are many other exciting features coming to Python 3.8, which is currently scheduled for October, including Vectorcall, a fast calling protocol for CPython; support for out-of-band buffers in pickle protocol 5; and more. You can find the full list on Python's official website.

Python serious about diversity, dumps offensive ‘master’, ‘slave’ terms in its documentation
Introducing PyOxidizer, an open source utility for producing standalone Python applications, written in Rust
Python 3.8 beta 1 is now ready for you to test
Prasad Ramesh
23 Oct 2018
10 min read
Mozilla shares plans to bring desktop applications, games to WebAssembly and make deeper inroads for the future web

WebAssembly defines an Abstract Syntax Tree (AST) in a binary format and a corresponding assembly-like text format for executable code in Web pages. It can be considered a new language or a web standard. You can create and debug code in plain text format. It appeared in browsers last year, but that was just a barebones version. Many new features are to be added that could transform what you can do with WebAssembly.

The minimum viable product (MVP)

WebAssembly started with Emscripten, a toolchain that made C++ code run on the web by transpiling it to JavaScript. But the automatically generated JS was still significantly slower than native code. Mozilla engineers found a type system hidden in the generated JS and figured out how to make that JS run really fast; the result is now called asm.js. This was not possible in JavaScript itself, so a new language was needed, designed specifically to be compiled to. Thus was born WebAssembly. Here is what was needed to get the MVP of WebAssembly running:

- Compile target: A language-agnostic compile target, to support more languages than just C and C++.
- Fast execution: The compile target had to execute fast to keep up with user expectations of smooth interactions.
- Compact: A compact compile target that transfers and loads quickly, even for pages with the large code bases of web apps or desktop apps ported to the web.
- Linear memory: A linear memory model gives access to specific parts of memory and nothing else. It is implemented using TypedArrays, similar to a JavaScript array except that it only contains bytes of memory.

This was the MVP vision of WebAssembly. It allowed many different kinds of desktop applications to work in your browser without compromising on speed.

Heavy desktop applications

The next goal is to run heavyweight desktop applications in the browser, something like Photoshop or Visual Studio.
There are already some implementations of this: Autodesk AutoCAD and Adobe Lightroom. The features that will make heavier applications possible include:

- Threading: Support for the multiple cores of modern CPUs. A proposal for threading is almost done. SharedArrayBuffers, an important part of threading, had to be turned off this year due to the Spectre vulnerability; they will be turned on again.
- SIMD: Single instruction, multiple data (SIMD) makes it possible to take a chunk of memory and split it up across different execution units/cores. It is under active development.
- 64-bit addressing: 32-bit memory addresses only allow 4GB of linear memory. 64-bit addressing gives 16 exabytes of memory addresses. The approach to incorporating this will be similar to how x86 and ARM added support for 64-bit addressing.
- Streaming compilation: Compiling a WebAssembly file while it is still being downloaded. This allows very fast compilation, resulting in faster web applications.
- Implicit HTTP caching: The compiled code of a web page is stored in the HTTP cache and reused, so compilation is skipped for any page you have already visited.
- Other improvements: There are ongoing discussions on how to improve load time even further.

Once these features are implemented, even heavier apps can run in the browser.

Small modules interoperating with JavaScript

In addition to heavy applications and games, WebAssembly is also meant for regular web development. Sometimes, small modules in an app do a lot of the work, and the intent is to make it easier to port these modules. This is already happening with heavy applications, but for widespread use a few more things need to be in place.

- Fast calls between JS and WebAssembly: Integrating a small module requires many calls between JS and WebAssembly, so the goal is to make these calls faster. In the MVP the calls weren't fast. They are fast in Firefox; other browsers are also working on it.
- Fast and easy data exchange: With JS and WebAssembly calling each other frequently, data also needs to be passed between them. The challenge is that WebAssembly only understands numbers, so passing complex values is currently difficult: an object has to be converted into numbers, put into linear memory, and WebAssembly is passed its location in linear memory. There are many proposals underway; the most notable comes from the Rust ecosystem, which has created tools to automate this.
- ESM integration: A WebAssembly module isn't actually part of the JS module graph. Currently, developers instantiate a WebAssembly module using an imperative API. ECMAScript module integration is necessary to use import and export with JS. The proposals have made progress, and work with other browser vendors has been initiated.
- Toolchain integration: There needs to be a place to distribute and download the modules, and tools to bundle them. While there is no need for a separate ecosystem, the tools do need to be integrated. There are tools like wasm-pack to automate this.
- Backwards compatibility: Older browser versions need to be supported, even those that predate WebAssembly, so that developers can avoid writing a second implementation just for old browsers. There's a wasm2js tool that takes a wasm file and outputs JS; it won't be as fast, but it will work with older versions.

The proposal for small modules in WebAssembly is close to complete, and its completion will open up the path for work in the following areas.

JS frameworks and compile-to-JS languages

There are two use cases: rewriting large parts of JavaScript frameworks in WebAssembly, and compiling statically-typed compile-to-JS languages to WebAssembly instead of to JS. For this to happen, WebAssembly needs to support high-level language features:

- Garbage collector: Integration with the browser's garbage collector.
The reason is to speed things up by working with components managed by the JS VM. Two proposals are underway and should be incorporated sometime next year.
- Exception handling: Better support for handling the exceptions and actions coming from different languages. C#, C++, and JS use exceptions extensively. It is in the R&D phase.
- Debugging: The same level of debugging support as JS and compile-to-JS languages. There is support in browser devtools, but it is not ideal. A subgroup of the WebAssembly CG is working on it.
- Tail calls: Functional languages rely on these. A tail call invokes a new function without adding a new stack frame to the stack. There is a proposal underway.

Once these are in place, JS frameworks and many compile-to-JS languages will be unlocked.

Outside the browser

This refers to everything that happens in systems or places other than your local machine. A really important part is the link, a very special kind of link: people can link to pages without having to put them in any central registry and without needing to ask who the other person is. It is this ease of linking that formed global communities. However, there are two unaddressed problems.

Problem #1: How does a website know what code to deliver to your machine, given the OS and device you are using? It is not practical to have different versions of code for every possible device. The website has only one code base, the source code, which is translated for the user's machine. With portability, you can load code from unknown people without knowing what kind of device they are using. This brings us to the second problem.

Problem #2: If you don't know the people whose web pages you load, the question of trust arises: the code from a web page can contain malicious code. This is where security comes into the picture. Security is implemented at the browser level, which filters out malicious content if detected.
This makes WebAssembly look like just another tool in the browser toolbox, which it is.

Node.js

WebAssembly can bring full portability to Node.js. Node gives you most of the portability of JavaScript on the web. Where performance needs to be improved, Node's native modules are used; these modules are written in languages such as C. If these native modules were written in WebAssembly instead, they wouldn't need to be compiled specifically for the target architecture. Full portability in Node would mean the exact same Node app running across different kinds of devices without needing to compile. But this is not possible currently, as WebAssembly does not have direct access to the system's resources.

Portable interface

The Node core team would have to figure out the set of functions to expose and the API to use. It would be nice if this were something standard, not just specific to Node; if done right, the same API could be implemented for the web. There is a proposal called package name maps that provides a mechanism to map a module name to a path to load the module from. This looks likely to happen and will unlock other use cases.

Other use cases outside the browser

Now let's look at the other use cases outside the browser.

CDNs, serverless, and edge computing

The code for your website resides on a server maintained by a service provider, who maintains the server and makes sure the code is close to all the users of your website. Why use WebAssembly in these cases? Code in a process doesn't have boundaries: functions have access to all memory in that process and can call any other function. When running different services from different people in one process, this is a problem. To make it work, a runtime needs to be created, which takes time and effort. A common runtime that could be used across different use cases would speed up development. There is no standard runtime for this yet; however, some runtime projects are underway.
Portable CLI tools

There are efforts to get WebAssembly used in more traditional operating systems. When this happens, you will be able to use portable CLI tools across different kinds of operating systems.

Internet of Things

Smaller IoT devices, like wearables, have resource constraints: small processors and little memory. What would help here is a compiler like Cranelift and a runtime like wasmtime. Many of these devices are also different from one another, and portability would address this issue.

Clearly, the initial implementation of WebAssembly was indeed just an MVP, and there are many more improvements underway to make it faster and better. Will WebAssembly succeed in dominating all forms of software development? For in-depth information with diagrams, visit the Mozilla website.

Ebiten 1.8, a 2D game library in Go, is here with experimental WebAssembly support and newly added APIs
Testing WebAssembly modules with Jest [Tutorial]
Mozilla optimizes calls between JavaScript and WebAssembly in Firefox, making it almost as fast as JS to JS calls
Vincy Davis
08 Nov 2019
4 min read

Rust 1.39 releases with stable version of async-await syntax, better ergonomics for match guards, attributes on function parameters, and more

Less than two months after announcing Rust 1.38, the Rust team announced the release of Rust 1.39 yesterday. The new release brings the stable version of the async-await syntax, which allows users not only to define async functions, but also to block on and .await them. The other improvements in Rust 1.39 include shared references to by-move bindings in match guards and attributes on function parameters.

The stable version of async-await syntax

A stable async function (written async fn instead of fn) returns a Future when called. A Future is a suspended computation that is driven to conclusion "by .awaiting it." Along with async fn, the async { ... } and async move { ... } blocks can also be used to define async literals. According to Nicholas D. Matsakis, a member of the release team, the first stable support of async-await kicks off a "Minimum Viable Product (MVP)", as the Rust team will now improve the syntax by polishing and extending it for future operations. "With this stabilization, we hope to give important crates, libraries, and the ecosystem time to prepare for async/.await, which we'll tell you more about in the future," states the official Rust blog.

Some of the major developments in the async ecosystem:

- The tokio runtime will be releasing a number of scheduler improvements with support for the async-await syntax this month.
- The async-std runtime library will publish its first stable release in a few days.
- async-await support has already started to become available in higher-level web frameworks and in other applications like the futures_intrusive crate.

Other improvements in Rust 1.39

Better ergonomics for match guards

In earlier versions, Rust disallowed taking shared references to by-move bindings in the if guards of match expressions.
Starting from Rust 1.39, the compiler allows binding in the following two ways:

- by-reference: either immutably or mutably, which can be achieved through ref my_var or ref mut my_var respectively.
- by-value: either by-copy, if the bound variable's type implements Copy, or otherwise by-move.

The Rust team hopes that this feature will give developers a smoother and more consistent experience with expressions.

Attributes on function parameters

Unlike previous versions, Rust 1.39 enables three types of attributes on parameters of functions, closures, and function pointers:

- Conditional compilation: cfg and cfg_attr
- Controlling lints: allow, warn, deny, and forbid
- Helper attributes used by procedural macro attributes

Many users are happy with the Rust 1.39 features and are especially excited about the stable version of the async-await syntax. A user on Hacker News comments, “Async/await lets you write non-blocking, single-threaded but highly interweaved firmware/apps in allocation-free, single-threaded environments (bare-metal programming without an OS). The abstractions around stack snapshots allow seamless coroutines and I believe will make rust pretty much the easiest low-level platform to develop for.”

Another comment read, “This is big! Turns out that syntactic support for asynchronous programming in Rust isn't just syntactic: it enables the compiler to reason about the lifetimes in asynchronous code in a way that wasn't possible to implement in libraries. The end result of having async/await syntax is that async code reads just like normal Rust, which definitely wasn't the case before. This is a huge improvement in usability.”

A few have already upgraded to Rust 1.39 and shared their feedback on Twitter. https://twitter.com/snoyberg/status/1192496806317481985

Check out the official announcement for more details. You can also read the blog on async-await for more information.
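The model Rust has stabilized, where calling an async function produces a suspended computation that only runs when awaited, is the same one Python has exposed since 3.5. A minimal Python sketch of that suspend-and-drive behavior (the names `fetch_value` and `main` are illustrative only):

```python
import asyncio

# Calling an async function does not run it; it returns a coroutine,
# i.e. a suspended computation, analogous to a Rust Future.
async def fetch_value():
    await asyncio.sleep(0)   # yield to the event loop, then resume
    return 42

async def main():
    suspended = fetch_value()   # nothing has executed yet
    value = await suspended     # awaiting drives it to completion
    return value

print(asyncio.run(main()))      # prints: 42
```

The key point in both languages is the same: the body of the async function does not execute at call time, only when something awaits (or, in Rust, polls) the returned value.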
AWS will be sponsoring the Rust Project
A Cargo vulnerability in Rust 1.25 and prior makes it ignore the package key and download a wrong dependency
Fastly announces the next-gen edge computing services available in private beta
Neo4j introduces Aura, a new cloud service to supply a flexible, reliable and developer-friendly graph database
Yubico reveals Biometric YubiKey at Microsoft Ignite
Fatema Patrawala
22 Jul 2019
6 min read

Npm Inc. co-founder and Chief data officer quits, leaving the community to question the stability of the JavaScript Registry

On Thursday, The Register reported that Laurie Voss, the co-founder and chief data officer of the JavaScript package registry NPM Inc, had left the company. Voss's last day in office was July 1, though he officially announced the news on Thursday. Voss joined NPM in January 2014 and decided to leave the company in early May this year.

NPM Inc has faced its share of unrest in the past few months. In March, five NPM employees were fired from the company in an unprofessional and unethical way. Three of those employees were later revealed to have been involved in unionization and filed complaints against NPM Inc with the National Labor Relations Board (NLRB). Earlier this month, NPM Inc, at the third trial, settled the labor claims brought by these three former staffers through the NLRB.

Voss's resignation is the third in line, after Rebecca Turner, a former core contributor who resigned in March, and Kat Marchan, a former CLI and community architect who resigned from NPM earlier this month.

Voss writes on his blog, “I joined npm in January of 2014 as co-founder, when it was just some ideals and a handful of servers that were down as often as they were up. In the following five and a half years Registry traffic has grown over 26,000%, and worldwide users from about 1 million back then to more than 11 million today. One of our goals when founding npm Inc. was to make it possible for the Registry to run forever, and I believe we have achieved that goal. While I am parting ways with npm, I look forward to seeing my friends and colleagues continue to grow and change the JavaScript ecosystem for the better.”

Voss also told The Register that he supported unions: “As far as the labor dispute goes, I will say that I have always supported unions, I think they're great, and at no point in my time at NPM did anybody come to me proposing a union,” he said. “If they had, I would have been in favor of it.
The whole thing was a total surprise to me.” The Register also spoke to one of NPM's former staffers, who said employees tend not to talk to management for fear of retaliation, and that Voss seemed uncomfortable defending the company's recent actions and felt powerless to effect change.

In his post, Voss is optimistic about NPM's business areas: “Our paid products, npm Orgs and npm Enterprise, have tens of thousands of happy users and the revenue from those sustains our core operations.” However, Business Insider reports that a recent NPM Inc funding round raised only enough to continue operating until early 2020. https://twitter.com/coderbyheart/status/1152453087745007616

A big question on everyone's mind currently is the stability of the public Node.js Registry, as most users in the JavaScript community do not have a fallback in place. While the community views Voss's resignation with appreciation for his accomplishments, some are disappointed that he could not raise his voice against these odds and had to quit.

"Nobody outside of the company, and not everyone within it, fully understands how much Laurie was the brains and the conscience of NPM," Jonathan Cowperthwait, former VP of marketing at NPM Inc, told The Register.

CJ Silverio, a principal engineer at Eaze who served as NPM Inc's CTO, said that it's good that Voss is out, but she wasn't sure whether his absence would matter much to the day-to-day operations of NPM Inc. Silverio was fired from NPM Inc late last year, shortly after CEO Bryan Bogensberger's arrival. “Bogensberger marginalized him almost immediately to get him out of the way, so the company itself probably won’t notice the departure," she said. "What should affect fundraising is the massive brain drain the company has experienced, with the entire CLI team now gone, and the registry team steadily departing.
At some point they’ll have lost enough institutional knowledge quickly enough that even good new hires will struggle to figure out how to cope."

Silverio also mentions that she had heard rumors of NPM eliminating the public registry and continuing only with its paid enterprise service, which would be like killing its own competitive advantage. She says that if the public registry disappears, there are alternative projects, like Entropic, spearheaded by Silverio and fellow developer Chris Dickinson. Entropic is available under an open source Apache 2.0 license. Silverio says, "You can depend on packages from any other Entropic instance, and your home instance will mirror all your dependencies for you so you remain self-sufficient." She added that the software will mirror any packages installed by a legacy package manager, which is to say npm. As a result, the more developers use Entropic, the less they'll need NPM Inc's platform to provide a list of available packages.

Voss feels the scale of npm is 3x bigger than any other registry, and it boasts an extremely fast growth rate of approximately 8% month on month. "Creating a company to manage an open source commons creates some tensions and challenges is not a perfect solution, but it is better than any other solution I can think of, and none of the alternatives proposed have struck me as better or even close to equally good," he said.

With NPM Inc's sustainability at stake, the JavaScript community on Hacker News discussed alternatives in case the public registry comes to an end. One of the comments read, “If it's true that they want to kill the public registry, that means I may need to seriously investigate Entropic as an alternative. I almost feel like migrating away from the normal registry is an ethical issue now. What percentage of popular packages are available in Entropic?
If someone else's repo is not in there, can I add it for them?” Another user responded, “The github registry may be another reasonable alternative... not to mention linking git hashes directly, but that has other issues.”

Besides Entropic, another alternative discussed is nixfromnpm, a tool that translates NPM packages into Nix expressions. nixfromnpm is developed by Allen Nelson and two other contributors from Chicago.

Surprise NPM layoffs raise questions about the company culture
Is the Npm 6.9.1 bug a symptom of the organization’s cultural problems?
Npm Inc, after a third try, settles former employee claims, who were fired for being pro-union, The Register reports