Programming | 0 articles | Tech News, Tutorials & Expert Insights

article-image-is-golang-truly-community-driven-and-does-it-really-matter

24 May 2019

6 min read

Is Golang truly community driven and does it really matter?

24 May 2019

Golang, also called Go, is a statically typed, compiled programming language designed by Google. Golang is going from strength to strength, as more engineers than ever are using it at work, according to Go User Survey 2019. An opinion that has led to the Hacker News community into a heated debate last week: “Go is Google's language, not the community's”. The thread was first started by Chris Siebenmann who works at the Department of Computer Science, University of Toronto. His blog post reads, “Go has community contributions but it is not a community project. It is Google's project.” Chris explicitly states that the community's voice doesn't matter very much for Go's development, and we have to live with that. He argues that Google is the gatekeeper for community contributions; it alone decides what is and isn't accepted into Go. If a developer wants some significant feature to be accepted into Golang, working to build consensus in the community is far less important than persuading the Golang core team. He then cites the example of how one member of Google's Go core team discarded the entire Go Modules system that the Go community had been working on and brought in a relatively radically different model. Chris believes that the Golang team cares about the community and want them to be involved, but only up to a certain point. He wants the Go core team to be bluntly honest about the situation, rather than pretend and implicitly lead people on. He further adds, “Only if Go core team members start leaving Google and try to remain active in determining Go's direction, can we [be] certain Golang is a community-driven language.” He then compares Go with C++, calling the latter a genuine community-driven language. He says there are several major implementations in C++ which are genuine community projects, and the direction of C++ is set by an open standards committee with a relatively distributed membership. https://twitter.com/thatcks/status/1131319904039309312 What is better - community-driven or corporate ownership? There has been an opinion floating around developers about how some open source programming projects are just commercial projects driven mainly by a single company. If we look at the top open source projects, most of them have some kind of corporate backing ( Apple’s Swift, Oracle’s Java, MySQL, Microsoft’s Typescript, Google’s Kotlin, Golang, Android, MongoDB, Elasticsearch) to name a few. Which brings us to the question, what does corporate ownership of open source projects really mean? A benevolent dictatorship can have two outcomes. If the community for a particular project suggests a change, and in case a change is a bad idea, the corporate team can intervene and stop changes. On the other hand, though, it can actually stop good ideas from the community in being implemented, even if a handful of members from the core team disagree. Chris’s post has received a lot of attention by developers on Hacker News who both sided with and disagreed with the opinion put forward. A comment reads, “It's important to have a community and to work with it, but, especially for a programming language, there has to be a clear concept of which features should be implemented and which not - just accepting community contributions for the sake of making the community feel good would be the wrong way.” Another comment reads, “Many like Go because it is an opinionated language. I'm not sure that a 'community' run language will create something like that because there are too many opinions. Many claims to represent the community, but not the community that doesn't share their opinion. Without clear leaders, I fear technical direction and taste will be about politics which seems more uncertain/risky. I like that there is a tight cohesive group in control over Go and that they are largely the original designers. I might be more interested in alternative government structures and Google having too much control only if those original authors all stepped down.” Rather than splitting between Community or Corporate, a more accurate representation would be how much market value is depending on those projects. If a project is thriving, usually enterprises will take good decisions to handle it. However, another but entirely valid and important question to ask is ‘should open source projects be driven by their market value?’ Another common argument is that the core team’s full-time job is to take care of the language instead of taking errant decisions based on community backlash. Google (or Microsoft, or Apple, or Facebook for that matter) will not make or block a change in a way that kills an entire project. But this does not mean they should sit idly, ignoring the community response. Ideally, the more that a project genuinely belongs to its community, the more it will reflect what the community wants and needs. Google also has a propensity to kill its own products. What happens when Google is not as interested in Golang anymore? The company could leave it to the community to figure out the governance model suddenly by pulling off the original authors into some other exciting new project. Or they may let the authors only work on Golang in their spare time at home or at the weekends. While Google's history shows that many of their dead products are actually an important step towards something better and more successful, why and how much of that logic would be directly relevant to an open source project is something worth thinking about. As a Hacker news user wrote, “Go is developed by Bell Labs people, the same people who bought us C, Unix and Plan 9 (Ken, Pike, RSC, et al). They took the time to think through all their decisions, the impacts of said decisions, along with keeping things as simple as possible. Basically, doing things right the first time and not bolting on features simply because the community wants them.” Another says, “The way how Golang team handles potentially tectonic changes in language is also exemplary – very well communicated ideas, means to provide feedback and clear explanation of how the process works.” Rest assured, if any major change is made to Go, even a drastic one such as killing it, it will not be done without consulting the community and taking their feedback. Go User Survey 2018 results: Golang goes from strength to strength, as more engineers than ever are using it at work. GitHub releases Vulcanizer, a new Golang Library for operating Elasticsearch State of Go February 2019 – Golang developments report for this month released

0
0
29244

article-image-understanding-go-internals-defer-panic-and-recover-functions-tutorial

Packt Editorial Staff

09 Jul 2018

8 min read

Understanding Go Internals: defer, panic() and recover() functions [Tutorial]

Packt Editorial Staff

09 Jul 2018

8 min read

The Go programming language, often referred to as Golang, is making strides with masterclass developments and architecture by the greatest programming minds. The Go features are extremely handy, and you can use them all the time. However, there is nothing more rewarding than being able to see and understand what is going on in the background and how Go operates behind the scenes. In this article we will learn to use the defer keyword, panic() and recover() functions in Go. This article is extracted from the First Edition of Mastering Go written by Mihalis Tsoukalos. The concepts discussed in this article (and more) have been updated or improved in the third edition of Mastering Go. The defer keyword The defer keyword postpones the execution of a function until the surrounding function returns. It is widely used in file input and output operations because it saves you from having to remember when to close an opened file: the defer keyword allows you to put the function call that closes an opened file near to the function call that opened it. You will also see defer in action in the section that talks about the panic() and recover() built-in Go functions. It is very important to remember that deferred functions are executed in Last In First Out (LIFO) order after the return of the surrounding function. Put simply, this means that if you defer function f1() first, function f2() second, and function f3() third in the same surrounding function, when the surrounding function is about to return, function f3() will be executed first, function f2() will be executed second, and function f1() will be the last one to get executed. As this definition of defer is a little unclear, I think that you will understand the use of defer a little better by looking at the Go code and the output of the defer.go program, which will be presented in three parts. The first part of the program follows: package main import ( "fmt" ) func d1() { for i := 3; i > 0; i-- { defer fmt.Print(i, " ") } } Apart from the import block, the preceding Go code implements a function named d1() with a for loop and a defer statement that will be executed three times. The second part of defer.go contains the following Go code: func d2() { for i := 3; i > 0; i-- { defer func() { fmt.Print(i, " ") }() } fmt.Println() } In this part of the code, you can see the implementation of another function that is named d2(). The d2() function also contains a for loop and a defer statement that will be also executed three times. However, this time the defer keyword is applied to an anonymous function instead of a single fmt.Print() statement. Additionally, the anonymous function takes no parameters. The last part of the Go code follows: func d3() { for i := 3; i > 0; i-- { defer func(n int) { fmt.Print(n, " ") }(i) } } func main() { d1() d2() fmt.Println() d3() fmt.Println() } Apart from the main() function that calls the d1(), d2(), and d3() functions, you can also see the implementation of the d3() function, which has a for loop that uses the defer keyword on an anonymous function. However, this time the anonymous function requires one integer parameter named n. The Go code tells us that the n parameter takes its value from the i variable used in the for loop. Executing defer.go will create the following output: $ go run defer.go 1 2 3 0 0 0 1 2 3 You will most likely find the generated output complicated and challenging to understand. This underscores the fact that the operation and the results of the use of defer can be tricky if your code is not clear and unambiguous. Let's examine the results in order to get a better idea of how tricky defer can be if you do not pay close attention to your code. We will start with the first line of the output (1 2 3), which is generated by the d1() function. The values of i in d1() are 3, 2, and 1 in that order. The function that is deferred in d1() is the fmt.Print() statement. As a result, when the d1() function is about to return, you get the three values of the i variable of the for loop in reverse order because deferred functions are executed in LIFO order. Now, let us inspect the second line of the output that is produced by the d2() function. It is really strange that we got three zeros instead of 1 2 3 in the output. The reason for this, however, is relatively simple. After the for loop has ended, the value of i is 0, because it is that value of i that made the for loop terminate. However, the tricky part here is that the deferred anonymous function is evaluated after the for loop ends, because it has no parameters. This means that is evaluated three times for an i value of 0, hence the generated output! This kind of confusing code is what might lead to the creation of nasty bugs in your projects, so try to avoid it! Last, we will talk about the third line of the output, which is generated by the d3() function. Due to the parameter of the anonymous function, each time the anonymous function is deferred, it gets and uses the current value of i. As a result, each execution of the anonymous function has a different value to process, thus the generated output. After this, it should be clear that the best approach to the use of defer is the third one, which is exhibited in the d3() function. This is so because you intentionally pass the desired variable in the anonymous function in an easy to understand way. Panic and Recover This technique involves the use of the panic() and recover() functions, and it will be presented in panicRecover.go, which you will review in three parts. Strictly speaking, panic() is a built-in Go function that terminates the current flow of a Go program and starts panicking! On the other hand, the recover() function, which is also a built-in Go function, allows you to take back the control of a goroutine that just panicked using panic(). The first part of the program follows: package main import ( "fmt" ) func a() { fmt.Println("Inside a()") defer func() { if c := recover(); c != nil { fmt.Println("Recover inside a()!") } }() fmt.Println("About to call b()") b() fmt.Println("b() exited!") fmt.Println("Exiting a()") } Apart from the import block, this part includes the implementation of the a() function. The most important part of the a() function is the defer block of code, which implements an anonymous function that will be called when there is a call to panic(). The second code segment of panicRecover.go follows next: func b() { fmt.Println("Inside b()") panic("Panic in b()!") fmt.Println("Exiting b()") } The last part of the program, which illustrates the panic() and recover() functions, is as follows: func main() { a() fmt.Println("main() ended!") } Executing panicRecover.go will create the following output: $ go run panicRecover.go Inside a() About to call b() Inside b() Recover inside a()! main() ended! What just happened here is really impressive! However, as you can see from the output, the a() function did not end normally, because its last two statements did not get executed: fmt.Println("b() exited!") fmt.Println("Exiting a()") Nevertheless, the good thing is that panicRecover.go ended according to our will without panicking because the anonymous function used in defer took control of the situation! Also note that function b() knows nothing about function a(). However, function a() contains Go code that handles the panic condition of function b()! Using the panic function on its own You can also use the panic() function on its own without any attempt to recover, and this subsection will show you its results using the Go code of justPanic.go, which will be presented in two parts. The first part of justPanic.go follows next: package main import ( "fmt" "os" ) As you can see, the use of panic() does not require any extra Go packages. The second part of justPanic.go is shown in the following Go code: func main() { if len(os.Args) == 1 { panic("Not enough arguments!") } fmt.Println("Thanks for the argument(s)!") } If your Go program does not have at least one command line argument, it will call the panic() function. The panic() function takes one parameter, which is the error message that you want to print on the screen. Executing justPanic.go on a macOS High Sierra machine will create the following output: $ go run justPanic.go panic: Not enough arguments! goroutine 1 [running]: main.main() /Users/mtsouk/ch2/code/justPanic.go:10 +0x9e exit status 2 Thus, using the panic() function on its own will terminate the Go program without giving you the opportunity to recover! Therefore use of the panic() and recover() pair is much more practical and professional than just using panic() alone. To summarize, we covered some of the interesting Go topics like; defer keyword; the panic() and recover() functions. To explore other major features and packages in Go, get our latest edition in Go programming, Mastering Go, written by Mihalis Tsoukalos. Implementing memory management with Golang’s garbage collector Why is Go the go-to language for cloud native development? – An interview with Mina Andrawos How to build a basic server side chatbot using Go How Concurrency and Parallelism works in Golang [Tutorial]

0
0
29173

article-image-microsoft-mulls-replacing-c-and-c-code-with-rust-calling-it-a-a-modern-safer-system-programming-language-with-great-memory-safety-features

Vincy Davis

18 Jul 2019

3 min read

Microsoft mulls replacing C and C++ code with Rust calling it a "modern safer system programming language" with great memory safety features

Vincy Davis

18 Jul 2019

3 min read

Here's another reason why Rust is the present and the future in programming. Few days ago, Microsoft announced that they are going to start exploring Rust and skip their own C languages. This announcement was made by the Principal Security Engineering Manager of Microsoft Security Response Centre (MSRC), Gavin Thomas. Thomas states that ~70% of the vulnerabilities which Microsoft assigns a CVE each year are caused by developers, who accidently insert memory corruption bugs into their C and C++ code. He adds, "As Microsoft increases its code base and uses more Open Source Software in its code, this problem isn’t getting better, it's getting worse. And Microsoft isn’t the only one exposed to memory corruption bugs—those are just the ones that come to MSRC." Image Source: Microsoft blog He highlights the fact that even after having so many security mechanisms (like static analysis tools, fuzzing at scale, taint analysis, many encyclopaedias of coding guidelines, threat modelling guidance, etc) to make a code secure, developers have to invest a lot of time in studying about more tools for training and vulnerability fixes. Thomas states that though C++ has many qualities like fast, mature, small memory and disk footprint, it does not have the memory security guarantee of languages like .NET C#. He believes that Rust is one language, which can provide both the requirements. Thomas strongly advocates that a software security industry should focus on providing a secure environment for developers to work on, rather than turning deaf ear to the importance of security, outdated methods and approaches. He thus concludes by hinting that Microsoft is going to adapt the Rust programming language. As he says that, "Perhaps it's time to scrap unsafe legacy languages and move on to a modern safer system programming language?" Microsoft exploring Rust is not surprising as Rust has been popular with many developers for its simpler syntax, less bugs, memory safe and thread safety. It has also been voted as the most loved programming language, according to the 2019 StackOverflow survey, the biggest developer survey on the internet. It allows developers to focus on their applications, rather than worrying about its security and maintenance. Recently, there have been many applications written in Rust, like Vector, Brave ad-blocker, PyOxidizer and more. Developers couldn't agree more with this post, as all have expressed their love for Rust. https://twitter.com/alilleybrinker/status/1151495738158977024 https://twitter.com/karanganesan/status/1151485485644054528 https://twitter.com/shah_sheikh/status/1151457054004875264 A Redditor says, "While this first post is very positive about memory-safe system programming languages in general and Rust in particular, I would not call this an endorsement. Still, great news!" Visit the Microsoft blog for more details. Introducing Ballista, a distributed compute platform based on Kubernetes and Rust EU Commission opens an antitrust case against Amazon on grounds of violating EU competition rules Fastly CTO Tyler McMullen on Lucet and the future of WebAssembly and Rust [Interview]

0
0
29036

article-image-how-to-build-12-factor-design-microservices-on-docker-part-2

Cody A.

29 Jun 2015

14 min read

How to Build 12 Factor Microservices on Docker - Part 2

Cody A.

29 Jun 2015

14 min read

Welcome back to our how-to on Building and Running 12 Factor Microservices on Docker. In Part 1, we introduced a very simple python flask application which displayed a list of users from a relational database. Then we walked through the first four of these factors, reworking the example application to follow these guidelines. In Part 2, we'll be introducing a multi-container Docker setup as the execution environment for our application. We’ll continue from where we left off with the next factor, number five. Build, Release, Run. A 12-factor app strictly separates the process for transforming a codebase into a deploy into distinct build, release, and run stages. The build stage creates an executable bundle from a code repo, including vendoring dependencies and compiling binaries and asset packages. The release stage combines the executable bundle created in the build with the deploy’s current config. Releases are immutable and form an append-only ledger; consequently, each release must have a unique release ID. The run stage runs the app in the execution environment by launching the app’s processes against the release. This is where your operations meet your development and where a PaaS can really shine. For now, we’re assuming that we’ll be using a Docker-based containerized deploy strategy. We’ll start by writing a simple Dockerfile. The Dockerfile starts with an ubuntu base image and then I add myself as the maintainer of this app. FROM ubuntu:14.04.2 MAINTAINER codyaray Before installing anything, let’s make sure that apt has the latest versions of all the packages. RUN echo "deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -sc) main universe" >> /etc/apt/sources.list RUN apt-get update Install some basic tools and the requirements for running a python webapp RUN apt-get install -y tar curl wget dialog net-tools build-essential RUN apt-get install -y python python-dev python-distribute python-pip RUN apt-get install -y libmysqlclient-dev Copy over the application to the container. ADD /. /src Install the dependencies. RUN pip install -r /src/requirements.txt Finally, set the current working directory, expose the port, and set the default command. EXPOSE 5000 WORKDIR /src CMD python app.py Now, the build phase consists of building a docker image. You can build and store locally with docker build -t codyaray/12factor:0.1.0 . If you look at your local repository, you should see the new image present. $ docker images REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE codyaray/12factor 0.1.0 bfb61d2bbb17 1 hour ago 454.8 MB The release phase really depends on details of the execution environment. You’ll notice that none of the configuration is stored in the image produced from the build stage; however, we need a way to build a versioned release with the full configuration as well. Ideally, the execution environment would be responsible for creating releases from the source code and configuration specific to that environment. However, if we’re working from first principles with Docker rather than a full-featured PaaS, one possibility is to build a new docker image using the one we just built as a base. Each environment would have its own set of configuration parameters and thus its own Dockerfile. It could be something as simple as FROM codyaray/12factor:0.1.0 MAINTAINER codyaray ENV DATABASE_URL mysql://sa:mypwd@mydbinstance.abcdefghijkl.us-west-2.rds.amazonaws.com/mydb This is simple enough to be programmatically generated given the environment-specific configuration and the new container version to be deployed. For the demonstration purposes, though, we’ll call the above file Dockerfile-release so it doesn’t conflict with the main application’s Dockerfile. Then we can build it with docker build -f Dockerfile-release -t codyaray/12factor-release:0.1.0.0 . The resulting built image could be stored in the environment’s registry as codyaray/12factor-release:0.1.0.0. The images in this registry would serve as the immutable ledger of releases. Notice that the version has been extended to include a fourth level which, in this instance, could represent configuration version “0” applied to source version “0.1.0”. The key here is that these configuration parameters aren’t collated into named groups (sometimes called “environments”). For example, these aren’t static files named like Dockerfile.staging or Dockerfile.dev in a centralized repo. Rather, the set of parameters is distributed so that each environment maintains its own environment mapping in some fashion. The deployment system would be setup such that a new release to the environment automatically applies the environment variables it has stored to create a new Docker image. As always, the final deploy stage depends on whether you’re using a cluster manager, scheduler, etc. If you’re using standalone Docker, then it would boil down to docker run -P -t codyaray/12factor-release:0.1.0.0 Processes. A 12-factor app is executed as one or more stateless processes which share nothing and are horizontally partitionable. All data which needs to be stored must use a stateful backing service, usually a database. This means no sticky sessions and no in-memory or local disk-based caches. These processes should never daemonize or write their own PID files; rather, they should rely on the execution environment’s process manager (such as Upstart). This factor must be considered up-front, in line with the discussions on antifragility, horizontal scaling, and overall application design. As the example app delegates all stateful persistence to a database, we’ve already succeeded on this point. However, it is good to note that a number of issues have been found using the standard ubuntu base image for Docker, one of which is its process management (or lack thereof). If you would like to use a process manager to automatically restart crashed daemons, or to notify a service registry or operations team, check out baseimage-docker. This image adds runit for process supervision and management, amongst other improvements to base ubuntu for use in Docker such as obsoleting the need for pid files. To use this new image, we have to update the Dockerfile to set the new base image and use its init system instead of running our application as the root process in the container. FROM phusion/baseimage:0.9.16 MAINTAINER codyaray RUN echo "deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -sc) main universe" >> /etc/apt/sources.list RUN apt-get update RUN apt-get install -y tar git curl nano wget dialog net-tools build-essential RUN apt-get install -y python python-dev python-distribute python-pip RUN apt-get install -y libmysqlclient-dev ADD /. /src RUN pip install -r /src/requirements.txt EXPOSE 5000 WORKDIR /src RUN mkdir /etc/service/12factor ADD 12factor.sh /etc/service/12factor/run # Use baseimage-docker's init system. CMD ["/sbin/my_init"] Notice the file 12factor.sh that we’re now adding to /etc/service. This is how we instruct runit to run our application as a service. Let’s add the new 12factor.sh file. #!/bin/sh python /src/app.py Now the new containers we deploy will attempt to be a little more fault-tolerant by using an OS-level process manager. Port Binding. A 12-factor app must be self-contained and bind to a port specified as an environment variable. It can’t rely on the injection of a web container such as tomcat or unicorn; instead it must embed a server such as jetty or thin. The execution environment is responsible for routing requests from a public-facing hostname to the port-bound web process. This is trivial with most embedded web servers. If you’re currently using an external web server, this may require more effort to support an embedded server within your application. For the example python app (which uses the built-in flask web server), it boils down to port = int(os.environ.get("PORT", 5000)) app.run(host='0.0.0.0', port=port) Now the execution environment is free to instruct the application to listen on whatever port is available. This obviates the need for the application to tell the environment what ports must be exposed, as we’ve been required to do with Docker. Concurrency. Because a 12-factor exclusively uses stateless processes, it can scale out by adding processes. A 12-factor app can have multiple process types, such as web processes, background worker processes, or clock processes (for cron-like scheduled jobs). As each process type is scaled independently, each logical process would become its own Docker container as well. We’ve already seen building a web process; other processes are very similar. In most cases, scaling out simply means launching more instances of the container. (Its usually not desirable to scale out the clock processes, though, as they often generate events that you want to be scheduled singletons within your infrastructure.) Disposability. A 12-factor app’s processes can be started or stopped (with a SIGTERM) anytime. Thus, minimizing startup time and gracefully shutting down is very important. For example, when a web service receives a SIGTERM, it should stop listening on the HTTP port, allow in-flight requests to finish, and then exit. Similar, processes should be robust against sudden death; for example, worker processes should use a robust queuing backend. You want to ensure the web server you select can gracefully shutdown. The is one of the trickier parts of selecting a web server, at least for many of the common python http servers that I’ve tried. In theory, shutting down based on receiving a SIGTERM should be as simple as follows. import signal signal.signal(signal.SIGTERM, lambda *args: server.stop(timeout=60)) But often times, you’ll find that this will immediately kill the in-flight requests as well as closing the listening socket. You’ll want to test this thoroughly if dependable graceful shutdown is critical to your application. Dev/Prod Parity. A 12-factor app is designed to keep the gap between development and production small. Continuous deployment shrinks the amount of time that code lives in development but not production. A self-serve platform allows developers to deploy their own code in production, just like they do in their local development environments. Using the same backing services (databases, caches, queues, etc) in development as production reduces the number of subtle bugs that arise in inconsistencies between technologies or integrations. As we’re deploying this solution using fully Dockerized containers and third-party backing services, we’ve effectively achieved dev/prod parity. For local development, I use boot2docker on my Mac which provides a Docker-compatible VM to host my containers. Using boot2docker, you can start the VM and setup all the env variables automatically with boot2docker up $(boot2docker shellinit) Once you’ve initialized this VM and set the DOCKER_HOST variable to its IP address with shellinit, the docker commands given above work exactly the same for development as they do for production. Logs. Consider logs as a stream of time-ordered events collected from all running processes and backing services. A 12-factor app doesn’t concern itself with how its output is handled. Instead, it just writes its output to its `stdout` stream. The execution environment is responsible for collecting, collating, and routing this output to its final destination(s). Most logging frameworks either support logging to stderr/stdout by default or easily switching from file-based logging to one of these streams. In a 12-factor app, the execution environment is expected to capture these streams and handle them however the platform dictates. Because our app doesn’t have specific logging yet, and the only logs are from flask and already to stderr, we don’t have any application changes to make. However, we can show how an execution environment which could be used handle the logs. We’ll setup a Docker container which collects the logs from all the other docker containers on the same host. Ideally, this would then forward the logs to a centralized service such as Elasticsearch. Here we’ll demo using Fluentd to capture and collect the logs inside the log collection container; a simple configuration change would allow us to switch from writing these logs to disk as we demo here and instead send them from Fluentd to a local Elasticsearch cluster. We’ll create a Dockerfile for our new logcollector container type. For more detail, you can find a Docker fluent tutorial here. We can call this file Dockerfile-logcollector. FROM kiyoto/fluentd:0.10.56-2.1.1 MAINTAINER kiyoto@treasure-data.com RUN mkdir /etc/fluent ADD fluent.conf /etc/fluent/ CMD "/usr/local/bin/fluentd -c /etc/fluent/fluent.conf" We use an existing fluentd base image with a specific fluentd configuration. Notably this tails all the log files in /var/lib/docker/containers/<container-id>/<container-id>-json.log, adds the container ID to the log message, and then writes to JSON-formatted files inside /var/log/docker. <source> type tail path /var/lib/docker/containers/*/*-json.log pos_file /var/log/fluentd-docker.pos time_format %Y-%m-%dT%H:%M:%S tag docker.* format json </source> <match docker.var.lib.docker.containers.*.*.log> type record_reformer container_id ${tag_parts[5]} tag docker.all </match> <match docker.all> type file path /var/log/docker/*.log format json include_time_key true </match> As usual, we create a Docker image. Don’t forget to specify the logcollector Dockerfile. docker build -f Dockerfile-logcollector -t codyaray/docker-fluentd . We’ll need to mount two directories from the Docker host into this container when we launch it. Specifically, we’ll mount the directory containing the logs from all the other containers as well as the directory to which we’ll be writing the consolidated JSON logs. docker run -d -v /var/lib/docker/containers:/var/lib/docker/containers -v /var/log/docker:/var/log/docker codyaray/docker-fluentd Now if you check in the /var/log/docker directory, you’ll see the collated JSON log files. Note that this is on the docker host rather than in any container; if you’re using boot2docker, you can ssh into the docker host with boot2docker ssh and then check /var/log/docker. Admin Processes. Any admin or management tasks for a 12-factor app should be run as one-off processes within a deploy’s execution environment. This process runs against a release using the same codebase and configs as any process in that release and uses the same dependency isolation techniques as the long-running processes. This is really a feature of your app's execution environment. If you’re running a Docker-like containerized solution, this may be pretty trivial. docker run -i -t --entrypoint /bin/bash codyaray/12factor-release:0.1.0.0 The -i flag instructs docker to provide interactive session, that is, to keep the input and output ttys attached. Then we instruct docker to run the /bin/bash command instead of another 12factor app instance. This creates a new container based on the same docker image, which means we have access to all the code and configs for this release. This will drop us into a bash terminal to do whatever we want. But let’s say we want to add a new “friends” table to our database, so we wrote a migration script add_friends_table.py. We could run it as follows: docker run -i -t --entrypoint python codyaray/12factor-release:0.1.0.0 /src/add_friends_table.py As you can see, following the few simple rules specified in the 12 Factor manifesto really allows your execution environment to manage and scale your application. While this may not be the most feature-rich integration within a PaaS, it is certainly very portable with a clean separation of responsibilities between your app and its environment. Much of the tools and integration demonstrated here were a do-it-yourself container approach to the environment, which would be subsumed by an external vertically integrated PaaS such as Deis. If you’re not familiar with Deis, its one of several competitors in the open source platform-as-a-service space which allows you to run your own PaaS on a public or private cloud. Like many, Deis is inspired by Heroku. So instead of Dockerfiles, Deis uses a buildpack to transform a code repository into an executable image and a Procfile to specify an app’s processes. Finally, by default you can use a specialized git receiver to complete a deploy. Instead of having to manage separate build, release, and deploy stages yourself like we described above, deploying an app to Deis could be a simple as git push deis-prod While it can’t get much easier than this, you’re certainly trading control for simplicity. It's up to you to determine which works best for your business. Find more Docker tutorials alongside our latest releases on our dedicated Docker page. About the Author Cody A. Ray is an inquisitive, tech-savvy, entrepreneurially-spirited dude. Currently, he is a software engineer at Signal, an amazing startup in downtown Chicago, where he gets to work with a dream team that’s changing the service model underlying the Internet.

0
1
29008

article-image-how-to-implement-immutability-functions-in-kotlin

Aaron Lazar

27 Jun 2018

8 min read

How to implement immutability functions in Kotlin [Tutorial]

Aaron Lazar

27 Jun 2018

8 min read

Unlike Clojure, Haskell, F#, and the likes, Kotlin is not a pure functional programming language, where immutability is forced; rather, we may refer to Kotlin as a perfect blend of functional programming and OOP languages. It contains the major benefits of both worlds. So, instead of forcing immutability like pure functional programming languages, Kotlin encourages immutability, giving it automatic preference wherever possible. In this article, we'll understand the various methods of implementing immutability in Kotlin. This article has been taken from the book, Functional Kotlin, by Mario Arias and Rivu Chakraborty. In other words, Kotlin has immutable variables (val), but no language mechanisms that would guarantee true deep immutability of the state. If a val variable references a mutable object, its contents can still be modified. We will have a more elaborate discussion and a deeper dive on this topic, but first let us have a look at how we can get referential immutability in Kotlin and the differences between var, val, and const val. By true deep immutability of the state, we mean a property will always return the same value whenever it is called and that the property never changes its value; we can easily avoid this if we have a val property that has a custom getter. You can find more details at the following link: https://artemzin.com/blog/kotlin-val-does-not-mean-immutable-it-just-means-readonly-yeah/ The difference between var and val So, in order to encourage immutability but still let the developers have the choice, Kotlin introduced two types of variables. The first one is var, which is just a simple variable, just like in any imperative language. On the other hand, val brings us a bit closer to immutability; again, it doesn't guarantee immutability. So, what exactly does the val variable provide us? It enforces read-only, you cannot write into a val variable after initialization. So, if you use a val variable without a custom getter, you can achieve referential immutability. Let's have a look; the following program will not compile: fun main(args: Array<String>) { val x:String = "Kotlin" x+="Immutable"//(1) } As I mentioned earlier, the preceding program will not compile; it will give an error on comment (1). As we've declared variable x as val, x will be read-only and once we initialize x; we cannot modify it afterward. So, now you're probably asking why we cannot guarantee immutability with val ? Let's inspect this with the following example: object MutableVal { var count = 0 val myString:String = "Mutable" get() {//(1) return "$field ${++count}"//(2) } } fun main(args: Array<String>) { println("Calling 1st time ${MutableVal.myString}") println("Calling 2nd time ${MutableVal.myString}") println("Calling 3rd time ${MutableVal.myString}")//(3) } In this program, we declared myString as a val property, but implemented a custom get function, where we tweaked the value of myString before returning it. Have a look at the output first, then we will further look into the program: As you can see, the myString property, despite being val, returned different values every time we accessed it. So, now, let us look into the code to understand such behavior. On comment (1), we declared a custom getter for the val property myString. On comment (2), we pre-incremented the value of count and added it after the value of the field value, myString, and returned the same from the getter. So, whenever we requested the myString property, count got incremented and, on the next request, we got a different value. As a result, we broke the immutable behavior of a val property. Compile time constants So, how can we overcome this? How can we enforce immutability? The const val properties are here to help us. Just modify val myString with const val myString and you cannot implement the custom getter. While val properties are read-only variables, const val, on the other hand, are compile time constants. You cannot assign the outcome (result) of a function to const val. Let's discuss some of the differences between val and const val: The val properties are read-only variables, while const val are compile time constants The val properties can have custom getters, but const val cannot We can have val properties anywhere in our Kotlin code, inside functions, as a class member, anywhere, but const val has to be a top-level member of a class/object You cannot write delegates for the const val properties We can have the val property of any type, be it our custom class or any primitive data type, but only primitive data types and String are allowed with a const val property We cannot have nullable data types with the const val properties; as a result, we cannot have null values for the const val properties either As a result, the const val properties guarantee immutability of value but have lesser flexibility and you are bound to use only primitive data types with const val, which cannot always serve our purposes. Now, that I've used the word referential immutability quite a few times, let us now inspect what it means and how many types of immutability there are. Types of immutability There are basically the following two types of immutability: Referential immutability Immutable values Immutable reference (referential immutability) Referential immutability enforces that, once a reference is assigned, it can't be assigned to something else. Think of having it as a val property of a custom class, or even MutableList or MutableMap; after you initialize the property, you cannot reference something else from that property, except the underlying value from the object. For example, take the following program: class MutableObj { var value = "" override fun toString(): String { return "MutableObj(value='$value')" } } fun main(args: Array<String>) { val mutableObj:MutableObj = MutableObj()//(1) println("MutableObj $mutableObj") mutableObj.value = "Changed"//(2) println("MutableObj $mutableObj") val list = mutableListOf("a","b","c","d","e")//(3) println(list) list.add("f")//(4) println(list) } Have a look at the output before we proceed with explaining the program: So, in this program we've two val properties—list and mutableObj. We initialized mutableObj with the default constructor of MutableObj, since it's a val property it'll always refer to that specific object; but, if you concentrate on comment (2), we changed the value property of mutableObj, as the value property of the MutableObj class is mutable (var). It's the same with the list property, we can add items to the list after initialization, changing its underlying value. Both list and mutableObj are perfect examples of immutable reference; once initialized, the properties can't be assigned to something else, but their underlying values can be changed (you can refer the output). The reason behind that is the data type we used to assign to those properties. Both the MutableObj class and the MutableList<String> data structures are mutable themselves, so we cannot restrict value changes for their instances. Immutable values The immutable values, on the other hand, enforce no change on values as well; it is really complex to maintain. In Kotlin, the const val properties enforce immutability of value, but they lack flexibility (we already discussed them) and you're bound to use only primitive types, which can be troublesome in real-life scenarios. Immutable collections Kotlin gives preference to immutability wherever possible, but leaves the choice to the developer whether or when to use it. This power of choice makes the language even more powerful. Unlike most languages, where they have either only mutable (like Java, C#, and so on) or only immutable collections (like F#, Haskell, Clojure, and so on), Kotlin has both and distinguishes between them, leaving the developer with the freedom to choose whether to use an immutable or mutable one. Kotlin has two interfaces for collection objects—Collection<out E> and MutableCollection<out E>; all the collection classes (for example, List, Set, or Map) implement either of them. As the name suggests, the two interfaces are designed to serve immutable and mutable collections respectively. Let us have an example: fun main(args: Array<String>) { val immutableList = listOf(1,2,3,4,5,6,7)//(1) println("Immutable List $immutableList") val mutableList:MutableList<Int> = immutableList.toMutableList()//(2) println("Mutable List $mutableList") mutableList.add(8)//(3) println("Mutable List after add $mutableList") println("Mutable List after add $immutableList") } The output is as follows: So, in this program, we created an immutable list with the help of the listOf method of Kotlin, on comment (1). The listOf method creates an immutable list with the elements (varargs) passed to it. This method also has a generic type parameter, which can be skipped if the elements array is not empty. The listOf method also has a mutable version—mutableListOf() which is identical except that it returns MutableList instead. We can convert an immutable list to a mutable one with the help of the toMutableList() extension function, we did the same in comment (2), to add an element to it on comment (3). However, if you check the output, the original Immutable List remains the same without any changes, the item is, however, added to the newly created MutableList instead. So now you know how to implement immutability in Kotlin. If you found this tutorial helpful, and would like to learn more, head on over to purchase the full book, Functional Kotlin, by Mario Arias and Rivu Chakraborty. Extension functions in Kotlin: everything you need to know Building RESTful web services with Kotlin Building chat application with Kotlin using Node.js, the powerful Server-side JavaScript platform

0
0
28976

article-image-processing-next-generation-sequencing-datasets-using-python

Packt

07 Jul 2015

25 min read

Processing Next-generation Sequencing Datasets Using Python

Packt

07 Jul 2015

25 min read

0
0
28749

article-image-writing-your-first-cucumber-appium-test

Packt

27 Jun 2017

12 min read

Writing Your First Cucumber Appium Test

Packt

27 Jun 2017

12 min read

0
0
28397

article-image-9-reasons-why-rust-programmers-love-rust

Richa Tripathi

03 Oct 2018

8 min read

9 reasons why Rust programmers love Rust

Richa Tripathi

03 Oct 2018

8 min read

The 2018 survey of the RedMonk Programming Language Rankings marked the entry of a new programming language in their Top 25 list. It has been an incredibly successful year for the Rust programming language in terms of its popularity. It also jumped from the 46th most popular language on GitHub to the 18th position. The Stack overflow survey of 2018 is another indicator of the rise of Rust programming language. Almost 78% of the developers who are working with Rust loved working on it. It topped the list of the most loved programming language among the developers who took the survey for a straight third year in the row. Not only that but it ranked 8th in the most wanted programming language in the survey, which means that the respondent of the survey who has not used it yet but would like to learn. Although, Rust was designed as a low-level language, best suited for systems, embedded, and other performance critical code, it is gaining a lot of traction and presents a great opportunity for web developers and game developers. RUST is also empowering novice developers with the tools to start shipping code fast. So, why is Rust so tempting? Let's explore the high points of this incredible language and understand the variety of features that make it interesting to learn. Automatic Garbage Collection Garbage collection and non-memory resources often create problems with some systems languages. But Rust pays no head to garbage collection and removes the possibilities of failures caused by them. In Rust, garbage collection is completely taken care of by RAII (Resource Acquisition Is Initialization). Better support for Concurrency Concurrency and parallelism are incredibly imperative topics in computer science and are also a hot topic in the industry today. Computers are gaining more and more cores, yet many programmers aren't prepared to fully utilize the power of them. Handling concurrent programming safely and efficiently is another major goal of Rust language. Concurrency is difficult to reason about. In Rust, there is a strong, static type system that helps to reason about your code. As such, Rust also gives you two traits Send and Sync to help you make sense of code that can possibly be concurrent. Rust's standard library also provides a library for threads, which enable you to run Rust code in parallel. You can also use Rust’s threads as a simple isolation mechanism. Error Handling in Rust is beautiful A programmer is bound to make errors, irrespective of the programming language they use. Making errors while programming is normal, but it's the error handling mechanism of that programming language, which enhances the experience of writing the code. In Rust, errors are divided into types: unrecoverable errors and recoverable errors. Unrecoverable errors An error is classified as 'unrecoverable' when there is no other option other than to abort the program. The panic! macro in Rust is very helpful in these cases, especially when a bug has been detected in the code but the programmer is not clear how to handle that error. The panic! macro generates a failure message that helps the user to debug a problem. It also helps to stop the execution before more catastrophic events occur. Recoverable errors The errors which can be handled easily or which do not have a serious impact on the execution of the program are known as recoverable errors. It is represented by the Result<T, E>. The Result<T, E> is an enum that consists of two variants, i.e., OK<T> and Err<E>. It describes the possible error in the program. OK<T>: The 'T' is a type of value which returns the OK variant in the success case. It is an expected outcome. Err<E>: The 'E' is a type of error which returns the ERR variant in the failure. It is an unexpected outcome. Resource Management The one attribute that makes Rust stand out (and completely overpowers Google’s Go for that matter), is the algorithm used for resource management. Rust follows the C++ lead, with concepts like borrowing and mutable borrowing on the plate and thus resource management becomes an elegant process. Furthermore, Rust didn’t need a second chance to know that resource management is not just about memory usage; the fact that they did it right first time makes them a standout performer on this point. Although the Rust documentation does a good job of explaining the technical details, the article by Tim explains the concept in a much friendlier and easy to understand language. As such I thought, it would be good to list his points as well here. The following excerpt is taken from the article written by M.Tim Jones. Reusable code via modules Rust allows you to organize code in a way that promotes its reuse. You attain this reusability by using modules which are nothing but organized code as packages that other programmers can use. These modules contain functions, structures and even other modules that you can either make public, which can be accessed by the users of the module or you can make it private which can be used only within the module and not by the module user. There are three keywords to create modules, use modules, and modify the visibility of elements in modules. The mod keyword creates a new module The use keyword allows you to use the module (expose the definitions into the scope to use them) The pub keyword makes elements of the module public (otherwise, they're private). Cleaner code with better safety checks In Rust, the compiler enforces memory safety and another checking that make the programming language safe. Here, you will never have to worry about dangling pointers or bother using an object after it has been freed. These things are part of the core Rust language that allows you to write clean code. Also, Rust includes an unsafe keyword with which you can disable checks that would typically result in a compilation error. Data types and Collections in Rust Rust is a statically typed programming language, which means that every value in Rust must have a specified data type. The biggest advantage of static typing is that a large class of errors is identified earlier in the development process. These data types can be broadly classified into two types: scalar and compound. Scalar data types represent a single value like integer, floating-point, and character, which are commonly present in other programming languages as well. But Rust also provides compound data types which allow the programmers to group multiple values in one type such as tuples and arrays. The Rust standard library provides a number of data structures which are also called collections. Collections contain multiple values but they are different from the standard compound data types like tuples and arrays which we discussed above. The biggest advantage of using collections is the capability of not specifying the amount of data at compile time which allows the structure to grow and shrink as the program runs. Vectors, Strings, and hash maps are the three most commonly used collections in Rust. The friendly Rust community Rust owes it success to the breadth and depth of engagement of its vibrant community, which supports a highly collaborative process for helping the language to evolve in a truly open-source way. Rust is built from the bottom up, rather than any one individual or organization controlling the fate of the technology. Reliable Robust Release cycles of Rust What is common between Java, Spring, and Angular? They never release their update when they promise to. The release cycle of the Rust community works with clockwork precision and is very reliable. Here’s an overview of the dates and versions: In mid-September 2018, the Rust team released Rust 2018 RC1 version. Rust 2018 is the first major new edition of Rust (after Rust 1.0 released in 2015). This new release would mark the culmination of the last three years of Rust’s development from the core team, and brings the language together in one neat package. This version includes plenty of new features like raw identifiers, better path clarity, new optimizations, and other additions. You can learn more about the Rust language and its evolution at the Rust blog and download from the Rust language website. Note: the headline was edited 09.06.2018 to make it clear that Rust was found to be the most loved language among developers using it. Rust 2018 RC1 now released with Raw identifiers, better path clarity, and other changes Rust as a Game Programming Language: Is it any good? Rust Language Server, RLS 1.0 releases with code intelligence, syntax highlighting and more

0
3
28344

article-image-are-you-looking-at-transitioning-from-being-a-developer-to-manager-here-are-some-leadership-roles-to-consider

Packt Editorial Staff

04 Jul 2019

6 min read

Are you looking at transitioning from being a developer to manager? Here are some leadership roles to consider

Packt Editorial Staff

04 Jul 2019

6 min read

What does the phrase "a manager" really mean anyway? This phrase means different things to different people and is often overused for the position which nearly matches an analyst-level profile! This term, although common, is worth defining what it really means, especially in the context of software development. This article is an excerpt from the book The Successful Software Manager written by an internationally experienced IT manager, Herman Fung. This book is a comprehensive and practical guide to managing software developers, software customers, and explores the process of deciding what software needs to be built, not how to build it. In this article, we’ll look into aspects you must be aware of before making the move to become a manager in the software industry. A simple distinction I once used to illustrate the difference between an analyst and a manager is that while an analyst identifies, collects, and analyzes information, a manager uses this analysis and makes decisions, or more accurately, is responsible and accountable for the decisions they make. The structure of software companies is now enormously diverse and varies a lot from one to another, which has an obvious impact on how the manager’s role and their responsibilities are defined, which will be unique to each company. Even within the same company, it's subject to change from time to time, as the company itself changes. Broadly speaking, a manager within software development can be classified into three categories, as we will now discuss: Team Leader/Manager This role is often a lead developer who also doubles up as the team spokesperson and single point of contact. They'll typically be the most senior and knowledgeable member of a small group of developers, who work on the same project, product, and technology. There is often a direct link between each developer in the team and their code, which means the team manager has a direct responsibility to ensure the product as a whole works. Usually, the team manager is also asked to fulfill the people management duties, such as performance reviews and appraisals, and day-to-day HR responsibilities. Development/Delivery Manager This person could be either a techie or a non-techie. They will have a good understanding of the requirements, design, code, and end product. They will manage running workshops and huddles to facilitate better overall team working and delivery. This role may include setting up visual aids, such as team/project charts or boards. In a matrix management model, where developers and other experts are temporarily asked to work in project teams, the development manager will not be responsible for HR and people management duties. Project Manager This person is most probably a non-techie, but there are exceptions, and this could be a distinct advantage on certain projects. Most importantly, a project manager will be process-focused and output-driven and will focus on distributing tasks to individuals. They are not expected to jump in to solve technical problems, but they are responsible for ensuring that the proper resources are available, while managing expectations. Specifically, they take part in managing the project budget, timeline, and risks. They should also be aware of the political landscape and management agenda within the organization to be able to navigate through them. The project manager ensures the project follows the required methodology or process framework mandated by the Project Management Office (PMO). They will not have people-management responsibilities for project team members. Agile practitioner As with all roles in today's world of tech, these categories will vary and overlap. They can even be held by the same person, which is becoming an increasingly common trait. They are also constantly evolving, which exemplifies the need to learn and grow continually, regardless of your role or position. If you are a true Agile practitioner, you may have issues in choosing these generalized categories, (Team Leader, Development Manager and Project Manager) and you'd be right to do so! These categories are most applicable to an organization that practises the traditional Waterfall model. Without diving into the everlasting Waterfall vs Agile debate, let's just say that these are the categories that transcend any methodologies. Even if they're not referred to by these names, they are the roles that need to be performed, to varying degrees, at various times. For completeness, it is worth noting one role specific to Agile, that is being a scrum master. Scrum master A scrum master is a role often compared – rightly or wrongly – with that of the project manager. The key difference is that their focus is on facilitation and coaching, instead of organizing and control. This difference is as much of a mindset as it is a strict practice, and is often referred to as being attributes of Servant Leadership. I believe a good scrum master will show traits of a good project manager at various times, and vice versa. This is especially true in ensuring that there is clear communication at all times and the team stays focused on delivering together. Yet, as we look back at all these roles, it's worth remembering that with the advent of new disciplines such as big data, blockchain, artificial intelligence, and machine learning, there are new categories and opportunities to move from a developer role into a management position, for example, as an algorithm manager or data manager. Transitioning, growing, progressing, or simply changing from a developer to a manager is a wonderfully rewarding journey that is unique to everyone. After clarifying what being a “modern manager" really means, and the broad categories applicable in software development (Team / Development / Project / Agile), the overarching and often key consideration for developers is whether it means they will be managing people and writing less code. In this article, we looked into different leadership roles that are available for developers for their career progression plan. Develop crucial skills to enhance your performance and advance your career with The Successful Software Manager written by Herman Fung. “Developers don’t belong on a pedestal, they’re doing a job like everyone else” – April Wensel on toxic tech culture and Compassionate Coding [Interview] Curl’s lead developer announces Google’s “plan to reimplement curl in Libcrurl” ‘I code in my dreams too’, say developers in Jetbrains State of Developer Ecosystem 2019 Survey

0
0
27962

article-image-delphi-memory-management-techniques-for-parallel-programming

Pavan Ramchandani

19 Jun 2018

31 min read

Delphi: memory management techniques for parallel programming

Pavan Ramchandani

19 Jun 2018

31 min read

0
0
27785

article-image-fine-tune-nginx-configufine-tune-nginx-configurationfine-tune-nginx-configurationratio

Packt

14 Jul 2015

20 min read

Fine-tune the NGINX Configuration

Packt

14 Jul 2015

20 min read

In this article by Rahul Sharma, author of the book NGINX High Performance, we will cover the following topics: NGINX configuration syntax Configuring NGINX workers Configuring NGINX I/O Configuring TCP Setting up the server (For more resources related to this topic, see here.) NGINX configuration syntax This section aims to cover it in good detail. The complete configuration file has a logical structure that is composed of directives grouped into a number of sections. A section defines the configuration for a particular NGINX module, for example, the http section defines the configuration for the ngx_http_core module. An NGINX configuration has the following syntax: Valid directives begin with a variable name and then state an argument or series of arguments separated by spaces. All valid directives end with a semicolon (;). Sections are defined with curly braces ({}). Sections can be nested in one another. The nested section defines a module valid under the particular section, for example, the gzip section under the http section. Configuration outside any section is part of the NGINX global configuration. The lines starting with the hash (#) sign are comments. Configurations can be split into multiple files, which can be grouped using the include directive. This helps in organizing code into logical components. Inclusions are processed recursively, that is, an include file can further have include statements. Spaces, tabs, and new line characters are not part of the NGINX configuration. They are not interpreted by the NGINX engine, but they help to make the configuration more readable. Thus, the complete file looks like the following code: #The configuration begins here global1 value1; #This defines a new section section { sectionvar1 value1; include file1; subsection { subsectionvar1 value1; } } #The section ends here global2 value2; # The configuration ends here NGINX provides the -t option, which can be used to test and verify the configuration written in the file. If the file or any of the included files contains any errors, it prints the line numbers causing the issue: $ sudo nginx -t This checks the validity of the default configuration file. If the configuration is written in a file other than the default one, use the -c option to test it. You cannot test half-baked configurations, for example, you defined a server section for your domain in a separate file. Any attempt to test such a file will throw errors. The file has to be complete in all respects. Now that we have a clear idea of the NGINX configuration syntax, we will try to play around with the default configuration. This article only aims to discuss the parts of the configuration that have an impact on performance. The NGINX catalog has large number of modules that can be configured for some purposes. This article does not try to cover all of them as the details are beyond the scope of the book. Please refer to the NGINX documentation at http://nginx.org/en/docs/ to know more about the modules. Configuring NGINX workers NGINX runs a fixed number of worker processes as per the specified configuration. In the following sections, we will work with NGINX worker parameters. These parameters are mostly part of the NGINX global context. worker_processes The worker_processes directive controls the number of workers: worker_processes 1; The default value for this is 1, that is, NGINX runs only one worker. The value should be changed to an optimal value depending on the number of cores available, disks, network subsystem, server load, and so on. As a starting point, set the value to the number of cores available. Determine the number of cores available using lscpu: $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 The same can be accomplished by greping out cpuinfo: $ cat /proc/cpuinfo | grep 'processor' | wc -l Now, set this value to the parameter: # One worker per CPU-core. worker_processes 4; Alternatively, the directive can have auto as its value. This determines the number of cores and spawns an equal number of workers. When NGINX is running with SSL, it is a good idea to have multiple workers. SSL handshake is blocking in nature and involves disk I/O. Thus, using multiple workers leads to improved performance. accept_mutex Since we have configured multiple workers in NGINX, we should also configure the flags that impact worker selection. The accept_mutex parameter available under the events section will enable each of the available workers to accept new connections one by one. By default, the flag is set to on. The following code shows this: events { accept_mutex on; } If the flag is turned to off, all of the available workers will wake up from the waiting state, but only one worker will process the connection. This results in the Thundering Herd phenomenon, which is repeated a number of times per second. The phenomenon causes reduced server performance as all the woken-up workers take up CPU time before going back to the wait state. This results in unproductive CPU cycles and nonutilized context switches. accept_mutex_delay When accept_mutex is enabled, only one worker, which has the mutex lock, accepts connections, while others wait for their turn. The accept_mutex_delay corresponds to the timeframe for which the worker would wait, and after which it tries to acquire the mutex lock and starts accepting new connections. The directive is available under the events section with a default value of 500 milliseconds. The following code shows this: events{ accept_mutex_delay 500ms; } worker_connections The next configuration to look at is worker_connections, with a default value of 512. The directive is present under the events section. The directive sets the maximum number of simultaneous connections that can be opened by a worker process. The following code shows this: events{ worker_connections 512; } Increase worker_connections to something like 1,024 to accept more simultaneous connections. The value of worker_connections does not directly translate into the number of clients that can be served simultaneously. Each browser opens a number of parallel connections to download various components that compose a web page, for example, images, scripts, and so on. Different browsers have different values for this, for example, IE works with two parallel connections while Chrome opens six connections. The number of connections also includes sockets opened with the upstream server, if any. worker_rlimit_nofile The number of simultaneous connections is limited by the number of file descriptors available on the system as each socket will open a file descriptor. If NGINX tries to open more sockets than the available file descriptors, it will lead to the Too many opened files message in the error.log. Check the number of file descriptors using ulimit: $ ulimit -n Now, increase this to a value more than worker_process * worker_connections. The value should be increased for the user that runs the worker process. Check the user directive to get the username. NGINX provides the worker_rlimit_nofile directive, which can be an alternative way of setting the available file descriptor rather modifying ulimit. Setting the directive will have a similar impact as updating ulimit for the worker user. The value of this directive overrides the ulimit value set for the user. The directive is not present by default. Set a large value to handle large simultaneous connections. The following code shows this: worker_rlimit_nofile 20960; To determine the OS limits imposed on a process, read the file /proc/$pid/limits. $pid corresponds to the PID of the process. multi_accept The multi_accept flag enables an NGINX worker to accept as many connections as possible when it gets the notification of a new connection. The purpose of this flag is to accept all connections in the listen queue at once. If the directive is disabled, a worker process will accept connections one by one. The following code shows this: events{ multi_accept on; } The directive is available under the events section with the default value off. If the server has a constant stream of incoming connections, enabling multi_accept may result in a worker accepting more connections than the number specified in worker_connections. The overflow will lead to performance loss as the previously accepted connections, part of the overflow, will not get processed. use NGINX provides several methods for connection processing. Each of the available methods allows NGINX workers to monitor multiple socket file descriptors, that is, when there is data available for reading/writing. These calls allow NGINX to process multiple socket streams without getting stuck in any one of them. The methods are platform-dependent, and the configure command, used to build NGINX, selects the most efficient method available on the platform. If we want to use other methods, they must be enabled first in NGINX. The use directive allows us to override the default method with the method specified. The directive is part of the events section: events { use select; } NGINX supports the following methods of processing connections: select: This is the standard method of processing connections. It is built automatically on platforms that lack more efficient methods. The module can be enabled or disabled using the --with-select_module or --without-select_module configuration parameter. poll: This is the standard method of processing connections. It is built automatically on platforms that lack more efficient methods. The module can be enabled or disabled using the --with-poll_module or --without-poll_module configuration parameter. kqueue: This is an efficient method of processing connections available on FreeBSD 4.1, OpenBSD 2.9+, NetBSD 2.0, and OS X. There are the additional directives kqueue_changes and kqueue_events. These directives specify the number of changes and events that NGINX will pass to the kernel. The default value for both of these is 512. The kqueue method will ignore the multi_accept directive if it has been enabled. epoll: This is an efficient method of processing connections available on Linux 2.6+. The method is similar to the FreeBSD kqueue. There is also the additional directive epoll_events. This specifies the number of events that NGINX will pass to the kernel. The default value for this is 512. /dev/poll: This is an efficient method of processing connections available on Solaris 7 11/99+, HP/UX 11.22+, IRIX 6.5.15+, and Tru64 UNIX 5.1A+. This has the additional directives, devpoll_events and devpoll_changes. The directives specify the number of changes and events that NGINX will pass to the kernel. The default value for both of these is 32. eventport: This is an efficient method of processing connections available on Solaris 10. The method requires necessary security patches to avoid kernel crash issues. rtsig: Real-time signals is a connection processing method available on Linux 2.2+. The method has some limitations. On older kernels, there is a system-wide limit of 1,024 signals. For high loads, the limit needs to be increased by setting the rtsig-max parameter. For kernel 2.6+, instead of the system-wide limit, there is a limit on the number of outstanding signals for each process. NGINX provides the worker_rlimit_sigpending parameter to modify the limit for each of the worker processes: worker_rlimit_sigpending 512; The parameter is part of the NGINX global configuration. If the queue overflows, NGINX drains the queue and uses the poll method to process the unhandled events. When the condition is back to normal, NGINX switches back to the rtsig method of connection processing. NGINX provides the rtsig_overflow_events, rtsig_overflow_test, and rtsig_overflow_threshold parameters to control how a signal queue is handled on overflows. The rtsig_overflow_events parameter defines the number of events passed to poll. The rtsig_overflow_test parameter defines the number of events handled by poll, after which NGINX will drain the queue. Before draining the signal queue, NGINX will look up how much it is filled. If the factor is larger than the specified rtsig_overflow_threshold, it will drain the queue. The rtsig method requires accept_mutex to be set. The method also enables the multi_accept parameter. Configuring NGINX I/O NGINX can also take advantage of the Sendfile and direct I/O options available in the kernel. In the following sections, we will try to configure parameters available for disk I/O. Sendfile When a file is transferred by an application, the kernel first buffers the data and then sends the data to the application buffers. The application, in turn, sends the data to the destination. The Sendfile method is an improved method of data transfer, in which data is copied between file descriptors within the OS kernel space, that is, without transferring data to the application buffers. This results in improved utilization of the operating system's resources. The method can be enabled using the sendfile directive. The directive is available for the http, server, and location sections. http{ sendfile on; } The flag is set to off by default. Direct I/O The OS kernel usually tries to optimize and cache any read/write requests. Since the data is cached within the kernel, any subsequent read request to the same place will be much faster because there's no need to read the information from slow disks. Direct I/O is a feature of the filesystem where reads and writes go directly from the applications to the disk, thus bypassing all OS caches. This results in better utilization of CPU cycles and improved cache effectiveness. The method is used in places where the data has a poor hit ratio. Such data does not need to be in any cache and can be loaded when required. It can be used to serve large files. The directio directive enables the feature. The directive is available for the http, server, and location sections: location /video/ { directio 4m; } Any file with size more than that specified in the directive will be loaded by direct I/O. The parameter is disabled by default. The use of direct I/O to serve a request will automatically disable Sendfile for the particular request. Direct I/O depends on the block size while doing a data transfer. NGINX has the directio_alignment directive to set the block size. The directive is present under the http, server, and location sections: location /video/ { directio 4m; directio_alignment 512; } The default value of 512 bytes works well for all boxes unless it is running a Linux implementation of XFS. In such a case, the size should be increased to 4 KB. Asynchronous I/O Asynchronous I/O allows a process to initiate I/O operations without having to block or wait for it to complete. The aio directive is available under the http, server, and location sections of an NGINX configuration. Depending on the section, the parameter will perform asynchronous I/O for the matching requests. The parameter works on Linux kernel 2.6.22+ and FreeBSD 4.3. The following code shows this: location /data { aio on; } By default, the parameter is set to off. On Linux, aio needs to be enabled with directio, while on FreeBSD, sendfile needs to be disabled for aio to take effect. If NGINX has not been configured with the --with-file-aio module, any use of the aio directive will cause the unknown directive aio error. The directive has a special value of threads, which enables multithreading for send and read operations. The multithreading support is only available on the Linux platform and can only be used with the epoll, kqueue, or eventport methods of processing requests. In order to use the threads value, configure multithreading in the NGINX binary using the --with-threads option. Post this, add a thread pool in the NGINX global context using the thread_pool directive. Use the same pool in the aio configuration: thread_pool io_pool threads=16; http{ …..... location /data{ sendfile on; aio threads=io_pool; } } Mixing them up The three directives can be mixed together to achieve different objectives on different platforms. The following configuration will use sendfile for files with size smaller than what is specified in directio. Files served by directio will be read using asynchronous I/O: location /archived-data/{ sendfile on; aio on; directio 4m; } The aio directive has a sendfile value, which is available only on the FreeBSD platform. The value can be used to perform Sendfile in an asynchronous manner: location /archived-data/{ sendfile on; aio sendfile; } NGINX invokes the sendfile() system call, which returns with no data in the memory. Post this, NGINX initiates data transfer in an asynchronous manner. Configuring TCP HTTP is an application-based protocol, which uses TCP as the transport layer. In TCP, data is transferred in the form of blocks known as TCP packets. NGINX provides directives to alter the behavior of the underlying TCP stack. These parameters alter flags for an individual socket connection. TCP_NODELAY TCP/IP networks have the "small packet" problem, where single-character messages can cause network congestion on a highly loaded network. Such packets are 41 bytes in size, where 40 bytes are for the TCP header and 1 byte has useful information. These small packets have huge overhead, around 4000 percent and can saturate a network. John Nagle solved the problem (Nagle's algorithm) by not sending the small packets immediately. All such packets are collected for some amount of time and then sent in one go as a single packet. This results in improved efficiency of the underlying network. Thus, a typical TCP/IP stack waits for up to 200 milliseconds before sending the data packages to the client. It is important to note that the problem exists with applications such as Telnet, where each keystroke is sent over wire. The problem is not relevant to a web server, which severs static files. The files will mostly form full TCP packets, which can be sent immediately instead of waiting for 200 milliseconds. The TCP_NODELAY option can be used while opening a socket to disable Nagle's buffering algorithm and send the data as soon as it is available. NGINX provides the tcp_nodelay directive to enable this option. The directive is available under the http, server, and location sections of an NGINX configuration: http{ tcp_nodelay on; } The directive is enabled by default. NGINX use tcp_nodelay for connections with the keep-alive mode. TCP_CORK As an alternative to Nagle's algorithm, Linux provides the TCP_CORK option. The option tells the TCP stack to append packets and send them when they are full or when the application instructs to send the packet by explicitly removing TCP_CORK. This results in an optimal amount of data packets being sent and, thus, improves the efficiency of the network. The TCP_CORK option is available as the TCP_NOPUSH flag on FreeBSD and Mac OS. NGINX provides the tcp_nopush directive to enable TCP_CORK over the connection socket. The directive is available under the http, server, and location sections of an NGINX configuration: http{ tcp_nopush on; } The directive is disabled by default. NGINX uses tcp_nopush for requests served with sendfile. Setting them up The two directives discussed previously do mutually exclusive things; the former makes sure that the network latency is reduced, while the latter tries to optimize the data packets sent. An application should set both of these options to get efficient data transfer. Enabling tcp_nopush along with sendfile makes sure that while transferring a file, the kernel creates the maximum amount of full TCP packets before sending them over wire. The last packet(s) can be partial TCP packets, which could end up waiting with TCP_CORK being enabled. NGINX make sure it removes TCP_CORK to send these packets. Since tcp_nodelay is also set then, these packets are immediately sent over the network, that is, without any delay. Setting up the server The following configuration sums up all the changes proposed in the preceding sections: worker_processes 3; worker_rlimit_nofile 8000; events { multi_accept on; use epoll; worker_connections 1024; } http { sendfile on; aio on; directio 4m; tcp_nopush on; tcp_nodelay on; # Rest Nginx configuration removed for brevity } It is assumed that NGINX runs on a quad core server. Thus three worker processes have been spanned to take advantage of three out of four available cores and leaving one core for other processes. Each of the workers has been configured to work with 1,024 connections. Correspondingly, the nofile limit has been increased to 8,000. By default, all worker processes operate with mutex; thus, the flag has not been set. Each worker processes multiple connections in one go using the epoll method. In the http section, NGINX has been configured to serve files larger than 4 MB using direct I/O, while efficiently buffering smaller files using Sendfile. TCP options have also been set up to efficiently utilize the available network. Measuring gains It is time to test the changes and make sure that they have given performance gain. Run a series of tests using Siege/JMeter to get new performance numbers. The tests should be performed with the same configuration to get a comparable output: $ siege -b -c 790 -r 50 -q http://192.168.2.100/hello Transactions: 79000 hits Availability: 100.00 % Elapsed time: 24.25 secs Data transferred: 12.54 MB Response time: 0.20 secs Transaction rate: 3257.73 trans/sec Throughput: 0.52 MB/sec Concurrency: 660.70 Successful transactions: 39500 Failed transactions: 0 Longest transaction: 3.45 Shortest transaction: 0.00 The results from Siege should be evaluated and compared to the baseline. Throughput: The transaction rate defines this as 3250 requests/second Error rate: Availability is reported as 100 percent; thus; the error rate is 0 percent Response time: The results shows a response time of 0.20 seconds Thus, these new numbers demonstrate performance improvement in various respects. After the server configuration is updated with all the changes, reperform all tests with increased numbers. The aim should be to determine the new baseline numbers for the updated configuration. Summary The article started with an overview of the NGINX configuration syntax. Going further, we discussed worker_connections and the related parameters. These allow you to take advantage of the available hardware. The article also talked about the different event processing mechanisms available on different platforms. The configuration discussed helped in processing more requests, thus improving the overall throughput. NGINX is primarily a web server; thus, it has to serve all kinds static content. Large files can take advantage of direct I/O, while smaller content can take advantage of Sendfile. The different disk modes make sure that we have an optimal configuration to serve the content. In the TCP stack, we discussed the flags available to alter the default behavior of TCP sockets. The tcp_nodelay directive helps in improving latency. The tcp_nopush directive can help in efficiently delivering the content. Both these flags lead to improved response time. In the last part of the article, we applied all the changes to our server and then did performance tests to determine the effectiveness of the changes done. In the next article, we will try to configure buffers, timeouts, and compression to improve the utilization of the available network. Resources for Article: Further resources on this subject: Using Nginx as a Reverse Proxy [article] Nginx proxy module [article] Introduction to nginx [article]

0
0
27721

article-image-node-js-13-releases-upgraded-v8-full-icu-support-stable-worker-threads-api

Fatema Patrawala

23 Oct 2019

4 min read

Node.js 13 releases with an upgraded V8, full ICU support, stable Worker Threads API and more

Fatema Patrawala

23 Oct 2019

4 min read

Yesterday was a super exciting day for Node.js developers as Node.js foundation announced of Node.js 12 transitions to Long Term Support (LTS) with the release of Node.js 13. As per the team, Node.js 12 becomes the newest LTS release along with version 10 and 8. This release marks the transition of Node.js 12.x into LTS with the codename 'Erbium'. The 12.x release line now moves into "Active LTS" and will remain so until October 2020. Then it will move into "Maintenance" until the end of life in April 2022. The new Node.js 13 release will deliver faster startup and better default heap limits. It includes updates to V8, TLS and llhttp and new features like diagnostic report, bundled heap dump capability and updates to Worker Threads, N-API, and more. Key features in Node.js 13 Let us take a look at the key features included in Node.js 13. V8 gets an upgrade to V8 7.8 This release is compatible with the new version V8 7.8. This new version of the V8 JavaScript engine brings performance tweaks and improvements to keep Node.js up with the ongoing improvements in the language and runtime. Full ICU enabled by default in Node.js 13 As of Node.js 13, full-icu is now available as default, which means hundreds of other local languages are now supported out of the box. This will simplify development and deployment of applications for non-English deployments. Stable workers API Worker Threads API is now a stable feature in both Node.js 12 and Node.js 13. While Node.js already performs well with the single-threaded event loop, there are some use-cases where additional threads can be leveraged for better results. New compiler and platform support Node.js and V8 continue to embrace newer C++ features and take advantage of newer compiler optimizations and security enhancements. With the release of Node.js 13, the codebase will now require a minimum of version 10 for the OS X development tools and version 7.2 of the AIX operating system. In addition to this there has been progress on supporting Python 3 for building Node.js applications. Systems that have Python 2 and Python 3 installed will still be able to use Python 2, however, systems with only Python 3 should now be able to build using Python 3. Developers discuss pain points in Node.js 13 On Hacker News, users discuss various pain-points in Node.js 13 and some of the functionalities missing in this release. One of the users commented, “To save you the clicks: Node.js 13 doesn't support top-level await. Node includes V8 7.8, released Sep 27. Top-level await merged into V8 on Sep 24, but didn't make it in time for the 7.8 release.” Response on this comment came in from V8 team, they say, “TLA is only in modules. Once node supports modules, it will also have TLA. We're also pushing out a version with 7.9 fairly soonish.” Other users discussed how Node.js performs with TypeScript, “I've been using node with typescript and it's amazing. VERY productive. The key thing is you can do a large refactoring without breaking anything. The biggest challenge I have right now is actually the tooling. Intellij tends to break sometimes. I'm using lerna for a monorepo with sub-modules and it's buggy with regular npm. For example 'npm audit' doesn't work. I might have to migrate to yarn…” If you are interested to know more about this release, check out the official Node.js blog post as well as the GitHub page for release notes. The OpenJS Foundation accepts NVM as its first new incubating project since the Node.js Foundation and JSF merger 12 Visual Studio Code extensions that Node.js developers will love [Sponsored by Microsoft] 5 reasons Node.js developers might actually love using Azure [Sponsored by Microsoft] Introducing Node.js 12 with V8 JavaScript engine, improved worker threads, and much more Google is planning to bring Node.js support to Fuchsia

0
0
27650

article-image-extension-functions-in-kotlin

Aaron Lazar

08 Jun 2018

8 min read

Extension functions in Kotlin: everything you need to know

Aaron Lazar

08 Jun 2018

8 min read

0
0
27625

Packt

04 Mar 2015

22 min read

Python functions – Avoid repeating code

Packt

04 Mar 2015

22 min read

In this article by Silas Toms, author of the book ArcPy and ArcGIS – Geospatial Analysis with Python we will see how programming languages share a concept that has aided programmers for decades: functions. The idea of a function, loosely speaking, is to create blocks of code that will perform an action on a piece of data, transforming it as required by the programmer and returning the transformed data back to the main body of code. Functions are used because they solve many different needs within programming. Functions reduce the need to write repetitive code, which in turn reduces the time needed to create a script. They can be used to create ranges of numbers (the range() function), or to determine the maximum value of a list (the max function), or to create a SQL statement to select a set of rows from a feature class. They can even be copied and used in another script or included as part of a module that can be imported into scripts. Function reuse has the added bonus of making programming more useful and less of a chore. When a scripter starts writing functions, it is a major step towards making programming part of a GIS workflow. (For more resources related to this topic, see here.) Technical definition of functions Functions, also called subroutines or procedures in other programming languages, are blocks of code that have been designed to either accept input data and transform it, or provide data to the main program when called without any input required. In theory, functions will only transform data that has been provided to the function as a parameter; it should not change any other part of the script that has not been included in the function. To make this possible, the concept of namespaces is invoked. Namespaces make it possible to use a variable name within a function, and allow it to represent a value, while also using the same variable name in another part of the script. This becomes especially important when importing modules from other programmers; within that module and its functions, the variables that it contains might have a variable name that is the same as a variable name within the main script. In a high-level programming language such as Python, there is built-in support for functions, including the ability to define function names and the data inputs (also known as parameters). Functions are created using the keyword def plus a function name, along with parentheses that may or may not contain parameters. Parameters can also be defined with default values, so parameters only need to be passed to the function when they differ from the default. The values that are returned from the function are also easily defined. A first function Let's create a function to get a feel for what is possible when writing functions. First, we need to invoke the function by providing the def keyword and providing a name along with the parentheses. The firstFunction() will return a string when called: def firstFunction(): 'a simple function returning a string' return "My First Function" >>>firstFunction() The output is as follows: 'My First Function' Notice that this function has a documentation string or doc string (a simple function returning a string) that describes what the function does; this string can be called later to find out what the function does, using the __doc__ internal function: >>>print firstFunction.__doc__ The output is as follows: 'a simple function returning a string' The function is defined and given a name, and then the parentheses are added followed by a colon. The following lines must then be indented (a good IDE will add the indention automatically). The function does not have any parameters, so the parentheses are empty. The function then uses the keyword return to return a value, in this case a string, from the function. Next, the function is called by adding parentheses to the function name. When it is called, it will do what it has been instructed to do: return the string. Functions with parameters Now let's create a function that accepts parameters and transforms them as needed. This function will accept a number and multiply it by 3: def secondFunction(number): 'this function multiples numbers by 3' return number *3 >>> secondFunction(4) The output is as follows: 12 The function has one flaw, however; there is no assurance that the value passed to the function is a number. We need to add a conditional to the function to make sure it does not throw an exception: def secondFunction(number): 'this function multiples numbers by 3' if type(number) == type(1) or type(number) == type(1.0): return number *3 >>> secondFunction(4.0) The output is as follows: 12.0 >>>secondFunction(4) The output is as follows: 12 >>>secondFunction("String") >>> The function now accepts a parameter, checks what type of data it is, and returns a multiple of the parameter whether it is an integer or a function. If it is a string or some other data type, as shown in the last example, no value is returned. There is one more adjustment to the simple function that we should discuss: parameter defaults. By including default values in the definition of the function, we avoid having to provide parameters that rarely change. If, for instance, we wanted a different multiplier than 3 in the simple function, we would define it like this: def thirdFunction(number, multiplier=3): 'this function multiples numbers by 3' if type(number) == type(1) or type(number) == type(1.0): return number *multiplier >>>thirdFunction(4) The output is as follows: 12 >>>thirdFunction(4,5) The output is as follows: 20 The function will work when only the number to be multiplied is supplied, as the multiplier has a default value of 3. However, if we need another multiplier, the value can be adjusted by adding another value when calling the function. Note that the second value doesn't have to be a number as there is no type checking on it. Also, the default value(s) in a function must follow the parameters with no defaults (or all parameters can have a default value and the parameters can be supplied to the function in order or by name). Using functions to replace repetitive code One of the main uses of functions is to ensure that the same code does not have to be written over and over. The first portion of the script that we could convert into a function is the three ArcPy functions. Doing so will allow the script to be applicable to any of the stops in the Bus Stop feature class and have an adjustable buffer distance: bufferDist = 400 buffDistUnit = "Feet" lineName = '71 IB' busSignage = 'Ferry Plaza' sqlStatement = "NAME = '{0}' AND BUS_SIGNAG = '{1}'" def selectBufferIntersect(selectIn,selectOut,bufferOut, intersectIn, intersectOut, sqlStatement, bufferDist, buffDistUnit, lineName, busSignage): 'a function to perform a bus stop analysis' arcpy.Select_analysis(selectIn, selectOut, sqlStatement.format(lineName, busSignage)) arcpy.Buffer_analysis(selectOut, bufferOut, "{0} {1}".format(bufferDist), "FULL", "ROUND", "NONE", "") arcpy.Intersect_analysis("{0} #;{1} #".format(bufferOut, intersectIn), intersectOut, "ALL", "", "INPUT") return intersectOut This function demonstrates how the analysis can be adjusted to accept the input and output feature class variables as parameters, along with some new variables. The function adds a variable to replace the SQL statement and variables to adjust the bus stop, and also tweaks the buffer distance statement so that both the distance and the unit can be adjusted. The feature class name variables, defined earlier in the script, have all been replaced with local variable names; while the global variable names could have been retained, it reduces the portability of the function. The next function will accept the result of the selectBufferIntersect() function and search it using the Search Cursor, passing the results into a dictionary. The dictionary will then be returned from the function for later use: def createResultDic(resultFC): 'search results of analysis and create results dictionary' dataDictionary = {} with arcpy.da.SearchCursor(resultFC, ["STOPID","POP10"]) as cursor: for row in cursor: busStopID = row[0] pop10 = row[1] if busStopID not in dataDictionary.keys(): dataDictionary[busStopID] = [pop10] else: dataDictionary[busStopID].append(pop10) return dataDictionary This function only requires one parameter: the feature class returned from the searchBufferIntersect() function. The results holding dictionary is first created, then populated by the search cursor, with the busStopid attribute used as a key, and the census block population attribute added to a list assigned to the key. The dictionary, having been populated with sorted data, is returned from the function for use in the final function, createCSV(). This function accepts the dictionary and the name of the output CSV file as a string: def createCSV(dictionary, csvname): 'a function takes a dictionary and creates a CSV file' with open(csvname, 'wb') as csvfile: csvwriter = csv.writer(csvfile, delimiter=',') for busStopID in dictionary.keys(): popList = dictionary[busStopID] averagePop = sum(popList)/len(popList) data = [busStopID, averagePop] csvwriter.writerow(data) The final function creates the CSV using the csv module. The name of the file, a string, is now a customizable parameter (meaning the script name can be any valid file path and text file with the extension .csv). The csvfile parameter is passed to the CSV module's writer method and assigned to the variable csvwriter, and the dictionary is accessed and processed, and passed as a list to csvwriter to be written to the CSV file. The csv.writer() method processes each item in the list into the CSV format and saves the final result. Open the CSV file with Excel or a text editor such as Notepad. To run the functions, we will call them in the script following the function definitions: analysisResult = selectBufferIntersect(Bus_Stops,Inbound71, Inbound71_400ft_buffer, CensusBlocks2010, Intersect71Census, bufferDist, lineName, busSignage ) dictionary = createResultDic(analysisResult) createCSV(dictionary,r'C:\Projects\Output\Averages.csv') Now, the script has been divided into three functions, which replace the code of the first modified script. The modified script looks like this: # -*- coding: utf-8 -*- # --------------------------------------------------------------------------- # 8662_Chapter4Modified1.py # Created on: 2014-04-22 21:59:31.00000 # (generated by ArcGIS/ModelBuilder) # Description: # Adjusted by Silas Toms # 2014 05 05 # --------------------------------------------------------------------------- # Import arcpy module import arcpy import csv # Local variables: Bus_Stops = r"C:\Projects\PacktDB.gdb\SanFrancisco\Bus_Stops" CensusBlocks2010 = r"C:\Projects\PacktDB.gdb\SanFrancisco\CensusBlocks2010" Inbound71 = r"C:\Projects\PacktDB.gdb\Chapter3Results\Inbound71" Inbound71_400ft_buffer = r"C:\Projects\PacktDB.gdb\Chapter3Results\Inbound71_400ft_buffer" Intersect71Census = r"C:\Projects\PacktDB.gdb\Chapter3Results\Intersect71Census" bufferDist = 400 lineName = '71 IB' busSignage = 'Ferry Plaza' def selectBufferIntersect(selectIn,selectOut,bufferOut,intersectIn, intersectOut, bufferDist,lineName, busSignage ): arcpy.Select_analysis(selectIn, selectOut, "NAME = '{0}' AND BUS_SIGNAG = '{1}'".format(lineName, busSignage)) arcpy.Buffer_analysis(selectOut, bufferOut, "{0} Feet".format(bufferDist), "FULL", "ROUND", "NONE", "") arcpy.Intersect_analysis("{0} #;{1} #".format(bufferOut,intersectIn), intersectOut, "ALL", "", "INPUT") return intersectOut def createResultDic(resultFC): dataDictionary = {} with arcpy.da.SearchCursor(resultFC, ["STOPID","POP10"]) as cursor: for row in cursor: busStopID = row[0] pop10 = row[1] if busStopID not in dataDictionary.keys(): dataDictionary[busStopID] = [pop10] else: dataDictionary[busStopID].append(pop10) return dataDictionary def createCSV(dictionary, csvname): with open(csvname, 'wb') as csvfile: csvwriter = csv.writer(csvfile, delimiter=',') for busStopID in dictionary.keys(): popList = dictionary[busStopID] averagePop = sum(popList)/len(popList) data = [busStopID, averagePop] csvwriter.writerow(data) analysisResult = selectBufferIntersect(Bus_Stops,Inbound71, Inbound71_400ft_buffer,CensusBlocks2010,Intersect71Census, bufferDist,lineName, busSignage ) dictionary = createResultDic(analysisResult) createCSV(dictionary,r'C:\Projects\Output\Averages.csv') print "Data Analysis Complete" Further generalization of the functions, while we have created functions from the original script that can be used to extract more data about bus stops in San Francisco, our new functions are still very specific to the dataset and analysis for which they were created. This can be very useful for long and laborious analysis for which creating reusable functions is not necessary. The first use of functions is to get rid of the need to repeat code. The next goal is to then make that code reusable. Let's discuss some ways in which we can convert the functions from one-offs into reusable functions or even modules. First, let's examine the first function: def selectBufferIntersect(selectIn,selectOut,bufferOut,intersectIn, intersectOut, bufferDist,lineName, busSignage ): arcpy.Select_analysis(selectIn, selectOut, "NAME = '{0}' AND BUS_SIGNAG = '{1}'".format(lineName, busSignage)) arcpy.Buffer_analysis(selectOut, bufferOut, "{0} Feet".format(bufferDist), "FULL", "ROUND", "NONE", "") arcpy.Intersect_analysis("{0} #;{1} #".format(bufferOut,intersectIn), intersectOut, "ALL", "", "INPUT") return intersectOut This function appears to be pretty specific to the bus stop analysis. It's so specific, in fact, that while there are a few ways in which we can tweak it to make it more general (that is, useful in other scripts that might not have the same steps involved), we should not convert it into a separate function. When we create a separate function, we introduce too many variables into the script in an effort to simplify it, which is a counterproductive effort. Instead, let's focus on ways to generalize the ArcPy tools themselves. The first step will be to split the three ArcPy tools and examine what can be adjusted with each of them. The Select tool should be adjusted to accept a string as the SQL select statement. The SQL statement can then be generated by another function or by parameters accepted at runtime. For instance, if we wanted to make the script accept multiple bus stops for each run of the script (for example, the inbound and outbound stops for each line), we could create a function that would accept a list of the desired stops and a SQL template, and would return a SQL statement to plug into the Select tool. Here is an example of how it would look: def formatSQLIN(dataList, sqlTemplate): 'a function to generate a SQL statement' sql = sqlTemplate #"OBJECTID IN " step = "(" for data in dataList: step += str(data) sql += step + ")" return sql def formatSQL(dataList, sqlTemplate): 'a function to generate a SQL statement' sql = '' for count, data in enumerate(dataList): if count != len(dataList)-1: sql += sqlTemplate.format(data) + ' OR ' else: sql += sqlTemplate.format(data) return sql >>> dataVals = [1,2,3,4] >>> sqlOID = "OBJECTID = {0}" >>> sql = formatSQL(dataVals, sqlOID) >>> print sql The output is as follows: OBJECTID = 1 OR OBJECTID = 2 OR OBJECTID = 3 OR OBJECTID = 4 This new function, formatSQL(), is a very useful function. Let's review what it does by comparing the function to the results following it. The function is defined to accept two parameters: a list of values and a SQL template. The first local variable is the empty string sql, which will be added to using string addition. The function is designed to insert the values into the variable sql, creating a SQL statement by taking the SQL template and using string formatting to add them to the template, which in turn is added to the SQL statement string (note that sql += is equivelent to sql = sql +). Also, an operator (OR) is used to make the SQL statement inclusive of all data rows that match the pattern. This function uses the built-in enumerate function to count the iterations of the list; once it has reached the last value in the list, the operator is not added to the SQL statement. Note that we could also add one more parameter to the function to make it possible to use an AND operator instead of OR, while still keeping OR as the default: def formatSQL2(dataList, sqlTemplate, operator=" OR "): 'a function to generate a SQL statement' sql = '' for count, data in enumerate(dataList): if count != len(dataList)-1: sql += sqlTemplate.format(data) + operator else: sql += sqlTemplate.format(data) return sql >>> sql = formatSQL2(dataVals, sqlOID," AND ") >>> print sql The output is as follows: OBJECTID = 1 AND OBJECTID = 2 AND OBJECTID = 3 AND OBJECTID = 4 While it would make no sense to use an AND operator on ObjectIDs, there are other cases where it would make sense, hence leaving OR as the default while allowing for AND. Either way, this function can now be used to generate our bus stop SQL statement for multiple stops (ignoring, for now, the bus signage field): >>> sqlTemplate = "NAME = '{0}'" >>> lineNames = ['71 IB','71 OB'] >>> sql = formatSQL2(lineNames, sqlTemplate) >>> print sql The output is as follows: NAME = '71 IB' OR NAME = '71 OB' However, we can't ignore the Bus Signage field for the inbound line, as there are two starting points for the line, so we will need to adjust the function to accept multiple values: def formatSQLMultiple(dataList, sqlTemplate, operator=" OR "): 'a function to generate a SQL statement' sql = '' for count, data in enumerate(dataList): if count != len(dataList)-1: sql += sqlTemplate.format(*data) + operator else: sql += sqlTemplate.format(*data) return sql >>> sqlTemplate = "(NAME = '{0}' AND BUS_SIGNAG = '{1}')" >>> lineNames = [('71 IB', 'Ferry Plaza'),('71 OB','48th Avenue')] >>> sql = formatSQLMultiple(lineNames, sqlTemplate) >>> print sql The output is as follows: (NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza') OR (NAME = '71 OB' AND BUS_SIGNAG = '48th Avenue') The slight difference in this function, the asterisk before the data variable, allows the values inside the data variable to be correctly formatted into the SQL template by exploding the values within the tuple. Notice that the SQL template has been created to segregate each conditional by using parentheses. The function(s) are now ready for reuse, and the SQL statement is now ready for insertion into the Select tool: sql = formatSQLMultiple(lineNames, sqlTemplate) arcpy.Select_analysis(Bus_Stops, Inbound71, sql) Next up is the Buffer tool. We have already taken steps towards making it generalized by adding a variable for the distance. In this case, we will only add one more variable to it, a unit variable that will make it possible to adjust the buffer unit from feet to meter or any other allowed unit. We will leave the other defaults alone. Here is an adjusted version of the Buffer tool: bufferDist = 400 bufferUnit = "Feet" arcpy.Buffer_analysis(Inbound71, Inbound71_400ft_buffer, "{0} {1}".format(bufferDist, bufferUnit), "FULL", "ROUND", "NONE", "") Now, both the buffer distance and buffer unit are controlled by a variable defined in the previous script, and this will make it easily adjustable if it is decided that the distance was not sufficient and the variables might need to be adjusted. The next step towards adjusting the ArcPy tools is to write a function, which will allow for any number of feature classes to be intersected together using the Intersect tool. This new function will be similar to the formatSQL functions as previous, as they will use string formatting and addition to allow for a list of feature classes to be processed into the correct string format for the Intersect tool to accept them. However, as this function will be built to be as general as possible, it must be designed to accept any number of feature classes to be intersected: def formatIntersect(features): 'a function to generate an intersect string' formatString = '' for count, feature in enumerate(features): if count != len(features)-1: formatString += feature + " #;" else: formatString += feature + " #" return formatString >>> shpNames = ["example.shp","example2.shp"] >>> iString = formatIntersect(shpNames) >>> print iString The output is as follows: example.shp #;example2.shp # Now that we have written the formatIntersect() function, all that needs to be created is a list of the feature classes to be passed to the function. The string returned by the function can then be passed to the Intersect tool: intersected = [Inbound71_400ft_buffer, CensusBlocks2010] iString = formatIntersect(intersected) # Process: Intersect arcpy.Intersect_analysis(iString, Intersect71Census, "ALL", "", "INPUT") Because we avoided creating a function that only fits this script or analysis, we now have two (or more) useful functions that can be applied in later analyses, and we know how to manipulate the ArcPy tools to accept the data that we want to supply to them. Summary In this article, we discussed how to take autogenerated code and make it generalized, while adding functions that can be reused in other scripts and will make the generation of the necessary code components, such as SQL statements, much easier. Resources for Article: Further resources on this subject: Enterprise Geodatabase [article] Adding Graphics to the Map [article] Image classification and feature extraction from images [article]

0
0
27288

article-image-python-3-8-new-features-the-walrus-operator-positional-only-parameters-and-much-more

Bhagyashree R

18 Jul 2019

5 min read

Python 3.8 new features: the walrus operator, positional-only parameters, and much more

Bhagyashree R

18 Jul 2019

5 min read

Earlier this month, the team behind Python announced the release of Python 3.8b2, the second of four planned beta releases. Ahead of the third beta release, which is scheduled for 29th July, we look at some of the key features coming to Python 3.8. The "incredibly controversial" walrus operator The walrus operator was proposed in PEP 572 (Assignment Expressions) by Chris Angelico, Tim Peters, and Guido van Rossum last year. Since then it has been heavily discussed in the Python community with many questioning whether it is a needed improvement. Others were excited as the operator does make the code a tiny bit more readable. At the end of the PEP discussion, Guido van Rossum stepped down as BDFL (benevolent dictator for life) and the creation of a new governance model. In an interview with InfoWorld, Guido shared, “The straw that broke the camel’s back was a very contentious Python enhancement proposal, where after I had accepted it, people went to social media like Twitter and said things that really hurt me personally. And some of the people who said hurtful things were actually core Python developers, so I felt that I didn’t quite have the trust of the Python core developer team anymore.” According to PEP 572, the assignment expression is a syntactical operator that allows you to assign values to a variable as a part of an expression. Its aim is to simplify things like multiple-pattern matches and the so-called loop and a half. At PyCon 2019, Dustin Ingram, a PyPI maintainer, gave a few examples where you can use this syntax: Balancing lines of codes and complexity Avoiding inefficient comprehensions Avoiding unnecessary variables in scope You can watch the full talk on YouTube: https://www.youtube.com/watch?v=6uAvHOKofws The feature was implemented by Emily Morehouse, Python core developer and Founder, Director of Engineering at Cuttlesoft, and was merged earlier this year: https://twitter.com/emilyemorehouse/status/1088593522142339072 Explaining other improvements this feature brings, Jake Edge, a contributor on LWN.net wrote, “These and other uses (e.g. in list and dict comprehensions) help make the intent of the programmer clearer. It is a feature that many other languages have, but Python has, of course, gone without it for nearly 30 years at this point. In the end, it is actually a fairly small change for all of the uproars it caused.” Positional-only parameters Proposed in PEP 570, this introduces a new syntax (/) to specify positional-only parameters in Python function definitions. This is similar to how * indicates that the arguments to its right are keyword only. This syntax is already used by many CPython built-in and standard library functions, for instance, the pow() function: pow(x, y, z=None, /) This syntax gives library authors more control over better expressing the intended usage of an API and allows the API to “evolve in a safe, backward-compatible way.” It gives library authors the flexibility to change the name of positional-only parameters without breaking callers. Additionally, this also ensures consistency of the Python language with existing documentation and the behavior of various "builtin" and standard library functions. As with PEP 572, this proposal also got mixed reactions from Python developers. In support, one developer said, “Position-only parameters already exist in cpython builtins like range and min. Making their support at the language level would make their existence less confusing and documented.” While others think that this will allow authors to “dictate” how their methods could be used. “Not the biggest fan of this one because it allows library authors to overly dictate how their functions can be used, as in, mark an argument as positional merely because they want to. But cool all the same,” a Redditor commented. Debug support for f-strings Formatted strings (f-strings) were introduced in Python 3.6 with PEP 498. It enables you to evaluate an expression as part of the string along with inserting the result of function calls and so on. In Python 3.8, some additional syntax changes have been made by adding add (=) specifier and a !d conversion for ease of debugging. You can use this feature like this: print(f'{foo=} {bar=}') This provides developers a better way of doing “print-style debugging”, especially for those who have a background in languages that already have such feature such as Perl, Ruby, JavaScript, etc. One developer expressed his delight on Hacker News, “F strings are pretty awesome. I’m coming from JavaScript and partially java background. JavaScript’s String concatenation can become too complex and I have difficulty with large strings.” Python Initialization Configuration Though Python is highly configurable, its configuration seems scattered all around the code. The PEP 587 introduces a new C API to configure the Python Initialization giving developers finer control over the configuration and better error reporting. Among the improvements, this API will bring include ability to read and modify configuration before it is applied and overriding how Python computes the module search paths (``sys.path``). Along with these, there are many other exciting features coming to Python 3.8, which is currently scheduled for October, including a fast calling protocol for CPython, Vectorcall, support for out-of-band buffers in pickle protocol 5, and more. You can find the full list on Python’s official website. Python serious about diversity, dumps offensive ‘master’, ‘slave’ terms in its documentation Introducing PyOxidizer, an open source utility for producing standalone Python applications, written in Rust Python 3.8 beta 1 is now ready for you to test

0
0
26970

How-To Tutorials - Programming

Is Golang truly community driven and does it really matter?

Understanding Go Internals: defer, panic() and recover() functions [Tutorial]

Microsoft mulls replacing C and C++ code with Rust calling it a "modern safer system programming language" with great memory safety features

How to Build 12 Factor Microservices on Docker - Part 2

How to implement immutability functions in Kotlin [Tutorial]

Processing Next-generation Sequencing Datasets Using Python

Writing Your First Cucumber Appium Test

9 reasons why Rust programmers love Rust

Are you looking at transitioning from being a developer to manager? Here are some leadership roles to consider

Delphi: memory management techniques for parallel programming

Trending Topics

Fine-tune the NGINX Configuration

Node.js 13 releases with an upgraded V8, full ICU support, stable Worker Threads API and more

Extension functions in Kotlin: everything you need to know

Python functions – Avoid repeating code

Python 3.8 new features: the walrus operator, positional-only parameters, and much more

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access