You're reading from  Hands-On Data Science with the Command Line

Product type: Book
Published in: Jan 2019
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789132984
Edition: 1st Edition
Authors (3):
Jason Morris

Jason Morris is a systems and research engineer with over 19 years of experience in system architecture, research engineering, and large data analysis. His primary focus is machine learning with TensorFlow, CUDA, and Apache Spark. Jason is also a speaker and a consultant for designing large-scale architectures, implementing best security practices on the cloud, creating near real-time image detection analytics with deep learning, and developing serverless architectures to aid in ETL. His most recent roles include solution architect, big data engineer, big data specialist, and instructor at Amazon Web Services. He is currently the Chief Technology Officer of Next Rev Technologies, and his favorite command-line program is netcat.

Chris McCubbin

Chris McCubbin is a data scientist and software developer with 20 years of experience in developing complex systems and analytics. He co-founded the successful big data security startup Sqrrl, since acquired by Amazon. He has also developed smart swarming systems for drones, social network analysis systems in MapReduce, and big data security analytic platforms using the Apache projects Accumulo and Spark. He has been using the Unix command line since starting on IRIX platforms in college, and his favorite command-line program is find.

Raymond Page

Raymond Page is a computer engineer specializing in site reliability. His experience with embedded development engendered a passion for removing the pervasive bloat from web technologies and cloud computing. His favorite command is cat.

Bash Functions and Data Visualization

So far, we've been working with bash interactively and relying on the bash history to keep track of what we've done. Wouldn't it be nice if you had a portable way to store and share the commands you want to run? Well, that functionality exists in the form of shell scripts composed of shell functions.

We're going to extend the history we covered in Chapter 1, Data Science at the Command Line and Setting It Up. Terminals originated as text-only devices and evolved graphical support for simple drawing primitives, such as rendering enclosed cells in tabular data. The pinnacle of Terminal graphics was reached by DEC, with canvas and vector-graphic support in the form of SIXEL and ReGIS graphics, respectively. As physical Terminals became a thing of the past, lightweight Terminal emulators regressed to being text-only. A renaissance in graphics...

My first shell script

Our first shell script will cover the basics of how to tell the computer to run the shell script.

She bangs, she bangs!

We're not talking about that popular Ricky Martin song. We're talking about the line every bash script needs in order to be run directly as a program. If you've worked with other scripting languages, you may have noticed that the first line of a script often starts with #! (the "shebang"). This tells the system which interpreter to use. For example, if you've worked with Python before, you've probably seen #!/usr/bin/env python2.7 in a script. With bash, it's no different. Let's go ahead and create a new file named hello_world.sh and enter the following:

#!/bin/bash
# A function to greet everyone
greet_everyone...
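
The listing above is cut off in this preview, so here is a separate, minimal sketch of the same idea: a shebang line followed by a function definition and a call. The function name greet and its message are hypothetical and are not taken from the book's listing.

```shell
#!/bin/bash
# Hypothetical example: greet and its message are illustrative names,
# not taken from the book's truncated listing.

# Define a function. The body runs only when the function is called.
greet() {
  local name="${1:-World}"   # first argument, defaulting to "World"
  echo "Hello, ${name}!"
}

# Call the function like any other command.
greet
greet "command line"
```

If this were saved as, say, greet.sh, running chmod +x greet.sh and then ./greet.sh would let the system read the #! line and hand the file to /bin/bash.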

Advanced shell scripting magic

This is the dark magic section of the chapter. It will demonstrate advanced shell scripting by taking the preceding lessons and features, and converting them into what could be considered a small program.

Here be dragons, ye be warned

A simple piece of introductory code is great to get a feel for the flavor of a language, but we're going to introduce some dark magic in the form of some complex utility functions that can be helpful in everyday situations. We'll use a lineinfile function to insert arbitrary text into a file—it's not a full-featured application, just enough to help ensure some simple text is injected into a file. The second function, ncz, leverages bash IP...
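
To give a feel for the kind of utility described above, here is a rough sketch of a function that makes sure a line of text is present in a file. It is an assumption about the general approach, not the book's lineinfile implementation; the name ensure_line and the temporary demo file are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: ensure_line is an illustrative name, not the
# book's lineinfile function.

# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
ensure_line() {
  local file="$1" line="$2"
  touch "$file" || return 1
  # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

demo=$(mktemp)                        # scratch file for the demonstration
ensure_line "$demo" "set term dumb"
ensure_line "$demo" "set term dumb"   # second call changes nothing
cat "$demo"
```

Because the function checks before it writes, it can be called repeatedly without duplicating the line, which is the property that makes this kind of helper safe to use in setup scripts.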

From dumb Terminal to glam Terminal

We're going to use gnuplot to render dumb text graphics and canvas-style plots inside our Terminal. To begin, we need some basic configuration for our gnuplot startup. Put the following in ~/.gnuplot:

set term dumb

Next, we need a wrapper around gnuplot to get some fancy graphical output. This wrapper looks at the GNUTERM environment variable of your current shell and does some calculations on the Terminal's width and height so that gnuplot knows how big a window it has. The wrapper will update our ~/.gnuplot configuration with the graphics capabilities specified for our Terminal. We aren't going to delve into the wrapper, but just use it as another command. Here it is:

__gnuplot() {
SIZE=$(stty size 2>/dev/null)
SIZE=${SIZE:-$(tput lines) $(tput cols)}
COLS=${SIZE#* }
ROWS=${SIZE% *}
XPX=${XPX:-13}
YPX=${YPX:-24}
COLUMNS=...
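
The wrapper is cut off in this preview, but its first lines rely on bash parameter expansion to split the "rows cols" string printed by stty size. The following sketch isolates just that step. The helper name parse_size and the 24 80 fallback are assumptions made for illustration and are not part of the book's wrapper.

```shell
#!/bin/bash
# Hypothetical sketch: parse_size is an illustrative helper, not part of
# the book's wrapper. It splits a "ROWS COLS" string, the format that
# `stty size` prints.

parse_size() {
  local size="$1"
  ROWS=${size% *}    # drop the shortest suffix matching " *": keeps the rows
  COLS=${size#* }    # drop the shortest prefix matching "* ": keeps the columns
}

# Fall back to a fixed size when no terminal is attached (e.g. in a pipeline).
SIZE=$(stty size 2>/dev/null)
parse_size "${SIZE:-24 80}"
echo "rows=${ROWS} cols=${COLS}"
```

Values obtained this way could then be handed to gnuplot, for example through the size option of its dumb terminal, so that a plot fills the available character grid.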

Summary

With the ability to reuse bash code, a collection of scripts can be cobbled together to enhance your command-line productivity. And with the ability to visualize results, you can peer into datasets and perform data mining tasks more quickly.

In the next chapter, we'll dig deeper into bash control flow to create richer functions.

