Tcl/Tk: Handling String Expressions

Tcl/Tk 8.5 Programming Cookbook

by Bert Wheeler | March 2011 | Open Source

Tcl (Tool Command Language) is a scripting language originally designed to be embedded in applications as their command language. Since its creation, Tcl has grown far beyond its original design with numerous extensions and additions (such as the graphical Tool Kit, or Tk) to become a full-featured scripting language capable of creating elegant, cross-platform solutions.

This article by Bert Wheeler, author of Tcl/Tk 8.5 Programming Cookbook, explains how to create, manipulate, and manage string variables. We will cover:

  • Appending to a string
  • Formatting a string
  • Matching a regular expression within a string
  • Performing character substitution on a string
  • Parsing a string using conversion specifiers
  • Comparing strings

 


Introduction

When I first started using Tcl, everything I read or researched stressed the mantra "Everything is a string". Coming from a strongly typed coding environment, I was used to declaring variable types; in Tcl this is not needed. The set command creates the variable and stores its value without any type declaration. For example, set variable "7" and set variable 7 both create a variable containing 7. Either way, you can print the variable as text or add 1 to it as a number.
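A minimal sketch of this behavior, runnable in tclsh 8.5 or later; the variable names a, b, same, and text are illustrative only.

```tcl
# Two ways of setting the same value: the quotes only group characters
# into a single word, they do not create a separate "string type".
set a "7"
set b 7

# Both values compare as equal numbers.
set same [expr {$a == $b}]

# The value set from "7" can be incremented like any number...
incr a

# ...and the value set from 7 can be used as text.
set text "Value: $b"

puts "same=$same a=$a text=$text"
```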

It still holds true today that everything in Tcl is a string. When we explore the Tk toolkit and widget creation, you will quickly see that widgets themselves have a set of string values that determine their appearance and/or behavior.

As a prerequisite for the recipes in this article, launch the Tcl shell (tclsh) as appropriate for your operating system; you can then enter the commands directly at its prompt.

As with everything else we have seen, Tcl provides a full suite of commands to assist in handling string expressions. However, due to the sheer number of commands and options, I won't list every item individually. Instead, we will explore them through numerous recipes and examples in the following sections. A general list of the commands is as follows:

Command Description
string Provides many keywords (subcommands) for manipulating strings and gathering information about them.
append Appends to a string variable.
format Formats a string in the same manner as C sprintf.
regexp Performs regular expression matching.
regsub Performs substitution based on regular expression matching.
scan Parses a string using conversion specifiers in the same manner as C sscanf.
subst Performs backslash, command, and variable substitution on a string.

Between them, the commands listed in the table cover nearly every string-handling need a developer will have. In the following sections, we will explore these commands as well as many keywords of the string command.
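The subst command from the table does not get a recipe of its own, so here is a short sketch, assuming a name variable chosen purely for illustration. subst performs the same substitutions the interpreter would, but on a string value; the -nocommands switch leaves [...] text untouched.

```tcl
set name World

# Backslash (\t), variable ($name), and command ([...]) substitution all occur.
set full [subst {Name:\t$name ([string length $name] characters)}]

# With -nocommands, the bracketed text is kept literally.
set partial [subst -nocommands {Hello, $name! [clock seconds]}]

puts $full
puts $partial
```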

Appending to a string

Creating a string in Tcl with the set command is the starting point for all string commands, and it will be the first command in most, if not all, of the following recipes. As we have seen previously, entering set variable value on the command line does this. However, a Tcl script often needs to build up a string over time, for example, while reading from an open channel to a file or an HTTP connection. To accomplish this, we read from the channel and append the data to the original string.

For this purpose, Tcl provides the append command. Its syntax is as follows:

append variable value value value...
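The channel scenario described above can be sketched as follows. This is a minimal example assuming a hypothetical file named sample.txt in the current directory; the script creates the file first so that it is self-contained, whereas a real script would read from an existing file or socket.

```tcl
# Hypothetical file name used only for this sketch.
set fileName "sample.txt"

# Create a small file so the example has something to read.
set out [open $fileName w]
puts $out "first line"
puts $out "second line"
close $out

# Read the file back line by line, appending each line to one string.
set contents ""
set in [open $fileName r]
while {[gets $in line] >= 0} {
    append contents $line "\n"
}
close $in
file delete $fileName

puts -nonewline $contents
```

For a small file, read $in returns the whole contents in one call; appending piece by piece is useful when each chunk must be handled as it arrives.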

How to do it…

In the following example, we will create a string of comma-delimited numbers using the for control construct. Return values from the commands are provided for clarity. Enter the following commands:

% set var 0
0
% for {set x 1} {$x <= 10} {incr x} {
append var , $x
}
% puts $var
0,1,2,3,4,5,6,7,8,9,10

How it works…

The append command accepts the name of the variable that holds the resulting string, followed by one or more values to append, each passed as a separate argument; if the variable does not yet exist, append creates it. On each pass through the loop, append received our variable name and two values, a comma and the current value of x, and added both to the end of the original variable (which started with the value 0). The resulting string, displayed with the puts command, shows our newly appended variable complete with commas.
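A variation on the recipe, sketched below, builds the same kind of list without the leading 0 by appending the separator only once the string is non-empty. It also shows that append creates a variable that does not yet exist and returns the new value. The variable names are illustrative.

```tcl
# Start with an empty string instead of "0".
set numbers ""
for {set x 1} {$x <= 10} {incr x} {
    # Add a comma only between values, never before the first one.
    if {$numbers ne ""} {
        append numbers ,
    }
    append numbers $x
}
puts $numbers

# append creates the variable if needed and returns its new value.
unset -nocomplain greeting
set returned [append greeting "Hello" ", " "world"]
puts $returned
```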

Formatting a string

Strings, as we all know, are our primary way of interacting with the end user. Whether presented in a message box or simply directed to the Tcl shell, they frequently need to include values that are only known at run time. To accomplish this, Tcl provides the format command, which formats a string with value substitution in the same manner as the ANSI C sprintf function. The format command is as follows:

format string argument argument argument...

The format command accepts a format string containing literal text and % conversion specifiers, followed by the arguments whose values are substituted into the final string. Each conversion specifier may contain up to six parts: an XPG3 position specifier, a set of flags, a minimum field width, a numeric precision specifier, a size modifier, and a conversion character. The conversion characters are as follows:

Specifier Description
d or i For converting an integer to a signed decimal string.
u For converting an integer to an unsigned decimal string.
o For converting an integer to an unsigned octal string.
x or X For converting an integer to an unsigned hexadecimal string. The lowercase x produces lowercase hexadecimal digits (a-f); the uppercase X produces uppercase digits (A-F).
c For converting an integer to the Unicode character it represents.
s No conversion is performed; the argument is inserted as a string.
f For converting a number to a signed decimal string of the form xxx.yyy, where the number of y's is determined by the precision (6 decimal places by default).
e or E For converting a number to scientific notation of the form x.yyye±zz, where the number of y's is determined by the precision (6 by default). If the uppercase E is used, it appears in the string in place of the lowercase e.
g or G If the exponent is less than -4 or greater than or equal to the precision, the number is converted as for %e or %E; otherwise it is converted as for %f. Trailing zeroes and a trailing decimal point are omitted.
%% No conversion is performed; a single % character is inserted into the string.

There are three differences between the Tcl format command and the ANSI C sprintf function:

  • The %p and %n conversion specifiers are not supported.
  • For %c conversions, the argument must be an integer value, which is converted to the corresponding character.
  • Size modifiers are ignored when formatting floating-point values.

How to do it…

In the following example, we format a long date string for output on the command line. Return values from the commands are provided for clarity. Enter the following commands:

% set month May
May
% set weekday Friday
Friday
% set day 5
5
% set extension th
th
% set year 2010
2010
% puts [format "Today is %s, %s %d%s %d" $weekday $month $day $extension $year]
Today is Friday, May 5th 2010

How it works…

The format command replaced each conversion specifier, in order, with the corresponding argument: each %s with a string value and each %d with a decimal integer.
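Beyond the plain %s and %d used in the recipe, the flags, field width, precision, and position specifier described earlier can be tried out with a sketch like the following; the sample values are arbitrary.

```tcl
# Minimum field width: right-justified by default, left with the - flag.
set right [format "%5d|" 42]
set left  [format "%-5d|" 42]

# Zero-padding flag combined with a precision for a floating-point value.
set padded [format "%07.2f" 3.14159]

# Hexadecimal (lower and upper case) and octal conversions.
set bases [format "%x %X %o" 255 255 8]

# XPG3 position specifiers select arguments out of order.
# Braces stop Tcl from treating $s as a variable reference.
set swapped [format {%2$s %1$s} world hello]

# %c converts an integer to a character; %% inserts a literal %.
set misc [format "%c %d%%" 65 100]

puts "$right $left $padded $bases $swapped $misc"
```

When position specifiers are used, every specifier in the format string (other than %%) must use one.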

Matching a regular expression within a string

Regular expressions provide us with a powerful method to locate an arbitrarily complex pattern within a string. The regexp command is similar to a Find function in a text editor: you search a given string for the character or pattern of characters you are looking for, and regexp returns a Boolean value that indicates success or failure and populates any optional match variables with the matched strings. But it doesn't stop there; by providing switches, such as -indices and -inline, you can control the behavior of regexp. The switches are as follows:

Switch Behavior
-about No actual matching is made. Instead, regexp returns a list containing information about the regular expression, where the first element is a subexpression count and the second is a list of property names describing various attributes of the expression.
-expanded Allows the use of expanded regular expression syntax, wherein whitespace and comments are ignored.
-indices Changes what is stored in the match variables: instead of the matched text, each variable receives a list of two decimal strings giving the indices of the first and last characters of the matching range.
-line Enables newline-sensitive matching; equivalent to passing both the -linestop and -lineanchor switches.
-linestop Changes the behavior of [^...] bracket expressions and the "." character so that they stop at newline characters.
-lineanchor Changes the behavior of ^ and $ (anchors) so that they match the beginning and end of a line as well as the beginning and end of the string.
-nocase Treats uppercase characters in the search string as lowercase.
-all Causes the command to match as many times as possible and returns the count of the matches found.
-inline Causes regexp to return a list of the data that would otherwise have been placed in match variables. Match variables may NOT be used if -inline is specified.
-start Allows us to specify a character index from which searching should start.
-- Denotes the end of switches being passed to regexp. The next argument is treated as the expression, even if it starts with a "-".

Now that we have a background in switches, let's look at the command itself:

regexp switches expression string matchvar submatchvar submatchvar...

The regexp command determines whether the expression matches part or all of the string and returns 1 if a match exists or 0 if it does not. If variables (for example, myNumber or myData) are passed after the string, the first receives the portion of the string that matched the entire expression, and each subsequent variable receives the portion that matched the corresponding parenthesized subexpression. Keep in mind that if the -inline switch has been passed, no match variables may be included in the command.

Getting ready

To complete the following example, you can enter the commands directly at the Tcl shell prompt or, if you prefer, save them in a Tcl script file in your working directory using the text editor of your choice.

How to do it…

A common use for regexp is to accept a string containing multiple parts and split it into its constituent pieces. In the following example, we will create a string containing an IP address and assign its octets to named variables. Return values from the commands are provided for clarity. Enter the following commands:

% set ip 192.168.1.65
192.168.1.65
% regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})} $ip all first second third fourth
1
% puts "$all\n$first\n$second\n$third\n$fourth"
192.168.1.65
192
168
1
65

How it works…

As you can see, the IP address has been split into its individual octet values. regexp has matched groupings of decimal characters [0-9] of varying length, from 1 to 3 characters ({1,3}), delimited by "." characters. Because the expression is enclosed in braces, Tcl performs no substitution on it, so \. reaches regexp intact and matches a literal dot rather than any character. The text matched by the entire expression (the whole IP address) is assigned to the first variable (all), while the octet values captured by the four parenthesized subexpressions are assigned to the remaining variables (first, second, third, and fourth).
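Several of the switches from the table can be tried out with a short sketch like the one below; the sample text and variable names are made up for illustration.

```tcl
set text "Error 404 at 10:15, error 500 at 10:20"

# -all with -nocase: count every case-insensitive match of "error".
set count [regexp -all -nocase {error} $text]

# -all -inline: return every match as a list instead of filling variables.
set codes [regexp -all -inline {[0-9]{3}} $text]

# -indices: the match variable receives the first and last index of the match.
regexp -indices {[0-9]{3}} $text range

# -inline with subexpressions: whole match first, then each subexpression.
set time [regexp -inline {([0-9]+):([0-9]+)} $text]

puts "$count | $codes | $range | $time"
```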

Performing character substitution on a string

If regexp is a Find function, then regsub is equivalent to Find and Replace. The regsub command accepts a string and, using regular expression pattern matching, locates the pattern and replaces it with the desired value. The syntax of regsub is similar to that of regexp, as are the switches; however, additional control over the substitution is provided. The switches are as follows:

Switch Description
-all Causes the command to perform substitution for every match found, rather than only the first. The & and \n sequences are handled for each substitution.
-expanded Allows use of expanded regular expression syntax, wherein whitespace and comments are ignored.
-line Enables newline-sensitive matching; equivalent to passing both the -linestop and -lineanchor switches.
-linestop Changes the behavior of [^...] bracket expressions and the "." character so that they stop at newline characters.
-lineanchor Changes the behavior of ^ and $ (anchors) so that they match the beginning and end of a line as well as the beginning and end of the string.
-nocase Treats uppercase characters in the search string as lowercase.
-start Allows specification of a character offset in the string from which to start matching.

Now that we have a background in switches as they apply to the regsub command, let's look at the command:

regsub switches expression string substitution variable

The regsub command matches the expression against the string provided. If a variable is provided, the resulting string is stored in it and regsub returns the number of substitutions made; otherwise, the resulting string is returned. If a match is located, the portion of the string that matched is replaced by substitution. Whenever substitution contains an & or a \0 sequence, it is replaced with the portion of the string that matched the expression. If substitution contains the sequence \n (where n is a digit between 1 and 9), it is replaced with the portion of the string that matched the nth parenthesized subexpression of the expression. Additional backslashes may be used in substitution to prevent interpretation of &, \0, \n, and the backslashes themselves. Because both regsub and the Tcl interpreter perform backslash substitution, you should enclose substitution in curly braces to prevent unintended substitution.

How to do it…

In the following example, we will substitute every instance of the word one with the word three. Return values from the commands are provided for clarity. Enter the following commands:

% set original "one two one two one two"
one two one two one two

% regsub -all {one} $original three new
3

% puts $new
three two three two three two

How it works…

As you can see, the value returned by the regsub command is the number of substitutions made. The contents of the variable original have been copied into the variable new, with the substitutions completed. By adding further switches, you can easily parse a lengthy string variable and perform bulk updates. I have used this to rapidly parse a large text file prior to importing data into a database.
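The & and \n sequences, and the variable-less form that returns the result, can be sketched as follows. The recipe's pattern {one} would also match inside a longer word such as "someone"; the last command uses Tcl's \m and \M word anchors to avoid that. The sample strings are arbitrary.

```tcl
# Back-references: \1, \2 and \3 are the parenthesized subexpressions.
# Braces keep the backslashes away from the Tcl interpreter.
set date "2011-03-15"
set usDate [regsub {([0-9]{4})-([0-9]{2})-([0-9]{2})} $date {\2/\3/\1}]

# & stands for the whole match; with no variable, the result is returned.
set tagged [regsub -all {[0-9]+} "a1 b22 c333" {<&>}]

# \m and \M anchor the pattern at the start and end of a word,
# so "one" inside "someone" is left alone.
set words [regsub -all {\mone\M} "one someone one" three]

puts $usDate
puts $tagged
puts $words
```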

 


Parsing a string using conversion specifiers

To parse a string in Tcl using conversion specifiers, we use the scan command, which parses a string in a similar manner to the ANSI C sscanf function. Unlike the regexp and regsub commands, scan does not accept switches, so we will proceed directly to its syntax:

scan string format variable variable variable...

The scan command parses the string according to the format provided. If variables are provided, each one receives the result of the corresponding conversion.

The scan command supports the following conversion characters:

Character Description
d The input substring must be a decimal integer.
o The input substring must be an octal integer.
x The input substring must be a hexadecimal integer.
u The input substring must be a decimal integer (as in the case of d). The output is assigned to the variable as an unsigned decimal string.
s The input substring consists of all the characters up to the next whitespace character.
e, f, or g The input substring must be a floating-point number consisting of an optional sign, a string of decimal digits that may or may not contain a decimal point, and an optional exponent consisting of an e or E followed by an optional sign and a string of decimal digits. The value is read and stored in the variable as a floating-point value.
[chars] The input substring consists of one or more of the characters listed within the brackets. The matching string is stored in the variable. If the first character within the brackets is a ], it is treated as one of the characters rather than as the closing bracket. If chars contains a range of the form a-f, then any character between a and f (inclusive) will match.
[^chars] The input substring consists of one or more characters not listed within the brackets. The matching string is stored in the variable. If the first character following the ^ is a ], it is treated as one of the characters rather than as the closing bracket. If chars contains a range of the form a-f, then any character between a and f (inclusive) is excluded from the match.
n No input is consumed from the input string. Instead, the total number of characters scanned so far is stored in the variable.

The differences between scan and the ANSI C sscanf are as follows:

  • The %p conversion specifiers are unsupported
  • For %c conversions, a single character value is converted to a decimal string
  • If the end of the input string is reached prior to any conversion having occurred and no variables were provided, an empty string is returned

How to do it…

In the following example, we will parse a hexadecimal RGB color and assign the values returned to individual variables. Return values from the commands are provided for clarity. Enter the following commands:

% set color #34aa44
#34aa44

% scan $color #%2x%2x%2x r g b
3
% puts "$r $g $b"
52 170 68

How it works…

As you can see from the example, the scan command accepted the hexadecimal color and stored the decimal equivalent of each two-digit component in the variables provided. The scan command parses substrings from the string provided and returns the number of conversions performed (or -1 if the end of the string is reached before any conversions are performed). The string provides the input to be parsed, while the format instructs the command on how to parse it using the % conversion specifiers. Each variable provided receives the output of one conversion. If no variables are provided, scan behaves in an inline mode and returns the converted values as a list. If no variables are provided and the end of the string is reached before any conversion occurs, an empty string is returned.
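Two further uses of scan can be sketched briefly: the inline mode mentioned above, and a [^chars] conversion that splits a hypothetical key=value pair. The sample strings are arbitrary.

```tcl
# Inline mode: with no variables, scan returns the converted values as a list.
set parts [scan "12 apples 3.5" {%d %s %f}]

# [^chars]: read everything up to the "=" sign, then the rest as one word.
set converted [scan "key=value" {%[^=]=%s} key value]

puts "$parts | $converted | $key | $value"
```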

The remainder of this article deals primarily with the string command, whose options address most of our everyday string-handling needs. The string command is invoked as follows:

string option argument argument...

The string command performs one of several operations, selected by the option keyword provided. The arguments contain whatever input the specific option requires. Rather than list these en masse, I will explore each within the following sections.

Determining the length of a string

To determine the length of a string, Tcl provides the length keyword. The length keyword returns a decimal string containing the number of characters in the string. Please note that this is not necessarily the number of bytes used to represent the string in memory: Tcl 8.5 stores strings internally in UTF-8, which uses one to three bytes per character, so the byte length will differ from the character length whenever the string contains non-ASCII characters. The syntax of length is as follows:

string length string

How to do it…

In the following example, we will determine the length of a string of characters. Return values from the commands are provided for clarity. Enter the following commands:

% set input "The end is nigh"
The end is nigh

% string length $input
15

How it works…

As you can see in the example, the string command has read the input and returned a value of 15.
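The difference between characters and bytes shows up as soon as a string contains a non-ASCII character. This sketch measures the UTF-8 encoded form with encoding convertto, which is one way to obtain a byte count; the sample word is arbitrary.

```tcl
# "caf\u00e9" is four characters; the last one is e with an acute accent.
set word "caf\u00e9"

# string length counts characters...
set chars [string length $word]

# ...while the UTF-8 encoded form needs an extra byte for the accented e.
set bytes [string length [encoding convertto utf-8 $word]]

puts "$chars characters, $bytes bytes in UTF-8"
```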

Comparing strings

In almost any program, string comparison is critical. To perform string comparison, Tcl provides two keywords for use with the string command: compare and equal. The syntax for the first keyword, compare, is as follows:

string compare -nocase -length int string1 string2

When invoked with the compare keyword, the string command performs a character-by-character comparison of the strings passed in string1 and string2.

The compare keyword accepts two optional switches, as mentioned here:

  • -nocase
    Strings are compared in a case-insensitive manner
  • -length int
    Instructs the interpreter to perform the comparison only on the first int characters

Getting ready

To complete the following example, we will need to create a Tcl script file in your working directory. Open the text editor of your choice and follow the given instructions.

How to do it…

In the following example, we will create a Tcl script that compares a string passed on the command line against a static value. By altering the argument, you can see each of the possible return values. Using the editor of your choice, create a text file named compare.tcl that contains the following commands:

set string1 compare
set string2 [lindex $argv 0]
set output [string compare $string1 $string2]
puts $output

After you have created the file, invoke the script with the following command line:

% tclsh85 compare.tcl compare
0

How it works…

As can be seen, a return value of 0 indicates that the strings match. Try invoking this script with different arguments to see the other return values. When invoked with the compare keyword, string performs a character-by-character comparison of the two strings provided and returns -1, 0, or 1, depending on whether string1 is lexicographically less than, equal to, or greater than string2. As such, string compare returns more information about a comparison than a simple == test, which only reports whether the strings are equal.
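The script above exercises only the default comparison. The short sketch below, using arbitrary sample words, shows the other return values and the effect of both switches; note that, in this comparison, uppercase letters sort before lowercase ones.

```tcl
set results [list \
    [string compare apple banana] \
    [string compare banana apple] \
    [string compare -nocase ABC abc] \
    [string compare -length 3 abcdef abcxyz] \
    [string compare Zebra apple]]

# Expected order: less, greater, equal (nocase), equal (first 3 chars), less.
puts $results
```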

Comparing a string of characters

The second keyword for string comparison is equal.

The syntax for the string command is as follows:

string equal -nocase -length int string1 string2

When invoked with the equal keyword, the string command performs a character-by-character comparison of the two strings provided.

The equal keyword accepts two optional switches, as follows:

  • -nocase
    Strings are compared in a case-insensitive manner
  • -length int
    Instructs the interpreter to perform the comparison only on the first int characters

How to do it…

In the following example, we will determine if the values passed as string1 and string2 are equal. Return values from the commands are provided for clarity. Enter the following command:

% string equal Monday monday
0

How it works…

As you can see, the string equal command compared the two strings provided and found that they do not match, because the comparison is case-sensitive. When string is invoked with the equal keyword, it performs a character-by-character comparison of the two strings provided, in a similar manner to the compare keyword. The difference is in the return values: equal returns 1 if the strings are identical or 0 if they do not match.
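The two switches change the outcome of the recipe's comparison, as this sketch shows; the last line uses the eq operator of expr, which performs the same exact, case-sensitive test.

```tcl
set strict  [string equal Monday monday]
set relaxed [string equal -nocase Monday monday]
set prefix  [string equal -length 3 Monday Mon]
# The eq operator in expr compares its operands as strings.
set viaExpr [expr {"Monday" eq "monday"}]

puts "$strict $relaxed $prefix $viaExpr"
```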

Locating the first instance of a character

In our programs, the need to find the first occurrence of a character is not uncommon. For example, we may be parsing a large text file and need to break it up into sections, based on an instance of a character. To perform this action, the string command accepts the keyword first.

The syntax for the string command is as follows:

string first needleString haystackString startIndex

When invoked with the first keyword, the string command searches haystackString for the first occurrence of needleString, which may be a single character or a sequence of characters, and returns the index of its first character. If no match is found, the command returns -1. If the optional startIndex is provided, the search begins at that index within haystackString.

How to do it…

In the following example, we will locate the first instance of the character a within a string. Return values from the commands are provided for clarity. Enter the following command:

% string first a 123abc123abc
3

How it works…

As you can see, string has returned 3, the 0-based index of the first instance of the character a within our string value.
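A sketch of the optional start index, a multi-character needle, the -1 result, and a loop that collects every occurrence by restarting the search just past the previous match; the variable names are illustrative.

```tcl
set data 123abc123abc

# A start index skips the earlier occurrence.
set second [string first a $data 4]

# The needle may be several characters long.
set multi [string first bc $data]

# -1 signals that the needle does not occur at all.
set missing [string first z $data]

# Collect every occurrence by restarting just past the previous match.
set positions {}
set i [string first a $data]
while {$i >= 0} {
    lappend positions $i
    set i [string first a $data [expr {$i + 1}]]
}

puts "$second $multi $missing {$positions}"
```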

Locating the index of a character

What if we need to determine which character exists at a specific location within a string, rather than finding the first instance of a character? To accomplish this, string includes the index keyword.

The syntax for the string command is as follows:

string index string charIndex

When invoked with the index keyword, the string command returns the character located at charIndex. The accepted values are the same as for all Tcl commands that accept an index, and may be passed as follows:

Value Description
Any integer value An integer value for a specific index. Please note that the index is 0-based.
end The last character in the string.
end-n The last character in the string minus the numeric offset represented by n. For example, end-2 refers to "b" in the string "abcd".
end+n The last character in the string plus the numeric offset represented by n. Such an index lies past the end of the string, so string index returns an empty string for it.
A+B The character located at the index determined by adding the values of A and B, where A and B are integer values.
A-B The character located at the index determined by subtracting B from A, where A and B are integer values.

How to do it…

In the following example we will locate the character that exists at a specific location within a string. Return values from the commands are provided for clarity. Enter the following command:

% string index abcde 3
d

How it works…

As you can see, string has returned the character d, located at index 3. Try the various index forms to see how they behave.
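The index forms in the table can be compared side by side with a sketch like this one, using the recipe's sample string abcde. The 1+2 arithmetic form assumes Tcl 8.5 or later, and an index past the end yields an empty string.

```tcl
set s abcde
set forms [list \
    [string index $s 0] \
    [string index $s end] \
    [string index $s end-1] \
    [string index $s 1+2] \
    [string index $s 10]]

# The last element is empty because index 10 is past the end of the string.
puts $forms
```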

Summary

In this article we took a look at how to create, manipulate, and manage string variables.


Further resources on this subject:


Tcl/Tk 8.5 Programming Cookbook Over 100 great recipes to effectively learn Tcl/Tk 8.5
Published: February 2011

About the Author:


Bert Wheeler

After completing 20 years of military service, Bert returned to college to pursue a career in software development. He has worked in the IT industry for over 10 years in numerous roles from software development to Director of Engineering Services. He is currently employed by a Fortune-500 company as a technical resource to assist worldwide development teams in implementation of their products through various SDK packages in multiple languages and Operating Systems.

Bert has been an active contributor to the open source community in the area of Computer Visualization and developing applications with Artificial Intelligence-based learning capabilities.
