
PowerShell Troubleshooting: Replacing the foreach loop with the foreach-object cmdlet


In this article by Michael Shepard, author of PowerShell Troubleshooting Guide, we will see how to replace the foreach loop with the foreach-object cmdlet.


When you write a function to process a file, a typical approach might look like this:

function process-file{
param($filename)

   $contents=get-content $filename
   foreach($line in $contents){
       # do something interesting
   }
}

This pattern works well for small files, but for really large files this kind of processing performs very badly and can even crash with an out-of-memory exception. For instance, running this function against a 500 MB text file on my laptop took over five seconds, despite the fact that the loop doesn't actually do anything. To determine how long it takes to run, we can use the measure-command cmdlet, as shown in the following screenshot:

[Screenshot: timing process-file against a 500 MB file with measure-command]
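
In text form, the measurement amounts to a call like the following; the file path is just an example:

# Time process-file against a large test file (illustrative path)
measure-command { process-file C:\temp\big_500MB.txt }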

Note that the result is a Timespan object, and its TotalSeconds property has the value we are looking for. You might not have any large files handy, so I wrote the following quick function to create large text files of approximately the size you ask for:

function new-bigfile{
param([string]$path,
     [int]$sizeInMB)
   if(test-path $path){
     remove-item $path
   }
   new-item -ItemType File -Path $path | out-null
   $line='A'*78
   $page="$line`r`n"*1280000
   1..($sizeInMB/100) | foreach {$page | out-file $path -Append -Encoding ascii}
}

The code works by creating a large string using string multiplication, which can be handy in situations like this. It then writes that string to the file as many times as needed to reach the requested size. The files come out pretty close to the requested size when the size is over 100 MB, but they are not exact. Fortunately, we aren't really concerned about the exact size, just that the files are very large.
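
For example, a 500 MB test file like the one used above can be created with a call along these lines (the path is only an example):

new-bigfile -path C:\temp\big_500MB.txt -sizeInMB 500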

A better approach would be to utilize the streaming functionality of the pipeline and use the ForEach-Object cmdlet instead of reading the contents into a variable. Since Get-Content outputs objects as they are read, processing them one at a time allows us to work through the file without ever holding it all in memory at once. An example that is similar to the previous code is this:

function process-file2{
param($filename)
   get-content $filename | foreach-object{
       $line=$_
       # do something interesting
   }
}

Note that since we're using the ForEach-Object cmdlet instead of the foreach loop, we have to use the $_ automatic variable to refer to the current object. By assigning it to a variable immediately, we can use exactly the same code as in the foreach loop example (in place of the # do something interesting comment). In PowerShell version 4.0, we could use the -PipelineVariable common parameter to simplify this code. As with all parameters where you supply the name of a variable, you don't use the dollar sign:

function process-file3{
param($filename)
   get-content $filename -PipelineVariable line | foreach-object{
       # do something interesting
   }
}

With either of these constructions, I have been able to process files of any length without any noticeable growth in memory usage. One way to measure memory usage (without simply watching the process monitor) is to use the Get-Process cmdlet to find the current process and report on the WorkingSet64 property. It is important to use the 64-bit version rather than the WorkingSet property or its alias, WS. A function to get the current shell's memory usage looks like this:

function get-shellmemory{
   (get-process -id $pid| select -expand WorkingSet64)/1MB
}
new-alias mem get-shellmemory

I've included an alias (mem) for this function to make it quicker to call on the command line. As a practice, I try to avoid using aliases in scripts because they can make code harder to understand, but for command-line use they really are a time-saver. Here's an example of using get-shellmemory via its alias, mem:

[Screenshot: memory usage reported by the mem alias before and after processing a 500 MB file]
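
The session boils down to checking memory before and after the call, roughly like this (the path is only an example):

mem
process-file2 C:\temp\big_500MB.txt
mem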

This shows that although the function processed a 500 MB file, it only used a little over 3 MB of memory in doing so. Combining the function to determine memory usage with measure-command gives us a general purpose function to measure time and memory usage:

function get-performance{
param([scriptblock]$block);
   $pre_mem=get-shellmemory
   $elapsedTime=measure-command -Expression $block
   $post_mem=get-shellmemory
   write-output "the process took $($elapsedTime.TotalSeconds) seconds"
   write-output "the process used $($post_mem - $pre_mem) megabytes of
   memory"
}
new-alias perf get-performance
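
With the alias in place, measuring one of the file-processing functions is a single command (again, the path is only an example):

perf { process-file2 C:\temp\big_500MB.txt }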

One thing to note about measuring memory this way is that since the PowerShell host is a .NET process that is garbage-collected, it is possible that a garbage-collection operation occurs while the process is running. If that happens, the process may end up using less memory than when it started. Because of this, memory usage statistics are only guidelines, not absolute indicators. Adding an explicit call to the garbage collector to tell it to collect will make it less likely that the memory readings will be unusual, but the situation is in the hands of the .NET framework, not ours.
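
One way to do that is to force a collection before each reading. The following is only a sketch of a variant of get-shellmemory (the name is just for illustration), not the function used for the measurements here:

function get-shellmemory-gc{
   # Ask .NET to collect and wait for finalizers before sampling the working set;
   # this reduces, but does not eliminate, fluctuation in the readings.
   [System.GC]::Collect()
   [System.GC]::WaitForPendingFinalizers()
   (get-process -id $pid | select -expand WorkingSet64)/1MB
}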

You will find that the memory used by a particular function will vary quite a bit, but the general performance characteristics are the important thing. In this section, we're concerned about whether the memory usage grows proportionally with the size of the input file. Using the first version of the code that used the foreach loop, the memory use did grow with the size of the input file, which limits the usefulness of that technique.

For reference, a summary of the performance on my computer using the foreach loop and the ForEach-Object cmdlet is given in the following table:

Input size | Loop time | Loop memory | Cmdlet time | Cmdlet memory
100 MB     | 1.1s      | 158 MB      | 1.5s        | 1.5 MB
500 MB     | 6.1s      | 979 MB      | 8.7s        | 12.9 MB
1 GB       | 38.5s     | 1987 MB     | 16.7s       | 7.4 MB
2 GB       | Failed    |             | 51.2s       | 8.6 MB
4 GB       | Failed    |             | 132s        | 12.7 MB

While these specific numbers depend heavily on the hardware and software configuration of my computer, the takeaway is that using the ForEach-Object cmdlet avoids the high memory usage that comes from reading a large file into memory all at once.

Although the discussion here has been around the get-content cmdlet, the same is true of any cmdlet that returns objects in a streaming fashion. For example, Import-CSV can have exactly the same performance characteristics as Get-Content. The following code is a typical approach to reading CSV files, which works very well for small files:

function process-CSVfile{
param($filename)
   $objects=import-CSV $filename
   foreach($object in $objects){
       # do something interesting
   }
}

To see the performance, we will need some large CSV files to work with. Here's a simple function that creates CSV files of approximately the requested size, which will do for these tests. Note that the multipliers used in the function were determined by trial and error, but they give a reasonable 10-column CSV file that is close to the requested size:

function new-bigCSVfile{
param([string]$path,
     [int]$sizeInMB)
   if(test-path $path){
     remove-item $path
   }
   new-item -ItemType File -Path $path | out-null
   $header="Column1"
   2..10 | foreach {$header+=",Column$_"}
   $header+="`r`n"
   $header | out-file $path -encoding Ascii
   $page=$header*12500
   1..($sizeInMB) | foreach {$page | out-file $path -Append -Encoding ascii}
}
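
As before, a test file of roughly the right size can be generated with a call like this (the path is only an example):

new-bigCSVfile -path C:\temp\big_50MB.csv -sizeInMB 50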

Rewriting the process-CSVfile function to use the streaming property of the pipeline looks similar to the rewritten get-content example, as follows:

function process-CSVfile2{
param($filename)
   import-CSV $filename -PipelineVariable object |
       foreach-object {
           # do something interesting
       }
}
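
The two implementations can then be compared with the perf alias defined earlier (the paths are only examples):

perf { process-CSVfile C:\temp\big_50MB.csv }
perf { process-CSVfile2 C:\temp\big_50MB.csv }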

Using the Get-Performance function across a range of file sizes, we can easily construct a table of results for the two implementations:

Input size | Loop time | Loop memory | Cmdlet time | Cmdlet memory
10 MB      | 9.4s      | 278 MB      | 20.9s       | 4.1 MB
50 MB      | 62.4s     | 1335 MB     | 116.4s      | 10.3 MB
100 MB     | 165.5s    | 2529 MB     | 361.0s      | 21.5 MB
200 MB     | Failed    |             | 761.8s      | 25.8 MB

It's clear that trying to load the entire file into memory is not a scalable approach. In this case, the memory usage is even higher and the times much slower than with get-content. It would be simple to construct similarly poor examples with cmdlets such as Get-EventLog and Get-WinEvent, and replacing the foreach loop with the ForEach-Object cmdlet will have the same kind of effect there as well. Having tools like the Get-Performance and Get-ShellMemory functions can be a great help in diagnosing memory-scaling problems like this. One more thing to note: using the pipeline is slower than using the loop, so if you know the input files are small, the loop might be a better choice.

Summary

In this article, we saw how to replace the foreach loop with the ForEach-Object cmdlet to process large files without loading them entirely into memory.
