Creating a parallel scoring pipeline
Standard ML pipelines work just fine for the majority of ML use cases, but when you need to score a large amount of data at once, you need a more powerful solution. That's where ParallelRunStep comes in. ParallelRunStep is Azure's answer to scoring big data in batch. When you use ParallelRunStep, you leverage all of the cores on your compute cluster simultaneously.
Say you have a compute cluster consisting of eight Standard_DS3_v2 virtual machines. Each Standard_DS3_v2 node has four cores, so you can perform 32 parallel scoring processes at once. This parallelization essentially lets you score data many times faster than if you used a single machine. Furthermore, it can easily scale vertically (increasing the size of each virtual machine in the cluster) and horizontally (increasing the node count).
This section will allow you to become a big data scientist who can score large batches of data. Here, you will again be using simulated...