- Pentaho Data Integration Beginner's Guide (Second Edition)
- María Carina Roldán
- 2021-07-23 15:47:01
Time for action – assigning tasks by distributing
Let's suppose you want to distribute the issues among three programmers so each of them implements a subset of the new features.
- Open the transformation created in the previous section, change the description, and save it under a different name.
- Now delete all the steps after the Sort rows step.
- Change the Filter rows step to keep only the unassigned issues, that is, those whose Assignee field is equal to the string Unassigned. The condition reads: Assignee = Unassigned.
- From the Transform category of steps, drag an Add sequence step to the canvas and create a hop from the Sort rows step to this new step.
- Double-click on the Add sequence step and replace the content of the Name of value textbox with nr.
- Drag three Microsoft Excel Output steps to the canvas.
- Link the Add sequence step to one of these steps.
- Configure the Microsoft Excel Output step to send the fields nr, Priority, and Summary to an Excel file named f_costa.xls (the name of one of the programmers). The Fields tab should list these three fields.
- Create a hop from the Add sequence step to the second Microsoft Excel Output step. When asked to decide between Copy and Distribute, select Distribute.
- Configure the step as before, but name the file b_bouchard.xls (the second programmer).
- Create a hop from the Add sequence step to the last Microsoft Excel Output step.
- Configure this last step as before, but name the file a_mercier.xls (the last programmer).
- The transformation should now show the Add sequence step linked to the three Microsoft Excel Output steps.
- Run the transformation and look at the execution tab window to see what happened. If you don't remember the meaning of the different metrics, you can go back and take a look at The Step Metrics tab section in Chapter 2, Getting Started with Transformations.
Note
Again, take into account that your numbers may not match the exact metrics shown here, as you derived your own source data from the JIRA system.
- To see which rows went to which of the created files, open any of them. Each file should contain the nr, Priority, and Summary columns.
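As a rough analogue outside Spoon, the filtering and numbering part of these steps can be sketched in Python. This is only an illustration, not how PDI works internally: the sample issues and the function name are hypothetical, the Sort rows step is left out because its sort key is not shown in this section, and the sequence is assumed to start at 1.

```python
# Hypothetical sample rows; your real rows come from your own JIRA data.
issues = [
    {"Summary": "Add export button", "Priority": "Major", "Assignee": "Unassigned"},
    {"Summary": "Fix login page", "Priority": "Minor", "Assignee": "jdoe"},
    {"Summary": "Support CSV import", "Priority": "Critical", "Assignee": "Unassigned"},
]


def unassigned_with_sequence(rows):
    """Keep the unassigned issues and number them, starting at 1 (assumed)."""
    # Counterpart of the Filter rows condition: Assignee = Unassigned
    kept = [row for row in rows if row["Assignee"] == "Unassigned"]
    # Counterpart of Add sequence (field nr) plus the three output fields
    return [
        {"nr": nr, "Priority": row["Priority"], "Summary": row["Summary"]}
        for nr, row in enumerate(kept, start=1)
    ]


numbered = unassigned_with_sequence(issues)
for row in numbered:
    print(row)
```

Only the two unassigned issues survive, and they carry the fields that the Microsoft Excel Output steps are configured to write.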
What just happened?
You distributed the issues among three programmers.
In the execution window, you could see that 401 rows left the Add sequence step, and a third of those rows arrived at each of the Microsoft Excel Output steps. In numbers, 134, 134, and 133 rows went to the three files respectively. You verified that when you explored the Excel files.
In the transformation, you added an Add sequence step that did nothing more than add a sequential number to the rows. That sequence helps you recognize that each file received one out of every three rows.
Here you saw a practical example for the Distribute option. When you distribute, the destination steps receive the rows in turns. For example, if you have three target steps, the first row goes to the first target step, the second row goes to the second step, the third row goes to the third step, the fourth row goes to the first step, and so on.
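The turn-taking described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not PDI's implementation: the function name is hypothetical, and which file counts as the first target is assumed here to follow the order in which the steps were configured.

```python
def distribute(rows, target_count):
    """Hand out rows to the targets in turns (round-robin)."""
    targets = [[] for _ in range(target_count)]
    for index, row in enumerate(rows):
        targets[index % target_count].append(row)
    return targets


# 401 sequence numbers, matching the row count reported in the run above
nr_values = list(range(1, 402))
f_costa, b_bouchard, a_mercier = distribute(nr_values, 3)

print(len(f_costa), len(b_bouchard), len(a_mercier))  # 134 134 133
print(f_costa[:4])     # [1, 4, 7, 10]
print(b_bouchard[:4])  # [2, 5, 8, 11]
print(a_mercier[:4])   # [3, 6, 9, 12]
```

Because 401 is not a multiple of 3, the last target ends up with one row fewer, which is why the counts are 134, 134, and 133.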
As you can see, when distributing, the hops leaving the step from which you distribute are plain; their look and feel doesn't change.
Although this example clearly showed how the Distribute method works, this is not how you will regularly use this option. The Distribute option is mainly used for performance reasons. Throughout the book, you will always use the Copy option. To avoid being asked which action to take every time you create more than one hop leaving a step, you can set the Copy option as the default. You do it by opening the PDI options window (Tools | Options… from the main menu) and unchecking the option Show "copy or distribute" dialog?. Remember that you will have to restart Spoon to see the change applied.
Once you have changed this option, the default method is Copy rows. If you want to distribute rows, you can change the action by right-clicking on the step from which you want to copy or distribute, selecting Data Movement... in the contextual menu that appears, and then selecting the desired option.
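To make the difference between the two data movement options concrete, here is a minimal Python sketch. It is a simplified model with hypothetical function names and sample rows, not PDI code: with Copy every target step gets all the rows, while with Distribute the rows are shared out in turns.

```python
def copy_rows(rows, target_count):
    """Copy: every target receives its own copy of every row."""
    return [list(rows) for _ in range(target_count)]


def distribute_rows(rows, target_count):
    """Distribute: the targets receive the rows in turns."""
    return [rows[i::target_count] for i in range(target_count)]


# Hypothetical rows standing in for the issues
rows = ["issue-1", "issue-2", "issue-3", "issue-4", "issue-5", "issue-6"]

copied = copy_rows(rows, 2)
distributed = distribute_rows(rows, 2)

print([len(target) for target in copied])       # [6, 6]
print([len(target) for target in distributed])  # [3, 3]
print(distributed[0])  # ['issue-1', 'issue-3', 'issue-5']
```

With Copy, the total number of rows arriving at the targets is the source row count multiplied by the number of targets; with Distribute, the total stays equal to the source row count.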

Pop quiz – understanding the difference between copying and distributing
Look at the following transformations:

Q1. If you do a preview on the steps named Preview, which of the following is true?
- The number of rows you see in (a) is greater than or equal to the number of rows you see in (b).
- The number of rows you see in (b) is greater than or equal to the number of rows you see in (a).
- The dataset you see in (a) is exactly the same as the one you see in (b), no matter what data you have in the Excel file.
Tip
You can create a transformation and test each option to check the results for yourself. To be sure you understand correctly where and when the rows take one path or the other, you can preview every step in the transformation, not just the last one.