If you have two synchronous transformation components and the input of the second is connected to the output of the first, does the first transformation process (loop through) all rows in the buffer before outputting these rows to the second transformation? Or does the first transformation output each individual row to the second transformation as soon as it has finished processing it?
Thanks in advance,
Lawrie.
Parts:
Component A (CA), Component B (CB), Row 1 (R1), Row 2 (R2), Row 3 (R3)
Example Synchronous:
CA and CB are synchronous transforms (defined by something like output.SynchronousInputID = Input.ID in ProvideComponentProperties(); see http://msdn2.microsoft.com/en-us/library/ms136027.aspx). The package has the row buffer size set to 1. The data source has 3 rows. The package starts and the following happens:
R1 gets to CA from upstream
CA's ProcessInput is called with R1
CA's ProcessInput finishes and R1 is passed downstream
R1 gets to CB from upstream
CB's ProcessInput is called with R1
CB's ProcessInput finishes and R1 is passed downstream
R2 gets to CA from upstream
CA's ProcessInput is called with R2
CA's ProcessInput finishes and R2 is passed downstream
R2 gets to CB from upstream
CB's ProcessInput is called with R2
CB's ProcessInput finishes and R2 is passed downstream
R3 gets to CA from upstream
CA's ProcessInput is called with R3
CA's ProcessInput finishes and R3 is passed downstream
R3 gets to CB from upstream
CB's ProcessInput is called with R3
CB's ProcessInput finishes and R3 is passed downstream
Example Asynchronous:
CA and CB are asynchronous transforms (defined by something like output.SynchronousInputID = 0 in ProvideComponentProperties(); see http://msdn2.microsoft.com/en-us/library/ms135931.aspx). The package has the row buffer size set to 1. The data source has 3 rows. The package starts and one of the following two sequences happens, depending on how the components are written:
R1 gets to CA from upstream
CA's ProcessInput is called with R1
CA stores R1
CA's ProcessInput finishes
R2 gets to CA from upstream
CA's ProcessInput is called with R2
CA stores R2
CA's ProcessInput finishes
R3 gets to CA from upstream
CA's ProcessInput is called with R3
CA stores R3
CA loops through stored rows and calls AddRow() on the output buffer and passes the data from the stored row to the new row
CA's ProcessInput finishes
R1-3 are passed downstream
R1 gets to CB from upstream
CB's ProcessInput is called with R1
CB stores R1
CB's ProcessInput finishes
R2 gets to CB from upstream
CB's ProcessInput is called with R2
CB stores R2
CB's ProcessInput finishes
R3 gets to CB from upstream
CB's ProcessInput is called with R3
CB stores R3
CB loops through stored rows and calls AddRow() on the output buffer and passes the data from the stored row to the new row
CB's ProcessInput finishes
R1-3 are passed downstream
OR
R1 gets to CA from upstream
CA's ProcessInput is called with R1
CA calls AddRow() on the output buffer and passes the data from R1 to the new row
CA's ProcessInput finishes and R1 is passed downstream
R1 gets to CB from upstream
CB's ProcessInput is called with R1
CB calls AddRow() on the output buffer and passes the data from R1 to the new row
CB's ProcessInput finishes and R1 is passed downstream
R2 gets to CA from upstream
CA's ProcessInput is called with R2
CA calls AddRow() on the output buffer and passes the data from R2 to the new row
CA's ProcessInput finishes and R2 is passed downstream
R2 gets to CB from upstream
CB's ProcessInput is called with R2
CB calls AddRow() on the output buffer and passes the data from R2 to the new row
CB's ProcessInput finishes and R2 is passed downstream
R3 gets to CA from upstream
CA's ProcessInput is called with R3
CA calls AddRow() on the output buffer and passes the data from R3 to the new row
CA's ProcessInput finishes and R3 is passed downstream
R3 gets to CB from upstream
CB's ProcessInput is called with R3
CB calls AddRow() on the output buffer and passes the data from R3 to the new row
CB's ProcessInput finishes and R3 is passed downstream
The point with asynchronous transforms is that the component must call AddRow() on the output buffer and copy the data from the input buffer row to the new row for it to be passed downstream. As soon as ProcessInput finishes, any rows added to the output buffer are passed downstream. You may need to store all rows or just some, and the base classes allow you to pass on records whenever you wish.
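To make the two asynchronous sequences above easier to compare, here is a rough Python sketch of them. It is only a toy model and not the SSIS API: OutputBuffer, BlockingTransform, StreamingTransform, run and the end_of_rowset flag are all made-up names for illustration. It records how many rows are sitting in the output after each process_input call.

```python
# Toy model of the two asynchronous patterns described above.
# Nothing here is the SSIS API: every class and method name is hypothetical.

class OutputBuffer:
    """Stand-in for an asynchronous output buffer."""
    def __init__(self):
        self.rows = []

    def add_row(self, data):
        # Mirrors the idea of AddRow(): create a new row and copy the data in.
        self.rows.append(dict(data))


class BlockingTransform:
    """Stores every input row and emits them only once the input has ended."""
    def __init__(self):
        self.stored = []

    def process_input(self, rows, end_of_rowset, output):
        self.stored.extend(rows)
        if end_of_rowset:
            for row in self.stored:
                output.add_row(row)
            self.stored = []


class StreamingTransform:
    """Copies each input row to the output as soon as it is seen."""
    def process_input(self, rows, end_of_rowset, output):
        for row in rows:
            output.add_row(row)


def run(transform, input_buffers):
    """Feed buffers one at a time and record the output row count after each call."""
    output = OutputBuffer()
    counts = []
    for i, rows in enumerate(input_buffers):
        last = i == len(input_buffers) - 1
        transform.process_input(rows, last, output)
        counts.append(len(output.rows))
    return counts, output.rows


buffers = [[{"id": 1}], [{"id": 2}], [{"id": 3}]]  # buffer size 1, three rows
blocking_counts, blocking_rows = run(BlockingTransform(), buffers)
streaming_counts, streaming_rows = run(StreamingTransform(), buffers)
print(blocking_counts)   # output row count after each call, first sequence
print(streaming_counts)  # output row count after each call, second sequence
```

Both variants end up with the same three rows in the output. The only difference is when they appear: the storing variant has nothing to hand on until the last call, while the row-by-row variant has something after every call.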
Many thanks for taking the time to provide such a detailed response. The only problem is that my question was really what happens with synchronous transforms when the package has the row buffer size set greater than 1!
If you could provide an example for this I'd be really grateful...
Thanks,
Lawrie
|||
lawrieg wrote: Hi, if you have two synchronous transformation components and the input of the second is connected to the output of the first, does the first transformation process (loop through) all rows in the buffer before outputting these rows to the second transformation? Or does the first transformation output each individual row to the second transformation as soon as it has finished processing it?
Thanks in advance,
Lawrie.
The SSIS pipeline works on one buffer at a time, not on individual rows (unless the buffer size is one).
So, the first component will pass each row to its output when it's finished processing that row. But the second component won't start processing until the LAST row in the buffer has been passed, because only then is the buffer handed to the next component.
Does that make sense?
-Jamie
|||To expand the explanation for synchronous transforms: change R1, R2, R3 to B1, B2, B3, where B = Buffer. Each ProcessInput call then receives a whole buffer of rows rather than a single row.
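For the case Lawrie asked about (synchronous transforms with more than one row per buffer), the buffer-at-a-time behaviour described in the replies can be sketched roughly like this in Python. Again, this is a toy model under stated assumptions and not the SSIS object model: SyncTransform, run_pipeline and the column names are hypothetical, and the simple sequential loop ignores how the real engine schedules its work.

```python
# Toy model of two synchronous transforms sharing buffers of several rows.
# Hypothetical names throughout; this is not the SSIS object model.

log = []


class SyncTransform:
    def __init__(self, name, column):
        self.name = name
        self.column = column

    def process_input(self, buffer_id, buffer):
        # A synchronous transform edits the rows in place; no new rows are created.
        for row in buffer:
            row[self.column] = True
            log.append((self.name, buffer_id, row["id"]))


def run_pipeline(rows, buffer_size, transforms):
    # Split the source rows into buffers, then pass each buffer through
    # every transform in order before moving on to the next buffer.
    for start in range(0, len(rows), buffer_size):
        buffer = rows[start:start + buffer_size]
        buffer_id = "B%d" % (start // buffer_size + 1)
        for transform in transforms:
            transform.process_input(buffer_id, buffer)


rows = [{"id": i} for i in range(1, 5)]  # four source rows
transforms = [SyncTransform("CA", "seen_by_ca"), SyncTransform("CB", "seen_by_cb")]
run_pipeline(rows, 2, transforms)  # two rows per buffer
for entry in log:
    print(entry)
```

In this model CA touches both rows of B1 before CB sees either of them, and the same then happens for B2. That is the same sequence as the synchronous walkthrough earlier in the thread with R replaced by B.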