Note that I can't judge whether you can/should change how `batch` works, since you never posted its implementation. This answer assumes that the `batch` object cannot be changed.
If you are free to do so, first analyze if the data fetching process can be improved from the source, as these improvements will trickle down all the way to your consumer(s).
I don't really like any of the proposed options, simply because they're IMHO needlessly contrived when compared to the simple `foreach`/`for`, which is really what you need here.
That being said, I'm aware that you need to avoid fetching all of the data up front, so a simple `for` wouldn't work here, and a `foreach` would still require some underlying complexity. My main point here is that you need to compare your presented options to the most natural implementation (which is `for`/`foreach` here) and then pick the one that minimizes the added complexity.
Option A suffers from a likely bug where you don't process the last item. This is under the assumption that `batch.HasMoreData` returns true only if there is data that you still have not fetched. This means that when you fetch the last data, then check `batch.HasMoreData`, you'll exit the loop and not process the last entry.
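To make that concrete, here is a hypothetical reconstruction of Option A's loop (your actual code wasn't posted, so treat this as an illustration of the failure mode, not your implementation):

```csharp
// Hypothetical reconstruction of Option A; the real code was not posted.
var batch = GetBatchedData(batchIndex++);
while (batch.HasMoreData)
{
    Process(batch);
    batch = GetBatchedData(batchIndex++);
}
// The batch fetched on the final iteration is never processed:
// HasMoreData is already false at that point, so the loop exits
// before Process is ever called on it.
```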
Option B is liable to crash when dealing with an empty batch list, as you call `Process` before doing any kind of sanity check.
Option C looks like it's an attempt to implement that sanity check by adding a no-op `EmptyBatch` that avoids an exception, instead leading to a `Process` step that does nothing. While this may work, it is very patchwork-y and leads to needlessly obfuscated code. It might also be passing the "null buck" by forcing the `Process` logic to specifically check for the empty object and e.g. not log anything; which is precisely why I'm not a fan of this approach when it can be avoided.
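Sketched out, the "null buck" looks like this (the body of `Process` is a hypothetical example, since your implementation wasn't shown):

```csharp
// Hypothetical sketch: the sentinel check leaks into the processing logic.
void Process(MyBatch batch)
{
    if (batch is EmptyBatch)
    {
        return; // skip work/logging for the sentinel; every consumer must remember to do this
    }

    // ... actual processing ...
}
```

Every method that receives a batch now carries this hidden obligation, which is exactly the obfuscation described above.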
If I had to pick between these options, I would alter option A to do:
```csharp
MyBatch batch = null;
while (batch == null || batch.HasMoreData)
{
    batch = GetBatchedData(batchIndex++);
    Process(batch);
}
```
This gets the job done without issues, assuming your logic can elegantly handle a first fetch if there is no data to fetch at all.
I am somewhat annoyed at having to both include the "did not fetch anything yet" and "did fetch previous batch and there is more to fetch" logic at the same time.
However, under the assumption that there is no other way of checking that there is data to fetch before you've started; this way at least keeps the main processing logic neat and tidy.
Assuming that `batch.HasMoreData` only yields true when the supplier of `batch` contains data that has not been accessed yet, the above should work the way you need it to. I am also assuming that fetching the first batch, when there is nothing to fetch, yields an object with an empty `Items` array, not `null` (for either the object or its `Items` property).
However, I'm not a fan of this approach overall. I also find it grating that the consumer is the one responsible for keeping track of which index they have to request. If you're using a `while`, I would innately infer that the consumer doesn't have this knowledge and instead dynamically waits until the data well runs dry at some point.
You're also trying to reinvent the enumerator; and in general I would urge you to use the tools that exist and have been tried and tested, instead of rolling your own (except for educational purposes, or if you concretely intend to improve on them).
Part of what leads you to avoid the more obvious `for`/`foreach` is that you are trying to avoid complete enumeration of all data before you even begin processing the first entry. I understand and respect that, but there are better solutions here.
Suggestion A has my preference here. `yield return` was created precisely for deferred enumeration, where you don't want to render an item before the consumer wishes to concretely access it.
That being said, the underlying method aggregating the `yield return`ed enumerable is still going to end up having to do something along the lines of:
```csharp
MyBatch batch = null;
while (batch == null || batch.HasMoreData)
{
    batch = GetBatchedData(batchIndex++);
    foreach (var item in batch.Items)
    {
        yield return item;
    }
}
```
If you compare this to how I altered your option A, it does pretty much the same thing with regards to batch fetching, albeit that it returns the data rather than immediately handling it. It doesn't really change your approach; it hides it.
Nonetheless, this has my preference, simply because it allows you to separate the nitty gritty aggregation logic from the actual consumer. This means that you can abstract it away into a neat little method/class of its own, and any and all consumers don't have to actively account for the fact that there is deferred enumeration going on under the hood.
Another reason to like it is that it hides the entire idea of batched retrieval, instead pretending like it's returning a flat list of items. This is just one of those little intricacies that a consumer really doesn't care about, so hiding it is a good thing.
If you can't avoid the nitty gritty, at least package it into a pretty box so that others don't have to fuss with it.
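As a sketch of that pretty box (assuming `MyBatch`/`GetBatchedData` as described in your question, and an item type, here called `MyItem`, exposed via `batch.Items`):

```csharp
// Sketch under assumed names (MyItem is a placeholder for whatever
// batch.Items actually contains). The iterator hides all batching logic.
private IEnumerable<MyItem> GetAllItems()
{
    int batchIndex = 0;
    MyBatch batch = null;
    while (batch == null || batch.HasMoreData)
    {
        batch = GetBatchedData(batchIndex++);
        foreach (var item in batch.Items)
        {
            yield return item;
        }
    }
}
```

The consumer then simply writes `foreach (var item in GetAllItems()) Process(item);` and never learns that batching, indices, or deferred fetching exist.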
Why doesn't `Batch` simply implement `IEnumerable<>`? All of these design decisions have already been made 20 years ago for you.

`IEnumerable` should be the default, but more complex solutions might be motivated when dealing with large amounts of data, concurrency etc.

More complex than `IEnumerable`? `Stream` or `Span<byte>` for in-memory data. But it really depends on the type of data: for things like audio or video, efficient handling of buffers can be critical, but in many other cases the overhead of calling a method is irrelevant. Manual batching like that might be relevant when fetching data from some web API.