17

When is it better to pass data to a function in a parameter, and when is it better for the function to just fetch the data itself?

Here's some simplified examples in PowerShell:

Option 1 (give the function what it needs)

function Prompt-LicenseToGrant($availableLicenses)
{
    $selection = Prompt-Menu $availableLicenses
    return $selection
}

$availableLicenses = Get-AvailableLicenses
$licenseToGrant = Prompt-LicenseToGrant $availableLicenses
Grant-License -User $user -License $licenseToGrant

function Prompt-LicenseToRevoke($assignedLicenses)
{
    $selection = Prompt-Menu $assignedLicenses
    return $selection
}

$assignedLicenses = Get-UserLicenses $user
$licenseToRevoke = Prompt-LicenseToRevoke $assignedLicenses
Revoke-License -User $user -License $licenseToRevoke

Option 2 (function just gets what it needs)

function Prompt-LicenseToGrant
{
    $availableLicenses = Get-AvailableLicenses
    $selection = Prompt-Menu $availableLicenses
    return $selection
}

$licenseToGrant = Prompt-LicenseToGrant
Grant-License -User $user -License $licenseToGrant

function Prompt-LicenseToRevoke($user)
{
    $assignedLicenses = Get-UserLicenses $user
    $selection = Prompt-Menu $assignedLicenses
    return $selection
}

$licenseToRevoke = Prompt-LicenseToRevoke $assignedLicenses
Revoke-License -User $user -License $licenseToRevoke

Edit: Some people seem a bit confused about "Prompt-Menu". Just ignore that and pretend that Prompt-LicenseToGrant and Prompt-LicenseToRevoke have their own unique implementation of a menu for the user to select from.

8 Answers 8

42

When should a function be given an argument vs getting the data itself?

When it has a job besides getting the data.

I can see why you'd be confused about this. None of your functions are named after what they are for. They're named after what you plan to do next. This is a very strange naming style that I don't think is serving you well. It's encouraging you to duplicate code pointlessly.

Consider a style where, rather then telling the caller what to do next, you're telling them what to expect.

$availableLicenses = Get-AvailableLicenses 
$license = Prompt-Menu $availableLicenses
Grant-License -User $user -License $license
Revoke-License -User $user -License $license

Here we see functions so named. They aren't ours but they don't tell us what to do next. Just what we're going to get out of them.

What's missing from your question is anything for your custom function to do. We really need that to be able to properly answer this question.

So let me make up something for it to do. Lets say only members of the dev group should be granted licenses. That way there is some business logic to care about here.

function GrantLicenseToMember($username, $group, $license) 
{
  $Authorizatized = Get-ADGroupMember -Identity $group | Where-Object {$_.name -eq $Username}
  if ($Authorizatized){ 
      Grant-License -User $username -License $license
  }
}

Notice what this function doesn't know. It doesn't know how to get the user. It doesn't know how to get the group. It doesn't know how to get the license. You have to tell it. It's because it doesn't know that it can be reused when any of that changes.

Now something, somewhere, has to know that stuff. Or none of this works.

function GrantSelectedDevLicense 
{
  $Credentials = Get-Credential
  $Username = $Credentials.UserName
  $group = 'Dev'
  $availableLicenses = Get-AvailableLicenses
  $selection = Prompt-Menu $availableLicenses
  GrantLicenseToMember $Username $group $selection
}

This can be called with no arguments. Which means it needed a very particular name because it resolves the dependencies in a very particular way. There can be many such functions that call GrantLicenseToMember. Ones that care about different groups. They'll all need different names.

The difference is GrantSelectedDevLicense focuses on it's job of knowing where to get all the needed info. GrantLicenseToMember focuses on it's job of knowing when to grant and not grant the license.

There is a pattern that describes this separation. It's named Functional Core Imperative Shell.

By making this separation you are free to keep an area of your code reserved for pure functions that are easy to test and reason about. Functions that are unaware of the world around them and so don't care when it changes. This lets you express your business logic as simply as possible.

It also gives you a business logic free space to dump all the other noise. That noise is not as easy to test. But, without the business logic, it doesn't need much testing.

Fowler was onto a similar idea when he introduced us to the Humble Object.

It all comes at the cost of having to think up these names. Don't do it without a good name. A bad name will make me wish you hadn't created any new functions at all.

1
  • 3
    Thanks, I like the idea of functional core + imperative shell. I may have oversimplified my examples. In reality there is no such function called Prompt-Menu. Each of the functions Prompt-LicenseToGrant and Prompt-LicenseToRevoke have their own way of creating a menu because the data is processed and displayed a bit differently in each one. I use the verb prompt to mean that It's prompting the user for some information. (Interactive console app.) Grant-License is a simple wrapper that grants a license to 1 user and has some user feedback and try-catch.
    – Nova
    Commented Jun 23 at 15:45
10

As usual, there are various influences on whether to do one or the other.

  • Is it likely that you will at some point have to do the same processing on different data (or some other processing on the same data)? Then you should probably pass them as an argument.

  • Does the function need the data in a specific format that isn't meaningful elsewhere? That is a point in favor of obtaining the data within the function.

  • Does getting the data take a lot of LOC compared to the actual computation? Then it should probably go into its own helper method (but that method could still be called from within the function or without).

  • If data loading is in its own helper method and data processing isn't, then the data loading routine should probably be called from outside the method, so that the code inside the function remains on the same level of abstraction.

2
  • Abstraction level match is on point. Please consider adding storage format change frequency to enhance that - if its changes are unrelated to processing, they belong to different abstractions. (Curse you Uncle Bob, I can't even speak in my own words anymore!)
    – Basilevs
    Commented Jun 23 at 10:52
  • Thank you. Need to keep the reusability and levels of abstraction in mind.
    – Nova
    Commented Jun 23 at 15:54
2

There can be a lot of reasons to chose the one or the other. The main drivers are:

  • Separation of concerns: When the function gets the data itself, it constraints itself to one single way of getting the data (e.g. calling another function) and just in time (data is from now). This prevents to reuse the function with different sources, does not allow alternate states of data (e.g. simulations), and makes maintenance difficult if access to the sources would evolve. Depending on how complex it is to get the data, it might need digging into details at a lower level of abstraction and doing more than one thing (clean code).
  • Constraints: When the functions seeks data by itself, it might require lengthy computation at the wrong moment, or overhead such as requiring the source to cache the data (not always possible).

The plug&play approach, i.e. letting the context chose the source and assemble the details by calling functions with parameters (doesn't need to be the data: can also be the injection of the function to be used to find the data) is quite robust in this regard. But if none of theses matters are relevant in you case, you have a real choice.

3
  • Dreaded cohesion and coupling - smart and sensible, but have a requirement of known future.
    – Basilevs
    Commented Jun 23 at 10:56
  • @Basilevs I understand that I need to clarify. Thanks for the feedback.
    – Christophe
    Commented Jun 23 at 11:53
  • 1
    Thanks, I think the overall sentiment is the functions will be more reusable and testable when passed arguments, and the data may want to live outside of the function for other purposes, and the function should only be at a single level of abstraction.
    – Nova
    Commented Jun 23 at 15:50
2

The problem with simplified examples is that, being simple and straightforward, they don't press against the relevant sensory and cognitive limitations that we're trying to deal with by folding things into functions.

There's nothing in the example to justify why the whole thing wouldn't be written as one top-level block of code, with no use of functions whatsoever.

It's left to the expert reader to imagine a much more complicated example from their own experience - an example where it would no longer be sufficiently manageable to put everything in a single (and overly big) procedure.

One principle

I won't be able to be comprehensive about all the principles of how to structure programs well, but in my view the first principle is that, in a procedure which interacts with a user, you usually want user interaction coordinated at the top level.

This is for two reasons. Firstly, you want to be able to test the operation of the non-interactive parts with test arguments already given or with certain user inputs already assumed, so you don't want interactive code buried and intermingled with non-interactive code, otherwise every test run will require you to interact to get from one end of the test to the other (making manual tests tedious and thus deterring sensible testing to be done, and making automatic tests extremely difficult if not impossible).

Secondly, user interaction stages often have a cancellation, exit, or move backwards option, and this is very often supposed to have an effect on the top-level program flow. If you bury the interactive code, then difficulty and clutter is caused bubbling these cases back up to the top.

So the distinction between programmed processing on the one hand, and prompting/waiting for user input on the other, is sufficiently fundamental in my experience that you don't bury the two together.

And you typically try to avoid, as much as possible, alternating between user interaction and automatic processing. If you can get everything in one go from the user, then generally you do, and if you can give everything back in one go, then generally you do.

So it's typically not a new challenge to separate out interactive code, because it reflects a broader ergonomic principle of how software should be designed to interact with users and how users will find it most convenient to interact.

Another principle

The second principle is that reads should not generally be buried with writes (except writes done purely for logging and tracing purposes). This is basically "command-query separation". It should be made obvious at the top level of your method when things have moved from gathering data or planning changes, to storing data or making changes.

Again, there is likely to be a problem with any kind of testing when the two things are buried together. The results of the read from source cannot be performed without making changes, and the write cannot be tested with fixed test values.

But more importantly, it just makes the program considerably more difficult for the programmer to read the code and to analyse where data is coming and going overall, if there are deeply buried processes that both read and write with no sign of such a flow occurring at the top level.

Conclusion

In terms of your question about whether functions are given what they need or get it for themselves, you'd actually first need to divide between user-interactive and non-interactive parts of the program, and then also divide between parts which read and parts which write.

Obviously then, the parts which read must get things for themselves - that is their purpose. And the parts which write should always be given what they need, and what they need to complete that write should already be available to be given.

There are only some minor exceptions where reads might also write - as for logging - and where writes might also read - as with reading directly from the clock for timestamps.

It may also be acceptable to mingle reads and writes where the sequence is a write then a read. It is very acceptable when something is output to the user and their input is taken immediately in response to what has been put. And it is often acceptable when the write is the substantial action, and some kind of token or acknowledgement is returned.

These latter cases of write-then-read don't typically cause application-specific data flows to be hidden, or cause testing to be frustrated, though there could still be pathological cases where the marriage was inappropriate and the answer would be to split the two.

1
  • Good points about command-query separation and interaction-operation (UI and logic) separation.
    – Nova
    Commented Jun 23 at 17:52
1

It is a good practice to separate your core/business logic from IO, parsing, serializing, deserializing and other technical staffs. Then you can write unit tests against the core logic only and verify their correctness using various kinds of test data. And that correctness will be continuously ensured against future changes.

But you can choose to not do the separation if it is a short-term one-off project, which may save some time.

0

If performance is not a concern, which sure is not in 99 percent of cases, then you should just strive for writing maintainable code.

There is, I dare to say, only one rule to follow when writing maintainable code: Separate whats likely to change from whats not likely to change.

Whats Maintainable Code

Code maintenance is about handling changes in requirements. An easily maintainable code is one where you have to change only a small part of code when requirements change, not a large part or the entirety of code.

How To Write Maintainable Code

The way to achieve this is have code divided into small parts, each must be changed for one type of changes in requirements.

To Fragment Or Not To Fragment

If what you do with data ALWAYS depends on the source of data then its wise for the function to get its data and process the data also. Its one whole unit. If source change then underlying processing also change. If only underlying processing change then since its unique to the data source its meant to be coupled with it.

If processing dont care where the data comes from then there have to be two functions: one to get the data, and one to do the processing. You will have may be five sources of data and one processing, so six functions in all. You can also have say two functions for processing with the five functions to get the data. When requirements change you switch function calls (or write new functions).

Summary (Dont unnecessarily fragment your code but do fragment when it makes code more maintainable)

Whats coupled by requirements - either always change when the other change or change only for the other - has to be in one function.

What can be changed - as in makes sense to be changed within the context of the system - while other parts need not be changed is by nature decoupled from the other parts. Its therefore a switchable part and must be in its own function.

1
  • Well said about isolating/separating out the parts that may change.
    – Nova
    Commented Jun 24 at 23:36
0

From "The Clean Code":

Statements in a function should be the same level of abstraction

If you follow this rules, your functions will either be functions that do one low-level thing or functions which orchestrate some kind of flow: e.g. loops, if-statements, combining functions by putting the result of one into another.

Applied to your case this would result in code similar to candied_orange's answer.

Get-Available-Licenses is a lower level building block and should be called in a function which combines multiple functions of this level like Grant-License.

-1

There are plenty of situations where the called function cannot know certain data.

Let’s say I have a function returning my income tax. The tax depends on my circumstances and my income. So you think the function could know all these. But I want to know “how much would my income tax increase if I got a $1,000 or $5,000 raise? Or if I got married?“ and that function and you are stuck.

And if you pass a function or object supplying the numbers needed, then you still have exactly the same problem.

Not the answer you're looking for? Browse other questions tagged or ask your own question.