
I have a struct array that holds thousands of data samples. Each element has multiple fields. For example:

Structure(1).a = 7
Structure(1).b = 3
Structure(2).a = 2
Structure(2).b = 6
Structure(3).a = 1
Structure(3).b = 6
...
... (thousands more)
...
Structure(2345).a = 4
Structure(2345).b = 9

... and so on.

If I wanted to find the indices of all elements whose '.b' field contains the number 6, I would have expected the following to do the trick:

find(Structure.b == 6)

... and I would expect the answer to contain '2' and '3' (for the input shown above).

However, this doesn't work. What is the correct syntax and/or could I be arranging my data in a more logical way in the first place?

4
  • 4
    As a side-note: It's much more RAM-efficient (and convenient) to have structures of arrays (i.e. a scalar structure with a 2000x1 field a) than arrays of structures, since every element of a structure comes with a few bytes of memory overhead.
    – Jonas
    Commented Jan 23, 2013 at 13:59
  • @Jonas but then you couldn't represent "empty" fields.
    – Eitan T
    Commented Jan 23, 2013 at 14:06
  • 3
    @EitanT: Just replace them with NaNs.
    – Jonas
    Commented Jan 23, 2013 at 14:23
  • @Jonas, many thanks; I have now arranged my data into a structure of arrays on your advice. Much more logical :) Only caveat is that things could misalign if one of my arrays doesn't contain the same number of elements. But, I make sure they do. Commented Jan 23, 2013 at 17:51
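To illustrate the structure-of-arrays layout suggested in the comments above, here is a minimal sketch using the sample values from the question. The variable name Data and the extra fourth sample are assumptions made up for illustration:

```matlab
% Scalar struct whose fields are vectors (one entry per sample)
Data.a = [7 2 1];
Data.b = [3 6 6];

% Each field is an ordinary vector, so find works on it directly
idx = find(Data.b == 6);    % idx is [2 3]

% A missing value can be stored as NaN, as suggested in the comments.
% NaN == 6 is false, so it never matches and the indices stay aligned.
Data.a(4) = 4;
Data.b(4) = NaN;
idx2 = find(Data.b == 6);   % still [2 3]
```

Both fields have to be kept at the same length by hand, which is the caveat raised in the last comment.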

3 Answers

25

The syntax Structure.b for an array of structs gives you a comma-separated list, so you'll have to concatenate them all (for instance, using brackets []) in order to obtain a vector:

find([Structure.b] == 6)

For the input shown above, the result is as expected:

ans =
     2     3

As Jonas noted, this would work only if there are no fields containing empty matrices, because empty matrices will not be reflected in the concatenation result.
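A small sketch of why empty fields are a problem, using a hypothetical three-element struct array T (not taken from the question):

```matlab
T = struct('b', {3, [], 6});   % 1x3 struct array; T(2).b is empty
v = [T.b];                     % the empty field is dropped: v is [3 6]
idx = find(v == 6);            % idx is 2, although the match is T(3)
```

The returned index refers to a position in the concatenated vector, which no longer lines up with the struct array once any field is empty.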

Handling structs with empty fields

If you suspect that these fields may contain empty matrices, either convert them to NaNs (if possible...) or consider using one of the safer solutions suggested by Rody.

In addition, I've thought of another workaround using strings. We can concatenate everything into a delimited string, which keeps the information about empty fields, and then tokenize it back (in my humble opinion, this is easier to do in MATLAB than handling numerical values stored in cells).

Inspired by Jonas' comment, we can convert empty fields to NaNs like so:

str = sprintf('%f,', Structure.b)
B = textscan(str, '%f', 'delimiter', ',', 'EmptyValue', NaN)

and this allows you to apply find on the contents of B:

find(B{:} == 6)

ans =
     2
     3
3
  • 3
    Note that this will return wrong indices for fields that come after an empty one.
    – Jonas
    Commented Jan 23, 2013 at 13:43
  • @Jonas I got the impression that there are no empty fields, but you're right. I'll mention that.
    – Eitan T
    Commented Jan 23, 2013 at 13:45
  • 1
    It's worth mentioning that this also works with objects. If array is a MxN myClass array with properties: myProp, then find([array.myProp] == value) works as expected. (Didn't try with empty values) Commented Jun 25, 2018 at 13:02
9

Building on EitanT's answer with Jonas' comment, a safer way could be

>> S(1).a = 7;
   S(1).b = 3;
   S(2).a = 2;
   S(2).b = 6;
   S(3).a = 1;
   S(3).b = [];
   S(4).a = 1;
   S(4).b = 6;

>> find( cellfun(@(x)isequal(x,6),{S.b}) )
ans =
     2     4

It's probably not very fast though (compared to EitanT's version), so only use this when needed.

3
  • 1
    +1: by the way, you could use cellfun(@(x)isequal(x, 6), {S.b}) for the cellfun logic. isequal can handle empty arrays too.
    – Eitan T
    Commented Jan 23, 2013 at 14:18
  • 1
    @EitanT: btw: 10K! congrats! :) Commented Jan 23, 2013 at 14:24
  • 1
    @EitanT: Also, congrats on the silver badge!
    – Jonas
    Commented Jan 23, 2013 at 14:25
9

Another answer to this question! This time, we'll compare the performance of the following 5 methods:

  1. My original method
  2. EitanT's original method (which does not handle empties)
  3. EitanT's improved method using strings
  4. A new method: a simple for-loop
  5. Another new method: a vectorized, empty-safe version

Test code:

% Set up test
N = 1e5;

S(N).b = [];
for ii = 1:N
    S(ii).b = randi(6);
end

% Rody Oldenhuis 1
tic
sol1 = find( cellfun(@(x)isequal(x,6),{S.b}) );
toc

% EitanT 1
tic
sol2 = find([S.b] == 6);
toc

% EitanT 2
tic
str = sprintf('%f,', S.b);
values = textscan(str, '%f', 'delimiter', ',', 'EmptyValue', NaN);
sol3 = find(values{:} == 6);
toc


% Rody Oldenhuis 2
tic
ids = false(N,1);
for ii = 1:N
    ids(ii) = isequal(S(ii).b, 6);
end
sol4 = find(ids);
toc

% Rody Oldenhuis 3
tic
idx = false(size(S));
SS = {S.b};
inds = ~cellfun('isempty', SS);
idx(inds) = [SS{inds}]==6;
sol5 = find(idx);
toc

% make sure they are all equal
all(sol1(:)==sol2(:))
all(sol1(:)==sol3(:))
all(sol1(:)==sol4(:))
all(sol1(:)==sol5(:))

Results on my machine at work (AMD A6-3650 APU (4 cores), 4GB RAM, Windows 7 64 bit):

Elapsed time is 28.990076 seconds. % Rody Oldenhuis 1 (cellfun)
Elapsed time is 0.119165 seconds.  % EitanT 1 (no empties)
Elapsed time is 22.430720 seconds. % EitanT 2 (string manipulation)
Elapsed time is 0.706631 seconds.  % Rody Oldenhuis 2 (loop)
Elapsed time is 0.207165 seconds.  % Rody Oldenhuis 3 (vectorized)

ans =
     1
ans =
     1
ans =
     1
ans =
     1

On my home box (AMD Phenom(tm) II X6 1100T (6 cores), 16GB RAM, Ubuntu64 12.10):

Elapsed time is 0.572098 seconds.  % cellfun
Elapsed time is 0.119557 seconds.  % no empties
Elapsed time is 0.220903 seconds.  % string manipulation
Elapsed time is 0.107345 seconds.  % loop
Elapsed time is 0.180842 seconds.  % cellfun-with-string

Gotta love that JIT :)

and wow...anyone know why the two systems behave so differently?

Also, a little-known fact: cellfun with one of the possible string arguments is incredibly fast (which goes to show how much overhead anonymous functions require...).
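For reference, the two cellfun call forms look like this. It's only a minimal sketch on a made-up cell array C. Both forms return the same mask, and any speed difference will depend on the MATLAB version and the machine, so no timings are claimed here:

```matlab
C = {6, [], 'abc', [], 2};

mask1 = cellfun('isempty', C);   % legacy string form
mask2 = cellfun(@isempty, C);    % function-handle form

same = isequal(mask1, mask2);    % true: both masks are [0 1 0 1 0]
```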

Still, if you can be absolutely sure there are no empties, go for EitanT's original answer; that's what Matlab is for. If you can't be sure, just go for the loop.

7
  • Can you benchmark this against my original solution for a structure without empty fields?
    – Eitan T
    Commented Jan 23, 2013 at 16:50
  • @EitanT: there you go; seems that method is still unbeatable. Commented Jan 24, 2013 at 9:00
  • @EitanT: Also check out my third solution :) Commented Jan 24, 2013 at 9:11
  • Why is there a difference between cellfun('isempty', ...) and cellfun(@isempty, ...)? According to the official documentation, the string argument is accepted only for backward compatibility... and another thing, on my machine the textscan solution works x10 faster than you've shown here, so I wonder what's the reason for this inconsistency.
    – Eitan T
    Commented Jan 24, 2013 at 23:47
  • 1
    Wow. So many convoluted methods to do something that seems like it should be so simple. Thanks for all the ideas. No thanks to The Mathworks for not making it easy ;-)
    – Flyto
    Commented Jan 27, 2015 at 12:35
