I have four sets of data, the distribution of which I would like to represent in MATLAB in one figure. Current code is:

[n1,x1]=hist([dataset1{:}]);
[n2,x2]=hist([dataset2{:}]);
[n3,x3]=hist([dataset3{:}]);
[n4,x4]=hist([dataset4{:}]);
bar(x1,n1,'hist'); 
hold on; h1=bar(x1,n1,'hist'); set(h1,'facecolor','g')
hold on; h2=bar(x2,n2,'hist'); set(h2,'facecolor','g')
hold on; h3=bar(x3,n3,'hist'); set(h3,'facecolor','g')
hold on; h4=bar(x4,n4,'hist'); set(h4,'facecolor','g')
hold off 

My issue is that each group has a different sample size: dataset1 has n = 69, dataset2 has n = 23, and dataset3 and dataset4 each have n = 10. How do I normalize the distributions when plotting these four groups together?

Is there some way to, for example, divide the count in each bin by the sample size of that group?

  • Why not bar n1/sum(n1) instead? Otherwise, maybe histogram(x,'Normalization','probability') would be an alternative.
    – Florian
    Commented Feb 14, 2017 at 17:16
  • The n1/sum(n1) approach worked great. Is there a way to do this with histfit, or some better/easier way to add fit lines? Commented Feb 14, 2017 at 18:47

1 Answer

You can normalize your histograms by dividing by the total number of elements:

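% Random stand-in data with the sample sizes from the question.
% histcounts returns the bin counts (n) and the bin edges (x).
[n1,x1] = histcounts(randn(69,1));
[n2,x2] = histcounts(randn(23,1));
[n3,x3] = histcounts(randn(10,1));
[n4,x4] = histcounts(randn(10,1));
hold on
% Plot the smallest datasets first; dividing the counts by their sum
% turns them into proportions.
bar(x4(1:end-1),n4./sum(n4),'histc');
bar(x3(1:end-1),n3./sum(n3),'histc');
bar(x2(1:end-1),n2./sum(n2),'histc');
bar(x1(1:end-1),n1./sum(n1),'histc');
hold off 
ax = gca;
% Set a distinct color and some transparency for all four bar series at once:
set(ax.Children,{'FaceColor'},mat2cell(lines(4),ones(4,1),3))
set(ax.Children,{'FaceAlpha'},repmat({0.7},4,1))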

As the code above shows, a few more changes make your code simpler and shorter:

  1. You only need to call hold on once.
  2. Instead of collecting all the bar handles, use the axes handle.
  3. Plot the bars in ascending order of dataset size, so all histograms will be clearly visible.
  4. With the axes handle, set each property for all the bars in one command.

As a side note, it's better to use histcounts than hist.

Here is the result:

only hist
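
As the comment under the question suggests, histcounts can also do the normalization itself through its 'Normalization' option. Here is a minimal sketch of that, assuming random stand-in data and bin edges shared across all groups (the shared edges, the 10-bin choice and the variable names are my assumptions, not part of the code above):

```matlab
% Stand-in data with the sample sizes from the question (hypothetical values).
rng(0);
data = {randn(69,1), randn(23,1), randn(10,1), randn(10,1)};

% Assumption: use the same 10 bins for every group so the bars are comparable.
allValues = vertcat(data{:});
edges = linspace(min(allValues), max(allValues), 11);

% One row of bin proportions per group; each row sums to 1.
p = zeros(numel(data), numel(edges)-1);
for k = 1:numel(data)
    p(k,:) = histcounts(data{k}, edges, 'Normalization', 'probability');
end

% Draw the groups side by side at the bin centers.
centers = edges(1:end-1) + diff(edges)/2;
bar(centers, p.');
```

With shared edges the groups can be drawn side by side rather than overlaid, so no transparency is needed. Whether that reads better than the overlay is a matter of taste.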


EDIT:

If you want to also plot the pdf line from histfit, then you can save it first, and then plot it normalized:

dataset = {randn(69,1),randn(23,1),randn(10,1),randn(10,1)};
fits = zeros(100,2,numel(dataset));
hold on
for k = numel(dataset):-1:1
    total = numel(dataset{k}); % for normalizing
    f = histfit(dataset{k}); % draw the histogram and fit
    % collect the curve data and normalize it:
    fits(:,:,k) = [f(2).XData; f(2).YData./total].';
    x = f(1).XData; % collect the bar positions
    n = f(1).YData; % collect the bar counts
    delete(f) % delete the histogram and the fit (their data is already saved)
    bar(x,n./total,'histc'); % plot the bar
end
ax = gca; % get the axis handle
% set all color and transparency for the bars:
set(ax.Children,{'FaceColor'},mat2cell(lines(4),ones(4,1),3))
set(ax.Children,{'FaceAlpha'},repmat({0.7},4,1))
% plot all the curves:
plot(squeeze(fits(:,1,:)),squeeze(fits(:,2,:)),'LineWidth',3)
hold off

Again, there are some other improvements you can introduce to your code:

  1. Put everything in a loop to make things easier to change later.
  2. Collect all the curve data in one variable so you can plot all the curves together easily.

The new result is:

hist & fit
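
If you would rather not draw and then delete the histfit objects, the curve can also be computed by hand. This is a rough sketch only, assuming a normal fit estimated with mean and std and a single stand-in dataset (the variable names are illustrative). The idea is that a density multiplied by the bin width is on the same scale as the per-bin proportions:

```matlab
% Stand-in data for one group (hypothetical values).
rng(1);
x = randn(69,1);

% Bin proportions and the (uniform) bin width chosen by histcounts.
[p, edges] = histcounts(x, 'Normalization', 'probability');
binWidth = edges(2) - edges(1);

% Assumption: a normal fit, using the sample mean and standard deviation.
mu = mean(x);
sigma = std(x);
xx = linspace(edges(1), edges(end), 100);
pdfVals = exp(-(xx - mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));

% Scale the density by the bin width so it matches the proportion bars.
curve = pdfVals .* binWidth;

bar(edges(1:end-1), p, 'histc');
hold on
plot(xx, curve, 'LineWidth', 3)
hold off
```

This needs no toolbox function, but it only covers the normal case. For other distributions, the histfit approach above is probably simpler.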
