$\begingroup$

I am trying to remove outliers from the following data:

Data={{0.105, 0.989213}, {0.106414, 0.988926}, {0.107828, 
  0.988636}, {0.109242, 0.988343}, {0.110657, 0.988049}, {0.112071, 
  0.987748}, {0.113485, 0.}, {0.114899, 1.}, {0.116313, 
  0.986826}, {0.117727, 0.986512}, {0.119141, 0.986196}, {0.120556, 
  0.995073}, {0.12197, 0.985551}, {0.123384, 0.0154883}, {0.124798, 
  0.984894}, {0.126212, 1.}, {0.127626, 0.984222}, {0.12904, 
  0.983887}, {0.130455, 0.983538}, {0.131869, 0.983197}, {0.133283, 
  0.}, {0.134697, 0.970927}, {0.136111, 0.98213}, {0.137525, 
  0.98177}, {0.138939, 1.}, {0.140354, 0.981041}, {0.141768, 
  0.980672}, {0.143182, 0.826229}, {0.144596, 0.979923}, {0.14601, 
  0.979546}, {0.147424, 0.979163}, {0.148838, 0.978778}, {0.150253, 
  0.978392}, {0.151667, 0.978}, {0.153081, 0.977605}, {0.154495, 
  0.977208}, {0.155909, 0.976807}, {0.157323, 0.976404}, {0.158737, 
  0.975999}, {0.160152, 0.55766}, {0.161566, 
  0.975177}, {0.16298, -0.000401533}, {0.164394, 0.974344}, {0.165808,
   1.00182}, {0.167222, 0.}, {0.168636, 0.973073}, {0.170051, 
  0.972646}, {0.171465, 0.972211}, {0.172879, 0.971787}, {0.174293, 
  0.971338}, {0.175707, 0.970898}, {0.177121, 0.970455}, {0.178535, 
  0.97001}, {0.179949, 0.96956}, {0.181364, -0.000767749}, {0.182778, 
  0.968655}, {0.184192, 0.968197}, {0.185606, 0.967738}, {0.18702, 
  0.967275}, {0.188434, 0.96681}, {0.189848, 0.966343}, {0.191263, 
  0.}, {0.192677, 0.965404}, {0.194091, 0.964925}, {0.195505, 
  0.964447}, {0.196919, 0.963967}, {0.198333, 0.963484}, {0.199747, 
  0.962999}, {0.201162, 1.}, {0.202576, 0.962022}, {0.20399, 
  0.961529}, {0.205404, 0.961034}, {0.206818, 0.960536}, {0.208232, 
  0.960036}, {0.209646, 0.959534}, {0.211061, 0.959029}, {0.212475, 
  0.958522}, {0.213889, 0.958013}, {0.215303, 1.}, {0.216717, 
  0.956987}, {0.218131, 0.956471}, {0.219545, 0.955953}, {0.22096, 
  0.955432}, {0.222374, 0.954909}, {0.223788, 0.954385}, {0.225202, 
  0.894605}, {0.226616, 0.953327}, {0.22803, 0.952796}, {0.229444, 
  0.952262}, {0.230859, 0.951726}, {0.232273, 0.951188}, {0.233687, 
  0.950648}, {0.235101, 0.950106}, {0.236515, 0.949561}, {0.237929, 
  0.949017}, {0.239343, 0.948467}, {0.240758, 0.947917}, {0.242172, 
  0.947364}, {0.243586, 0.946811}, {0.245, 0.946254}};

When I plot it, I do see a few outliers that I want to remove (as shown by the pink highlighter):

ListPlot[Data, AxesLabel -> {x, y}, PlotRange -> Full]

[Image: plot of Data over the full range, outliers highlighted in pink]

And up close there are a few more that I would like to remove from the data:

ListPlot[Data, AxesLabel -> {x, y}]

[Image: close-up plot of Data showing further outliers]

Is there a way I can remove the shown outliers programmatically in Mathematica? Any tips/suggestions will be much appreciated!

$\endgroup$
  • I used Tukey's fences (described on Wikipedia); they worked great for me. The advantage of Tukey's fences is that you can easily vary the parameter k to be more or less strict about what counts as an outlier. Be ye sure you know why these outliers arose; there's gold in them thar hills.
    – CElliott
    Commented Aug 25, 2021 at 11:32
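
A minimal sketch of the Tukey's-fences idea from the comment above, assuming the fences are applied to the raw y-values only. `tukeyFilter` is a hypothetical helper name, and k = 1.5 is the conventional default rather than a value given in this thread. Because the underlying curve drifts slowly, fences on the raw values will mainly catch the gross outliers; applying the same test to residuals from a smooth trend would be stricter.

```mathematica
(* Hypothetical helper: keep the points whose y-value lies inside
   Tukey's fences [Q1 - k IQR, Q3 + k IQR]. Larger k is more lenient. *)
tukeyFilter[data_, k_ : 1.5] := Module[{q1, q3, iqr},
  {q1, q3} = Quantile[data[[All, 2]], {1/4, 3/4}];
  iqr = q3 - q1;
  Select[data, q1 - k iqr <= #[[2]] <= q3 + k iqr &]
]

(* Compare the original points (red) with the retained points (blue) *)
cleaned = tukeyFilter[Data];
ListPlot[{Data, cleaned}, PlotStyle -> {Red, Blue}, PlotRange -> Full]
```

Calling `tukeyFilter[Data, 3]` instead would widen the fences and remove fewer points.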

1 Answer

$\begingroup$

You could take a look at the built-in functions FindAnomalies and DeleteAnomalies.

We can use LearnDistribution on the MovingMedian of the data to get an idea of what data we would expect, and then use DeleteAnomalies.

ListPlot[DeleteAnomalies[
  LearnDistribution[MovingMedian[Data, 5], Method -> "Multinormal"], 
  Data], PlotRange -> Full]

[Image: plot of the data after DeleteAnomalies]

Comparing with the original data:

newdata = 
 DeleteAnomalies[
  LearnDistribution[MovingMedian[Data, 5], Method -> "Multinormal"], 
  Data]

ListPlot[{Data, newdata}, PlotStyle -> {Red, Blue}]

[Image: original data in red overlaid with the cleaned data in blue]

The cleaned data is drawn in blue on top of the original data in red, so the points that remain visible in red are the ones that have been removed.
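
To list exactly which points were dropped, one option is to learn the distribution once and reuse it. This is only a sketch, assuming a version in which DeleteAnomalies and FindAnomalies accept a LearnedDistribution as their first argument (12.3 was used for this answer). The names `dist`, `removed`, `removedCheck` and `stricter` are illustrative, and the AcceptanceThreshold value is an arbitrary example rather than a recommendation.

```mathematica
(* Learn the expected distribution once from the smoothed data *)
dist = LearnDistribution[MovingMedian[Data, 5], Method -> "Multinormal"];

(* Points that the learned distribution flags as anomalous *)
removed = FindAnomalies[dist, Data];

(* Cross-check: set difference between the original and cleaned data
   (Complement returns a sorted list without duplicates) *)
removedCheck = Complement[Data, DeleteAnomalies[dist, Data]];

(* Assumed tuning knob: a larger AcceptanceThreshold should flag more points *)
stricter = DeleteAnomalies[dist, Data, AcceptanceThreshold -> 0.01];

ListPlot[{Data, stricter}, PlotStyle -> {Red, Blue}]
```

Inspecting `removed` before discarding anything is a cheap way to confirm that only the intended points are being treated as outliers.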

$\endgroup$
  • This is such a great answer, thanks so much, Carl!
    – TDH
    Commented Aug 22, 2021 at 10:55
  • No problem at all! I'm sure there's a more rigorous approach that you should take, but these are really helpful functions, since you could train LearnDistribution on any expected data and use it the same way. Plus, this is such a nice example of "Mathematica Magic" that I feel it is quite exemplary of the power of the language.
    – Carl Lange
    Commented Aug 22, 2021 at 14:27
  • Yes, for my purpose, this worked great! It cleaned up a much larger data set very nicely. I did not know about the function LearnDistribution. It was indeed magic. Thanks so much again!
    – TDH
    Commented Aug 23, 2021 at 16:56
  • With MMA V12.0 (Mac), your code fails with the error message "DeleteAnomalies: Options expected instead of data." Which version of MMA did you use?
    – Sigis K
    Commented Aug 31, 2021 at 13:15
  • 12.3 on Linux. According to the docs, DeleteAnomalies was updated in 12.1, so maybe that's why.
    – Carl Lange
    Commented Aug 31, 2021 at 13:40
