this question relates to interpolation of 3D data and Improved interpolation of mostly-structured 3d data
I have an irregular 3d data set that is quite large (around 70,000 triplets---let's call it Data1
). ListPointPlot3D[Data1]
gives
Clearly, the characteristic feature of this set is that its resolution differs vastly from one direction to the other so that main information for the data lies in thirty stripes of $> 2000$ points each. I have issues with creating an interpolation for this dataset for the reasons mentioned in the other two posts so I would like to "normalise" my grid so that this becomes interpolatable.
At this stage, I'd be happy to get even linear interpolation along the non-dense direction but running Interpolation[Data1]
just crashes the Kernel.
What I thought of doing was split the strips via FindClusters
, find the one with the least elements and then reduce the elements of the rest by dropping elements (I can afford to lose some information at the edges) but I don't really know how to implement the right distance in FindClusters
.
The default Euclidean distance obviously doesn't work:
the $(x, y)$ coordinates of the set forced to split in thirty clusters gives
Take[#, 2] & /@ Data1;
FindClusters[%, 30];
ListPlot[%]
So my question is two-fold:
How to implement the distance that would cluster the data properly?
Can you think of a better way to regularise the grid so as to be able to interpolate this data set?
Here's a link to part of the data.
----EDIT----
I managed to solve my problem but I'll keep the question open in case anyone would care to improve.
For the record, the Non-Grid Interpolation Package looks really good but didn't really help here - I gave up on the Delaunay triangulation after 15' of evaluation time and the nearest neighbour method didn't look all that good.
So primarily what I did was I straightened the data set (an answer explains one way to obtain the angle) and then sorted it and split it to clusters by the $ x $-coordinate:
Rot[x_] := {{Cos[x], Sin[x], 0}, {-Sin[x], Cos[x], 0}, {0, 0, 1}};
RotData = Rot[.10795 Pi].# & /@ Data1;
RotData = SortBy[#, #[[2]] &] & /@ (RotData~GatherBy~(Round[First@#, 165]&)~SortBy~First);
(don't ask how I chose to round to 165!).I then added a missing stripe by averaging:
Mean@(RotData[[1]] + RotData[[2]]);
RotData = Insert[RotData, %, 2];
Following that, I calculated the domain where all data take values and defined a resolution and step for future reference:
lowlimit = Max@(Min /@ (Map[#[[2]] &, RotData, {2}]));
upplimit = Min@(Max /@ (Map[#[[2]] &, RotData, {2}]));
resolution = 2200;
step = (upplimit - lowlimit)/resolution;
And I finally created interpolating functions along each cluster and ran a loop to get them to evaluate to points on a regular grid. This wouldn't work if I didn't average over the x-coordinate but for this dataset this gives a very small error.
Table[Module[{TempXData, TempInterpData, t},
TempXData = Evaluate@RotData[[j]];
TempInterpData = Drop[#, 1] & /@ TempXData;
t = Interpolation[TempInterpData, Method -> "Spline"];
Table[{Mean@(First /@ TempXData), y, t[y]}, {y, lowlimit,upplimit,step}]], {j, 1, Length[RotData]}];
IntData = Flatten@%~Partition~3;
Now, IntData
is an interpolatable set:
So (in the hope this can be useful to someone OTHER than myself) if you run into a dataset you can't interpolate, you might want to think about "striping it", Interpolating the stripes and then using the interpolation to populate a regular grid.