2
$\begingroup$

Suppose I have several sequences of integers. Let us call the sequences $a$, $b$, $c$, and so on.

The six integers from $1$ to $6$ tell me the order in which compounds separate on a chromatographic separation column, under different conditions, when a mixture of 6 components is introduced into it.
For example, condition $1$ gives $a=\{1,2,3,4,5,6\}$, condition $2$ gives $b= \{1,3,4,5,6,2\}$, condition $3$ gives $c=\{5,3,4,2,1,6\}$, and condition $4$ gives $d=\{6,5,4,3,2,1\}$. The integers cannot be repeated within a given sequence, and every sequence must contain the same integers; only the order matters.

Given a list of such sequences, we wish to check quantitatively, pair-wise, e.g. $(a,d)$, $(a,b)$, $(c,d)$, which sequence is most different from another. Are there any better mathematical or statistical tests that focus on the order within a sequence?

I am looking for ideas, besides the dot product. Thanks.

$\endgroup$
4
  • $\begingroup$ There’s always the norm of the difference. $\endgroup$
    – Malady
    Commented Jun 22 at 4:51
  • $\begingroup$ Do you care about comparing the actual values of integers, or just that they're different? E.g., is 2 vs 1 in a given slot "more different" than 1 vs 6 in that same slot? $\endgroup$ Commented Jun 22 at 5:10
  • $\begingroup$ If the actual values in each slot don't matter, then see also math.stackexchange.com/questions/2492954/… and math.stackexchange.com/questions/1410088/… for discussion on distance measures on permutations $\endgroup$ Commented Jun 22 at 5:28
  • $\begingroup$ Yes, the actual order will matter in the sequence pair. As for your question of whether 2 vs 1 in a given slot is "more different" than 1 vs 6 in that same slot: yes, that will have a chemical meaning. $\endgroup$
    – ACR
    Commented Jun 22 at 5:44

1 Answer

3
$\begingroup$

There are various (dis)similarity measures.

(A) You have mentioned the dot product, which is very common:
$S=\sum_i a_ib_i$

(B) Commenter Malady has suggested the vector norm of the difference, which comes in several variations.
Commonly, we use $S=\sum_i (a_i-b_i)^2$ (the squared Euclidean distance) to measure and then compare similarity; a smaller $S$ means the sequences are more similar.
We can use $S=\sum_i |a_i-b_i|$, the Manhattan measure.
The power $2$ can be changed to higher (or lower) values.
We can rescale all of these to the range $(0,1)$ or $(-1,+1)$ as suitable.
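As a rough sketch of (A) and (B), the snippet below evaluates the dot product and the power sums on the pairs from the question. The function names `dot_product` and `power_sum` are made up for this illustration, and each sequence is simply treated as a vector of slot values.

```python
# Condition orders taken from the question.
a = [1, 2, 3, 4, 5, 6]
b = [1, 3, 4, 5, 6, 2]
c = [5, 3, 4, 2, 1, 6]
d = [6, 5, 4, 3, 2, 1]

def dot_product(x, y):
    """(A): sum of slot-wise products; larger means more similar."""
    return sum(xi * yi for xi, yi in zip(x, y))

def power_sum(x, y, p=2):
    """(B): sum of |x_i - y_i|**p; p=2 is squared Euclidean, p=1 is Manhattan."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y))

pairs = {"(a,d)": (a, d), "(a,b)": (a, b), "(c,d)": (c, d)}
for name, (x, y) in pairs.items():
    print(name, dot_product(x, y), power_sum(x, y, 2), power_sum(x, y, 1))
```

For these inputs the three columns come out as 56, 70, 18 for $(a,d)$; 81, 20, 8 for $(a,b)$; and 75, 32, 10 for $(c,d)$, so all three measures rank $(a,d)$ as the most different pair.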

(C) In the current scenario, a common alternative is the edit distance, which is used to compare integer or text sequences.
E.g., given $a$ and $b$, the edit distance is the minimum number of operations that convert $a$ into $b$.
A small edit distance (ED) indicates similarity; a large ED indicates dissimilarity.

ED itself has variations such as the Levenshtein distance, the Hamming distance, and the Jaro–Winkler distance.
You can check which one suits you best, though all should be good enough given your description.
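To make two of these variations concrete, here is a minimal sketch of the Hamming distance (count of slots that differ) and the Levenshtein distance (minimum number of single-element insertions, deletions, and substitutions). The function names are chosen for this illustration, and the sequences $a$ and $b$ are the ones from the question.

```python
def hamming(x, y):
    """Number of slots in which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(xi != yi for xi, yi in zip(x, y))

def levenshtein(x, y):
    """Minimum number of insertions, deletions, and substitutions turning x into y."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, xi in enumerate(x, start=1):
        curr = [i]
        for j, yj in enumerate(y, start=1):
            cost = 0 if xi == yj else 1
            curr.append(min(prev[j] + 1,          # delete xi
                            curr[j - 1] + 1,      # insert yj
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

a = [1, 2, 3, 4, 5, 6]
b = [1, 3, 4, 5, 6, 2]
print(hamming(a, b), levenshtein(a, b))
```

For $a$ and $b$ from the question the Hamming distance is 5 but the Levenshtein distance is only 2 (delete the $2$, then re-insert it at the end), which shows how much the choice of allowed operations matters.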

Here is my illustrative example, which uses arbitrary rules that do not necessarily match the edit distance variations listed above:
$a=(1,2,3,4,5,6)$, $b=(2,1,3,4,5,6)$, $c=(1,2,4,3,6,5)$
Here, we can convert $a$ to $b$ by exchanging $1$ and $2$: $ED = 1$
We can convert $a$ to $c$ with 2 exchanges $(3 \leftrightarrow 4,5 \leftrightarrow 6)$: $ED = 2$
We can convert $b$ to $c$ with 3 exchanges $(1 \leftrightarrow 2,3 \leftrightarrow 4,5 \leftrightarrow 6)$: $ED = 3$
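If "one operation" is taken to mean exchanging any two entries, the minimum number of exchanges can be computed by counting cycles: it equals the sequence length minus the number of cycles of the permutation that maps one sequence onto the other (this count is sometimes called the Cayley distance). The sketch below assumes that rule; `min_exchanges` is a name invented for this illustration.

```python
def min_exchanges(x, y):
    """Minimum number of pairwise exchanges that turn x into y.

    Assumes x and y contain the same distinct items.
    """
    pos_in_y = {value: i for i, value in enumerate(y)}
    # target[i] is the slot in y where the item currently in slot i of x belongs.
    target = [pos_in_y[value] for value in x]
    seen = [False] * len(target)
    cycles = 0
    for start in range(len(target)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:  # walk the whole cycle once
                seen[j] = True
                j = target[j]
    return len(target) - cycles

a = (1, 2, 3, 4, 5, 6)
b = (2, 1, 3, 4, 5, 6)
c = (1, 2, 4, 3, 6, 5)
print(min_exchanges(a, b), min_exchanges(a, c), min_exchanges(b, c))
```

With the example sequences this prints `1 2 3`, matching the three ED values counted by hand.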

(D) There are some measures that use "binary operations" on binary vectors, though those are essentially simplifications, variations, or generalizations of vector norms.

(E) If it is a concern that the edit distance does not consider the actual integers/characters, you can tweak it like this:
Whenever an edit occurs, we normally increment the edit distance by $1$. You want to take the integer values into account, so the increment should instead carry a scale/weight such as $|a_1-b_2|$, $a_1b_2$, $1+|a_1-b_2|$, $a_1b_2+1$, etc. Choose the increment that makes sense in your scenario.
With that, even when the regular ED values for two pairs $(a,b)$ and $(a,c)$ are equal, your tweaked ED values can differ because of the scale/weight, reflecting the actual "chemicals" being exchanged.
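One possible reading of this tweak, offered only as a sketch: treat every slot where the two sequences differ as one edit and charge it $1+|x_i-y_i|$ instead of $1$. The slot-wise edit model, the function name `weighted_mismatch`, and the extra sequence `e` are all assumptions made for this illustration, not part of the original suggestion.

```python
def weighted_mismatch(x, y):
    """Sum of 1 + |x_i - y_i| over the slots where x and y differ.

    Assumption: each differing slot counts as one edit, weighted by how far
    apart the two integers in that slot are.
    """
    return sum(1 + abs(xi - yi) for xi, yi in zip(x, y) if xi != yi)

a = (1, 2, 3, 4, 5, 6)
b = (2, 1, 3, 4, 5, 6)  # neighbours 1 and 2 exchanged
e = (6, 2, 3, 4, 5, 1)  # hypothetical sequence: 1 and 6 exchanged

print(weighted_mismatch(a, b), weighted_mismatch(a, e))
```

Both $b$ and the hypothetical $e$ are a single exchange away from $a$, yet the weighted values are 4 and 12, so the tweak separates pairs that the unweighted count treats as equal.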

$\endgroup$
2
  • $\begingroup$ Thank you, Prem, for the ideas. ChatGPT 4o was suggesting Kendall's Tau Distance and Spearman's Footrule Distance; what do you think about those as well? $\endgroup$
    – ACR
    Commented Jun 22 at 5:42
  • $\begingroup$ The suggestions I gave are what I knew beforehand, @ACR, while (E) is what I made up for your case, where it is suitable. Currently I am not familiar with the Kendall tau distance, the Cayley distance, etc., hence I cannot make a claim on their suitability at the moment. $\endgroup$
    – Prem
    Commented Jun 22 at 6:10
