
So, I have genotyping data for about 650,000 SNPs in 96 individuals. I already know the Y-DNA haplogroup of each individual, so I have a rough understanding of their ancestry.

What would be the best way to find out whether these individuals cluster by ancestry? From what I have been reading, a PCA might be a good idea. Suppose individuals 1-10 belong to one population (a distinct ancestry); then their SNP genotypes should be significantly more similar to each other than to those of individuals 11-20, who belong to another population group. So, in theory, I should get two clusters in my PCA plot.
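To make that intuition concrete, here is a minimal simulation, assuming NumPy and entirely made-up data (not yours): two groups of 10 individuals whose allele frequencies have drifted apart from a shared ancestral frequency. Genotypes are coded as 0/1/2 copies of one allele, and the first principal component (PC1) of the column-centred matrix should put individuals 1-10 on one side of zero and 11-20 on the other. The drift size and SNP count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps = 2000

# Hypothetical ancestral allele frequencies; each population drifts away from them
ancestral = rng.uniform(0.1, 0.9, n_snps)
freq_a = np.clip(ancestral + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
freq_b = np.clip(ancestral + rng.normal(0, 0.1, n_snps), 0.01, 0.99)

# Genotypes as 0/1/2 allele counts: one binomial(2, p) draw per individual and SNP
geno_a = rng.binomial(2, freq_a, size=(10, n_snps))  # individuals 1-10
geno_b = rng.binomial(2, freq_b, size=(10, n_snps))  # individuals 11-20
G = np.vstack([geno_a, geno_b]).astype(float)

# Centre each SNP column, then take PC1 scores from the SVD
X = G - G.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = U[:, 0] * S[0]

print(np.round(pc1, 1))  # one PC1 coordinate per individual
```

The sign of a principal component is arbitrary, so which group lands on the positive side can flip between runs or libraries; only the split matters.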

I have the genotype of each SNP for each individual. How would I go about creating a PCA plot, or is there an alternative analysis better suited to this kind of question?
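A minimal sketch of the computation, assuming your genotypes are already in an individuals x SNPs matrix of 0/1/2 allele counts with missing calls as NaN (converting from your file format is up to you). The function name `genotype_pca` and its parameters are hypothetical. It uses a common normalisation: centre each SNP and divide by its binomial standard deviation sqrt(2p(1-p)). Because you have far more SNPs (about 650,000) than individuals (96), it eigendecomposes the small 96 x 96 individual-by-individual matrix instead of the huge SNP covariance matrix. Dedicated tools (for example PLINK's `--pca` option) do this for you; the code only shows what happens underneath.

```python
import numpy as np

def genotype_pca(G, n_components=10):
    """PCA of an individuals x SNPs matrix of 0/1/2 counts (NaN = missing call)."""
    G = np.asarray(G, dtype=float)
    p = np.nanmean(G, axis=0) / 2.0            # estimated allele frequency per SNP
    keep = (p > 0) & (p < 1)                    # drop monomorphic SNPs (zero variance)
    G, p = G[:, keep], p[keep]

    # Centre each SNP and scale by its binomial standard deviation sqrt(2p(1-p))
    X = (G - 2 * p) / np.sqrt(2 * p * (1 - p))
    X = np.nan_to_num(X, nan=0.0)               # missing calls -> SNP mean (0 after centring)

    # Small individuals x individuals matrix; its eigenvectors give the PC scores
    K = X @ X.T / X.shape[1]
    eigvals, eigvecs = np.linalg.eigh(K)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    explained = eigvals / np.trace(K)           # fraction of total variance per PC
    return scores, explained

# Tiny made-up example: 4 individuals x 5 SNPs, one missing call
G = [[0, 1, 2, 0, 1],
     [0, 2, 2, 1, np.nan],
     [2, 0, 0, 2, 1],
     [2, 1, 0, 2, 2]]
scores, explained = genotype_pca(G, n_components=2)
print(scores[:, 0], explained)
```

To get the plot, scatter `scores[:, 0]` against `scores[:, 1]` with any plotting library and colour the points by the Y-DNA haplogroup you already know; that makes it easy to see whether the clusters line up with those labels.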

  • Way better. Now one of your questions is "how to make a PCA plot". There are plenty of tutorials out there; have you tried one? If so, what was the problem with it? Commented Jun 3, 2018 at 21:30
  • What is the question?
    – winni2k
    Commented Jun 13, 2018 at 0:52

1 Answer


PCA sounds like a good start for seeing the rough population structure among your 96 individuals. It is a sensible first step in any case, and it can also give you a hint of how many distinct populations there are in your dataset. I have not personally run PCA on SNP data, but there seem to be quite a lot of tutorials online. For instance, this one looks nice.
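One rough way to read "how many populations" off a PCA, sketched under stated assumptions (simulated data, NumPy only, and the common rule of thumb that K well-separated populations give about K-1 principal components that stand out from the rest): compare the eigenvalues with the largest eigenvalue you would expect from pure noise. The `noise_edge` below is the upper edge of the Marchenko-Pastur law, used only as a crude reference; it is not a formal significance test (EIGENSOFT, for example, uses Tracy-Widom statistics for that). The helper name `top_eigenvalues` is hypothetical.

```python
import numpy as np

def top_eigenvalues(G):
    """Descending eigenvalues of the standardised individuals x individuals matrix."""
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)                      # drop monomorphic SNPs
    X = (G[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    K = X @ X.T / X.shape[1]
    return np.sort(np.linalg.eigvalsh(K))[::-1], X.shape

# Hypothetical data: 3 simulated populations x 15 individuals, 3000 SNPs
rng = np.random.default_rng(1)
n_snps = 3000
ancestral = rng.uniform(0.1, 0.9, n_snps)
pops = []
for _ in range(3):
    freq = np.clip(ancestral + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
    pops.append(rng.binomial(2, freq, size=(15, n_snps)))
G = np.vstack(pops)

eig, (n, m) = top_eigenvalues(G)
# Crude noise reference: largest eigenvalue expected for an n x m pure-noise matrix
noise_edge = (1 + np.sqrt(n / m)) ** 2
n_above = int(np.sum(eig > noise_edge))
print("top eigenvalues:", np.round(eig[:5], 2))
print("noise edge:", round(noise_edge, 2), "| eigenvalues above it:", n_above)
```

With three simulated populations, two eigenvalues should stand clearly above the rest. On real data, admixture, relatedness and uneven sample sizes blur this picture, so treat the count as a hint rather than an answer.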

Once you know how many categories (populations) to expect, you can use Structure to assign individuals to those categories.
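Structure fits a model-based admixture model and reports ancestry proportions for each individual; the sketch below is not that. It swaps in a much simpler technique, k-means clustering on the top PC scores, which gives one hard cluster label per individual once you have chosen the number of clusters. It may be useful as a quick check before running Structure. The function `kmeans` and its parameters are hypothetical names written for this sketch, and the data are simulated.

```python
import numpy as np

def kmeans(points, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the best of n_init restarts."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    best_labels, best_inertia = None, np.inf
    for _restart in range(n_init):
        # k-means++ seeding: each new centre is drawn with probability ~ squared distance
        centres = [points[rng.integers(len(points))]]
        for _ in range(1, k):
            d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centres], axis=0)
            centres.append(points[rng.choice(len(points), p=d2 / d2.sum())])
        centres = np.array(centres)
        for _ in range(n_iter):
            # Assign each individual to its nearest centre, then recompute the centres
            dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        inertia = dists.min(axis=1).sum()        # within-cluster sum of squares
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

# Hypothetical example: 3 simulated populations of 10, top 2 PCs, k-means with k = 3
rng = np.random.default_rng(2)
n_snps = 2000
ancestral = rng.uniform(0.1, 0.9, n_snps)
G = np.vstack([rng.binomial(2, np.clip(ancestral + rng.normal(0, 0.1, n_snps), 0.01, 0.99),
                            size=(10, n_snps)) for _ in range(3)]).astype(float)
X = G - G.mean(axis=0)
U, S, _ = np.linalg.svd(X, full_matrices=False)
pcs = U[:, :2] * S[:2]
labels = kmeans(pcs, k=3)
print(labels)
```

Because k-means draws straight-line boundaries and cannot represent mixed ancestry, expect it to disagree with Structure for admixed individuals; Structure's per-individual proportions are the more informative output there.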

  • Sorry if I wasn't able to make it clear, but yes, that's precisely what I want to do: I want to see whether the individuals cluster in any way at all. Did you mean PCA in your answer? If so, how exactly would I go about it? I know what PCA does in theory, but I don't understand how to create a PCA plot from the data I have.
    – user2887
    Commented Jun 3, 2018 at 15:27
  • @user2887 Oh, yes. I meant PCA; I've fixed that already. Note that you can also edit your question to make it clearer or to add more details. Commented Jun 3, 2018 at 17:40
  • Hey, I edited the answer (added a link to a blog post that shows how to do PCA on SNP data). There seem to be quite a lot of them out there if you google "pca genotype data". Commented Jun 3, 2018 at 17:46
