4

We've covered this topic before, but with the growing number of people taking DNA tests and having problems interpreting their results, I think it's time we came to a definitive conclusion before we let too many rabbits out of the hat by accident rather than deliberately, so to speak.

Related questions:

Appropriateness or Level of detail on DNA Genealogy Questions?

Can we make a section on the FAQ for DNA questions?

What are our options when dealing with personal information?

Related resources:

Jan Murphy draws our attention elsewhere to this post: A matter of consent. That post in turn points to some working standards for genetic genealogists, found at Genetic Genealogy Standards.


We've always taken a firm stance not to allow the publication of PIFI (personally identifiable information) for living or potentially-living individuals. However, do we as a community understand/agree what this means for DNA questions?

I think we'd all agree that publishing raw DNA results (for yourself or for anyone else) should not be allowed (anonymised or not) -- if it's somebody else's data, it's a breach of their privacy, and if it's your own data, it falls in the same category as posting your address and phone number in questions or answers -- incredibly unwise as a minimum, and unlikely to be of relevance to anybody else as part of a question anyway.

I think we'd all agree on the necessity of anonymising any data that is published, removing names, email addresses and DNA kit numbers. But what about chromosome data -- is there anything more personally identifying than that...?

Even murkier is publishing extracts/summaries of data -- for example, how one (anonymised) kit triangulates with other (anonymised) kits.

On one hand, this includes personal data about probably-living individuals and their relationships that the OP may or may not have permission to publish. As we can't check that they have permission, should we remove it if it appears? And does it matter if it's already been openly published before -- although we might not know that it was published with the permission of the individual concerned. Oh, dear...

On the other hand, the anonymised data in question may not currently be traceable back to an individual, so that would be OK, wouldn't it? Unless and until unforeseen techniques become available in future that can lead to identification. Or somebody recognises their own data when googling around. Should we err on the side of caution? Do we even know where that is?

On the gripping hand, sometimes the data might be necessary to get a useful answer to a question. Might that mean the question is too narrowly applicable to a single person? Can it be edited to be of wider utility, by illustrating some general principles and approaches, or just explaining how a particular site compares kits and displays results, so the OP and others can better understand how to interpret their own data? Is it possible to ask the question using totally-fabricated data (which of course assumes you understand the problem well enough to construct fabricated data that illustrates the reality)? And isn't all that too complicated and a shed-load of work that nobody will ever bother with?

I believe we need a set of simple(ish) guidelines that makes it clear what we expect from posters to protect the privacy of individuals but enable the asking and answering of generally-applicable questions about genetic genealogy. (We probably also want to make clear that medical genealogy is off-topic, as are questions such as (e.g.) do my genes mean my father was red-haired...)

I just don't know what those guidelines are, however.

4 Answers

3

This highlights a major point of confusion.

With regard to the question that sparked all this, about GEDMatch:

GEDMatch does not reveal any raw data. It only shows matching segments: Person 1 matches Person 2 at location 34634654 on chromosome 12. Unless you have access to the raw data for Person 1 or Person 2, there is no possible way for anyone to identify the value of the genetic markers tested at that location. Users do not have access to raw data.

It's like saying my friend and I were born on the same day. We match - yay! You tell me what my birthdate is, please – you can't.

It is impossible for anyone to draw conclusions about health or diseases from this type of data, because we do not know the nature of the match, not even the markers in question, and we certainly do not know the genetic code. If anyone tells you that you will develop Alzheimer's from a GEDMatch match, you can tell them to...well, I'll let you come up with a diplomatic way to word it.

Posting anonymised data from GEDMatch (meaning the names and kit numbers are hidden) poses no risk of identification.

This being said, I agree that posting specific details about other people's raw genetic data on specific markers should be avoided on this site. In this day and age there is a lot of scaremongering regarding privacy, and I think it is important for us to have a sensible, balanced policy on DNA data; otherwise we may be better off directing DNA questions to other sites that are able to answer them.

3
  • You may know more than me about this, but it seems to me that revealing matches between two kits is revealing personal information about relationships, especially if those are recent relationships. For example, I have a (beloved) half-sister LL. She's happy for the facts to be known: she was born as a result of a war-time liaison, was raised by my father after he married my mother, adopted his surname and married under that name. If somebody published matches between their kit and the ones I manage for all my siblings, that could become clear. LL doesn't care, but there are others who might.
    – user104
    Commented Apr 12, 2017 at 11:57
  • Perhaps total anonymisation mitigates the risk, but I wouldn't be confident that it will always be so. In any case, I would be furious if anyone posted even anonymised data about my kits.
    – user104
    Commented Apr 12, 2017 at 12:02
  • Is it a spectrum, I wonder... 1 chromosome/few matches == low risk of identification. 23 chromosomes/many matches == more chance of identification in future as analytical options increase? And if it is a spectrum, can and should we give guidance about at what point acceptability ends?
    – user104
    Commented Apr 13, 2017 at 5:38
3

I want to clarify my objection to the recent question that has prompted this discussion.

Linking directly to a page of GEDMatch match results revealed both full names and kit numbers of the kits being compared. This seems to me to be an obvious breach of our own policy not to publish the names of living people without their permission. Perhaps those testers have given consent for their results to be available to other people on GEDMatch -- that doesn't mean that they've given consent for those results to be copied and shared on other sites.

In her post A Matter of Consent, Judy G. Russell says:

What this means, put in simple terms, is that we should not take a screen capture of DNA results from a testing company and post it in a blog post or on Facebook with the names or pictures of our matches still attached unless we’ve asked those matches specifically if we can post it.

The match results were linked to, so technically they weren't distributed outside of GEDMatch, but in my opinion exposing the full names of people in this way is still a breach of our own policy. From On Topic:

Please note: You must not include here in any circumstances information (including name, date and place of birth or any other details) that would allow identification of any living (or possibly) living individual by somebody reading this site. In practice, this means details about anyone born in the last 100 years, whether they are believed to be deceased or not, and whether or not they have given their permission.

Do we really want to say to someone, if you're doing paper-trail genealogy, you have to follow this restrictive policy, but if you're doing DNA research, it's okay -- as long as you're only linking to other pages that breach our policy, we'll look the other way?

Can we at the very least agree that the DNA Kit numbers fall under the description of "other details" in our own policy? Or is it primarily the combination of kit number plus the full name that is problematic?

My gut feeling is that if the professionals are obscuring names and photos and portions of kit numbers when they show tables of results, then we should do it too. Otherwise experts who are following ethical standards won't be willing to post here.
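
For anyone who wants to do that kind of obscuring before posting a table of results, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the field names (`name`, `email`, `kit`) and the sample kit numbers are made up, and this is not the export format of GEDmatch or any other site. Each kit gets a stable alias, so the same person stays recognisable as "Match 1" across rows without being named.

```python
def redact_matches(rows):
    """Return copies of match rows with names/emails dropped and kit numbers masked.

    `rows` is a list of dicts. The keys "name", "email" and "kit" are
    hypothetical field names chosen for this sketch.
    """
    aliases = {}   # kit number -> stable alias, assigned in order of first appearance
    redacted = []
    for row in rows:
        kit = row["kit"]
        if kit not in aliases:
            aliases[kit] = f"Match {len(aliases) + 1}"
        # Copy every field except the identifying ones.
        clean = {k: v for k, v in row.items() if k not in ("name", "email", "kit")}
        clean["alias"] = aliases[kit]
        # Keep only the first character of the kit number; mask the rest.
        clean["kit"] = kit[:1] + "*" * (len(kit) - 1)
        redacted.append(clean)
    return redacted


# Entirely fabricated sample data, for illustration only.
sample = [
    {"name": "Jane Doe", "kit": "X000001", "chromosome": 12, "cm": 15.2},
    {"name": "John Roe", "kit": "X000002", "chromosome": 12, "cm": 9.8},
    {"name": "Jane Doe", "kit": "X000001", "chromosome": 7, "cm": 21.0},
]

for row in redact_matches(sample):
    print(row)
```

The original rows are left untouched, so you can keep your working copy intact and post only the redacted version. Whether a partially masked kit number is acceptable at all is, of course, the policy question under discussion here.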

1
  • 1
    I think 'Genealogists respect all limitations on reviewing and sharing DNA test results imposed at the request of the tester. For example, genealogists do not share or otherwise reveal DNA test results (beyond the tools offered by the testing company) or other personal information (name, address, or email) without the written or oral consent of the tester' is important as well. Depending of course on the definition of 'results'.
    – user104
    Commented Apr 13, 2017 at 5:35
2

The following is a summary of my perspective:

  • When people take DNA tests, they have the option of setting their privacy preferences on all sites. These range from full name visible to completely hidden. The caveat / blurry part here is that they are opting in to share their information with members of that site .. maybe not so much publically. The exception is GEDMATCH where they do agree to make the results publically available.
  • Ancestry.com and GEDMATCH allow you to assign alias names or initials to a kit instead of the tester's real name.
  • We have no control over other sites' publishing of information or over what users make public. I look to the phonebook as a comparable example. Phonebooks are publically available and have an opt-out clause with your service provider. If you do not opt out, your name, your phone number and (in some places) your address are made publically available. In many places, including where I live, my whole real estate record is also publically available, with far more information about me than a phone book, and it provides no opt-out.
  • A lot of DNA questions require context to be able to answer, and as paper-trail / fact-based genealogists we want to 'go look ourselves' when attempting to answer a question if the user does not provide the information.. the only site that lets you do this is GEDMATCH, using kit numbers.. everyone else uses individuals, and unless you have a shared match you cannot look at someone else's matches or criteria.
  • On the question of whether you can personally identify someone based on shared autosomal chromosome areas.. maybe if you had access to a massive database and your own tools.. law enforcement and the medical community probably.. but today no genealogy site, including GEDMATCH, allows you to enter a particular chromosome pattern (remember it goes beyond just overlapping segments, it is the details of those segments) and see matches.. it is all about matching tools. Again it goes back to whether the test taker and their matches consciously opted in to have their information available to members of that site..
  • On YSTR results.. a posted YSTR result makes it much easier to narrow someone down once beyond Y37, because only a limited number of sites offer the information and people have a limited number of matches.. but only YSearch allows you to do this, and users HAVE TO opt in to have their results 100% public; there is no other option.

On what I think we should allow

  • Any and all screenshots of DNA 'visualizations' (chromosomes, maps, numbers, etc.) that do not show the name or photo of their matches, related specifically to genealogy and not ethnicity.
  • Kit numbers are fine as long as they do not show the name or photo of their matches.
  • Initials of their matches are fine as long as they do not show their name or photos of their matches.
  • Ancestral surnames and locations (like ftDNA & Ancestry show)

On what I think we should NOT allow

  • Person's name
  • Person's photo
  • Person's email address
  • Ethnicity maps / % questions unless it is specific to understanding something like conflicting results or historical origins or understanding the calculation itself... No "Am I black, am I Jewish, am I Native American" type questions.
9
  • 1
    Some comments: (1) GEDmatch doesn't make data publically available, only to subscribers if I'm reading their ToS correctly (if I'm not, I'm pulling my data asap); (2) kit numbers can be problematic, as per the relatively recent 'contretemps' between GEDmatch and FamilyTree; (3) we redact PIFI here even if it's publically available -- are you arguing for that to change? P.S. Where I live, my 'real estate record' is not publically available. I know SE is hosted in the States and subject to US law, but do you really want to alienate all non-US users?
    – user104
    Commented Apr 12, 2017 at 17:29
  • I wouldn't consider GEDMATCH users subscribers, as there is no controlled point of entry and you can log in and use tools without having uploaded a kit. The privacy options are "Public / Private / Research: (Research kits can be used with any GEDmatch utility, but they will not be shown in comparison results for other kits. We do NOT encourage the use of the 'Research' option with normal kit uploads, since it is not consistent with the free exchange of genealogy information.)"
    – CRSouser
    Commented Apr 12, 2017 at 18:06
  • @ColeValleyGirl I still think we should absolutely redact PIFI.. I just think that kit numbers by themselves could be argued not to be PIFI in the context of genealogy.. I know the GEDMATCH and other Facebook groups have banned "Are we related?" questions, which we do not want to take on.. but I think sometimes details help to legitimately be able to answer a question.. if context cannot be provided, it is like saying I saw this in a book but not saying which book and page it is on.. especially for those starting out.
    – CRSouser
    Commented Apr 12, 2017 at 18:08
  • I think ftDNA's Project Administrator guidelines are a pretty good baseline as well: familytreedna.com/learn/project-administration/…
    – CRSouser
    Commented Apr 12, 2017 at 18:12
  • You may be right about GEDmatch's level of access control, but there's still their ToS to consider (intent is important here).. I'd like to think that if I reported somebody that was breaching their ToS then action would be taken... and subscribers to other genealogy sites could still publish my data and I'd expect them to take action likewise. Gedmatch are no different in that respect.
    – user104
    Commented Apr 12, 2017 at 18:30
  • 1
    Still thinking about 'The exception is GEDMATCH where they do agree to make the results publically available.' What users of GEDmatch agree to is to make the results accessible to other users on that site. The GEdmatch ToS specifically states 'We take measures to ensure that only registered users have access to your results.'
    – user104
    Commented Apr 13, 2017 at 5:41
  • I'm leaning towards accepting this as an answer, except for the misleading statement about GEDmatch.
    – user104
    Commented May 3, 2017 at 9:23
  • @ColeValleyGirl So you want me to remove the statement about Public or add a note about TOS? Can you clarify?
    – CRSouser
    Commented May 7, 2017 at 15:19
  • My problem is with "The exception is GEDMATCH where they do agree to make the results publically available." Remove that and we're copacetic.
    – user104
    Commented May 7, 2017 at 15:48
1

One data point that might be helpful... I consulted GEDmatch about a hypothetical situation where I might display Chromosome Browser data.

Their answer (emphasis is mine):

In terms of copyright infringement we are not concerned about showing data from GEDmatch. It would be nice if we are identified

However privacy is a different matter. Many people are concerned about having their DNA identified, especially linked to their name in any way, including a kit number that might be traced. Several blogs use GEDmatch data and screen shots but never with identifying information. I strongly recommend you get permission if possible. Bottom line: protect from identifying others unless you have their permission and act in such a way that GEDmatch doesn't get any complaints.

Should this mean we care about permission or (as suggested in answers to Should we comply with the Terms of Service/Policy Statements of all sites from which our users quote?) leave it to the consciences of individual users as long as the data is anonymised?
