From Gramps
Revision as of 16:18, 16 September 2009 by Fsmunoz (talk | contribs) (Y-line research)

Anyone who follows genealogy publications will have encountered the use of genetics in genealogy: hereditary diseases, genetic markers, and so on.

This page gives an overview of the types of genetic data that matter in genealogy.


Humans have 23 pairs of chromosomes, which carry their DNA (deoxyribonucleic acid). For each pair, one chromosome is inherited from the mother and the other from the father. The mother also contributes the egg cell itself, whose nucleus holds these chromosomes. That cell contains further genetic material outside the chromosomes, called mitochondrial DNA (mtDNA), which is also essential for our survival. The sex chromosomes play a special role. Females have a pair of X chromosomes, one from the mother and one from the father. Males have an X chromosome inherited from the mother and a Y chromosome inherited from the father.

At each generation, each parent passes 23 chromosomes on to a child, made up of material from that parent's 23 chromosome pairs. Over time mutations also occur, meaning that parts of a chromosome are no longer identical to the corresponding parts of the parents' DNA. Mutations that are not lethal are called variants. Some parts of the DNA mutate quickly, some slowly, and some very slowly. Without mutations it would be impossible to tell how closely people are related to one another. Fast mutations make it possible to distinguish family groups in recent times. Slow mutations make it possible to distinguish population groups and trace ancient migration patterns, e.g. the Saxon migrations in Europe.


Storing genetic information raises many privacy issues, so caution is advised. Some information is harmless (e.g. DYS information, which comes from the so-called junk part of your DNA that is not used in the blueprint), while other information can be dangerous to publish (e.g. hereditary diseases).

Even genetically harmless information can infringe on privacy, because it can prove that two people are not related, which in itself can cause trouble (divorce issues, etc.).

Hence some tips if you want to store this information:

  • set this information as private
  • never include private information in reports you publish
  • if you share information with other researchers, only share the public data

For harmless genetic data, it is useful to publish the data in anonymized form on a public forum. You can do that here on this wiki, for example, but note that your account details will be visible in the page history. You might therefore consider sending the data to one of the administrators, or creating a separate pseudonymous login for this purpose.

For example, DYS information is useful for relating family branches, so publishing 'last name, region of birthplace, DYS codes' lets other researchers see how closely related they are to you.
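As a rough illustration of the tips above, the following Python sketch builds a publishable record containing only the surname, the region of birth and the DYS codes, leaving out anything marked private. The `Attribute` and `Person` classes and the `public_summary` function are hypothetical names invented for this example and are not part of any genealogy program's API; the sample values are made up.

```python
from dataclasses import dataclass, field

# Hypothetical minimal data model, invented for this sketch.
@dataclass
class Attribute:
    name: str
    value: str
    private: bool = False

@dataclass
class Person:
    surname: str
    birth_region: str
    attributes: list = field(default_factory=list)

def public_summary(person):
    """Return surname, birth region and the non-private DYS attributes only."""
    dys = {
        a.name: a.value
        for a in person.attributes
        if a.name.startswith("DYS") and not a.private
    }
    return {"surname": person.surname, "region": person.birth_region, "dys": dys}

# Made-up example person: two DYS markers and one private attribute.
person = Person("Example", "Example region", [
    Attribute("DYS19", "14"),
    Attribute("DYS389I", "12"),
    Attribute("Hereditary condition", "example only", private=True),
])

summary = public_summary(person)
print(summary)
```

The private attribute never reaches the summary, because the function copies only what is explicitly allowed rather than removing what is forbidden.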

Y-line research

The Y chromosome plays an important role. It is inherited only from father to son, and hence is in theory identical to the father's Y chromosome, apart from possible variants. Commercial tests can investigate a male's Y chromosome and report marker information. A marker is a region on the Y chromosome whose structure is investigated and cataloged. Markers are normally part of the junk DNA, so this information is in itself genetically harmless.

Y-line STR Markers

An STR (short tandem repeat) marker has a specific place on the Y chromosome, indicated by a DYS# code (DNA Y-chromosome Segment), and a specific value, called the allele, which is essentially the number of times a short sequence is repeated at that place. Repeats are common in the junk part of human DNA.

Around 100 DYS markers are already available. Typically 12 to 37 are tested nowadays.

How to use DYS numbers

By comparing one man's DYS profile with that of another male, one can determine whether they are direct family (identical DYS profiles), or estimate from the number of mutations how many generations ago their common ancestor lived. To do this, draw up a table of the people with a DYS profile, showing each person's difference in allele numbers from the active person:

profile        difference  DYS19  DYS389I  ...
active person  -           14     12
person A       0           14     12
person B       1           15     12
person C       3           13     14

The fewer the differences, the more closely related a person is to the person whose profile is being investigated.
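The difference column can be computed mechanically. The Python sketch below assumes that the difference is the sum of the absolute differences in allele values over the markers both profiles share; this is an assumption, but it matches the three example rows. The function name `profile_difference` is made up for this illustration, and only the two markers shown in the table are used.

```python
def profile_difference(reference, other):
    """Sum of absolute allele differences over the markers both profiles share."""
    shared = reference.keys() & other.keys()
    return sum(abs(reference[marker] - other[marker]) for marker in shared)

# Profiles taken from the example table (only the two markers shown there).
active = {"DYS19": 14, "DYS389I": 12}
others = {
    "person A": {"DYS19": 14, "DYS389I": 12},
    "person B": {"DYS19": 15, "DYS389I": 12},
    "person C": {"DYS19": 13, "DYS389I": 14},
}

differences = {name: profile_difference(active, p) for name, p in others.items()}

# List the closest profiles first.
ranked = sorted(differences.items(), key=lambda item: item[1])
for name, diff in ranked:
    print(name, diff)
# prints:
# person A 0
# person B 1
# person C 3
```

Markers tested in only one of the two profiles are ignored here, so profiles tested on different marker sets are compared on their overlap only.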

Y-DNA Haplogroups

Less specific than STR markers, a haplogroup is defined by certain SNP mutations that occur infrequently. It is less useful for estimating degrees of relationship between individuals but useful in a broader way, since it deals with whole populations. A Y-DNA haplogroup will be the same for every male in the direct paternal line (father, father of father, father of father of father, etc), except when a "non-paternal event" occurs (i.e. there isn't a genetic contribution from the legal and recognised father).

One way to use this information in a genealogy program is to allow one to stipulate the haplogroup (I2, R1b, J1a1, etc.) for an individual, and cascade that change upwards through the male line. Then allow for a way to break this chain, possibly marking the fact with a flag or visual cue.

An example: a researcher tests for a Y-DNA haplogroup and gets the result R1b1. The researcher fills in the "Y-DNA Hg" attribute with "R1b1", and the value is cascaded upwards (and downwards) through all the relevant lines. Downwards does not apply only to direct progeny: all cousins who descend through a paternal line from a common ancestor should also be marked R1b1.
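The cascade in this example could be sketched in Python as follows. This is only an illustration under simple assumptions: the family is stored as a plain mapping from each man to his father, the names are invented, and `cascade_haplogroup` is a hypothetical function, not an existing feature of any genealogy program.

```python
from collections import defaultdict

# Hypothetical male-line data: each man maps to his father (None if unknown).
father_of = {
    "great-grandfather": None,
    "grandfather": "great-grandfather",
    "granduncle": "great-grandfather",
    "father": "grandfather",
    "researcher": "father",
    "cousin": "granduncle",
    "unrelated man": None,
}

def cascade_haplogroup(father_of, tested, haplogroup):
    """Assign the haplogroup to every man who shares the tested man's paternal line."""
    # Upwards: climb to the earliest known paternal ancestor.
    root = tested
    while father_of.get(root) is not None:
        root = father_of[root]

    # Invert the mapping so we can walk downwards from each father to his sons.
    sons_of = defaultdict(list)
    for son, father in father_of.items():
        if father is not None:
            sons_of[father].append(son)

    # Downwards: mark the root and all of his male-line descendants.
    assigned = {}
    stack = [root]
    while stack:
        man = stack.pop()
        assigned[man] = haplogroup
        stack.extend(sons_of[man])
    return assigned

assigned = cascade_haplogroup(father_of, "researcher", "R1b1")
print(sorted(assigned))
```

The cousin is marked R1b1 because he shares the great-grandfather in the male line, while the unrelated man is left untouched.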

Six months later a cousin (who should be R1b1) takes the same test and gets haplogroup J2. The initial assertion is clearly wrong, and by going upwards the genealogy software could pinpoint the most recent common male ancestor and flag the discrepancy. This is not limited to recent generations: we now have the haplogroups of several individuals who died a long time ago, so the flagged ancestor could be as far back as the 16th century.

The relevance of this method is not limited to finding mismatches: the much more common situation is that different tests from different lines will yield the same result. It even gives a good idea of a particular person's haplogroup without that person being tested.
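A minimal Python sketch of such a discrepancy check is given below. It assumes the paternal lines are stored as a mapping from each man to his father, and it treats any two different haplogroup strings as a conflict. That is a simplification, since haplogroup names are hierarchical and R1b and R1b1, for instance, need not contradict each other. All names and functions are hypothetical and serve only as illustration.

```python
from itertools import combinations

# Hypothetical male-line data: each man maps to his father (None if unknown).
father_of = {
    "great-grandfather": None,
    "grandfather": "great-grandfather",
    "granduncle": "great-grandfather",
    "father": "grandfather",
    "researcher": "father",
    "cousin": "granduncle",
}

def paternal_line(father_of, man):
    """Return the man followed by his father, grandfather, and so on."""
    line = [man]
    while father_of.get(man) is not None:
        man = father_of[man]
        line.append(man)
    return line

def common_paternal_ancestor(father_of, a, b):
    """Most recent man found on both paternal lines, or None if there is none."""
    line_of_a = set(paternal_line(father_of, a))
    for man in paternal_line(father_of, b):
        if man in line_of_a:
            return man
    return None

def find_conflicts(father_of, results):
    """List (man, man, common ancestor) for tested pairs whose haplogroups differ."""
    conflicts = []
    for (a, hg_a), (b, hg_b) in combinations(results.items(), 2):
        ancestor = common_paternal_ancestor(father_of, a, b)
        # Simplification: any two different strings count as a conflict.
        if ancestor is not None and hg_a != hg_b:
            conflicts.append((a, b, ancestor))
    return conflicts

results = {"researcher": "R1b1", "cousin": "J2"}
conflicts = find_conflicts(father_of, results)
print(conflicts)
# prints: [('researcher', 'cousin', 'great-grandfather')]
```

The flagged ancestor only bounds the problem: the break in the chain lies somewhere on the two lines between him and the tested men, and further tests on intermediate relatives would be needed to narrow it down.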

M-line research


Please update or expand this section.

Hereditary diseases/traits

In recent times it has become possible to determine the cause of death with a high degree of certainty. Many hereditary conditions are also known, and people talk about them openly. Many people store this information in a genealogical application, e.g. as an attribute (set to private, of course).

Today's information can shed light on some strange facts in your family tree's history, e.g. many early deaths among males.

To extrapolate today's known facts to the past, you need some knowledge:

  • is it inherited from the father or the mother?
  • what is the probability of inheriting the trait?
  • under what conditions does the trait show itself?

Privacy: keep this information private. Although you might not mind letting the world know that the men in your family are bald at 40, your cousin twice removed who is looking for a girlfriend might think otherwise.

Family trees for genetic research

Many research institutions have a need for extensive reliable family trees, and will employ genealogists for this purpose.

This allows them to investigate traits and conditions across a broad sample, while knowing how closely related the sampled people are.