Cluster Your DNA Matches with GEDmatch
Clustering is a method of looking at multiple DNA matches at one time to see how you might be related to them. GEDmatch has a clustering tool that you can use to compare DNA kits across commercial genetic genealogy testing companies.
How to Access the Cluster Tool on GEDmatch
On the GEDmatch website home page:
Log in to access the Tier 1 tools.
Scroll down to the Tier 1 tools.
Select the option "Clusters, Single Kit input, Basic Version!"
Then you'll access this screen.
To make the cluster tool work, you need the kit number of a GEDmatch upload: your own, a relative's, or a DNA match's.
Key in or copy and paste the kit number into the form. Then select the thresholds, which I'll discuss in a moment.
Then GEDmatch will gather all the data to create the cluster table.
Depending on your thresholds and how many matches you have, this could take several minutes. Just sit tight and wait.
Watch the GEDmatch Cluster Tool in Action
GEDmatch Cluster Results
Once GEDmatch has figured out where everything is, it will display a table similar to the one below.
As you can see in the image above, GEDmatch will arrange your DNA matches based on the average shared centimorgans for each cluster. Even though this process may take 3-5 minutes, it's nice to watch this graphic as you see kits move back and forth.
Depending on what you set your thresholds at and depending on the matches, this table could be humongous. In the video, I had 500 matches. With the new 2021 update, I have 167 matches.
If I look across the top and look down the side, I will see the exact same names of my DNA matches.
Each of these DNA matches shares an amount of DNA with me that falls within the thresholds I've set. They also match other people in the chart within those same thresholds. That's what these boxes indicate.
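To make the idea of the grid concrete, here is a minimal Python sketch of how a chart like this could be put together. It is not GEDmatch's code. The kit IDs, the centimorgan values, and the function names are all made up for illustration, and I'm assuming the same lower and upper thresholds apply both to my matches and to the matches between them.

```python
# Hypothetical shared cM between the reference kit and each match.
shared_with_me = {"A1": 48.0, "B2": 22.5, "C3": 12.0, "D4": 75.0, "E5": 31.0}

# Hypothetical shared cM between pairs of matches.
shared_between = {
    frozenset({"A1", "B2"}): 27.0,
    frozenset({"A1", "E5"}): 9.0,
    frozenset({"B2", "E5"}): 18.0,
}


def in_range(cm, lower=15.0, upper=50.0):
    """True when a shared cM value falls within the thresholds."""
    return lower <= cm <= upper


def build_grid(shared_with_me, shared_between, lower=15.0, upper=50.0):
    """Return the kits that pass the thresholds and a kit-by-kit grid.

    A cell is True when the two matches also share DNA with each other
    within the same thresholds. The diagonal is left False in this sketch.
    """
    kits = sorted(
        kit for kit, cm in shared_with_me.items() if in_range(cm, lower, upper)
    )
    grid = {
        row: {
            col: row != col
            and in_range(shared_between.get(frozenset({row, col}), 0.0), lower, upper)
            for col in kits
        }
        for row in kits
    }
    return kits, grid


kits, grid = build_grid(shared_with_me, shared_between)
```

With this sample data, C3 (12 cM) and D4 (75 cM) fall outside the 15 to 50 range, so only A1, B2, and E5 appear across the top and down the side, just like the names in the chart.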
Change How You View Your Cluster
You can change how the GEDmatch clustering tool displays the chart.
Using the drop-down menu at the top, you can arrange the chart by name, kit number, cluster number, or cluster size.
When I arrange the chart by cluster size, all the data reorganizes so that the biggest cluster comes first, followed by the rest in descending order.
The giant cluster at the beginning is a good example. Its members all share, on average, about 21 centimorgans with me. That points to a very distant relationship between these people and me.
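If you like to think in code, the sort options boil down to grouping matches by cluster and ordering the groups. The sketch below is only an illustration with invented names, kit numbers, and values. It summarizes each cluster's size and average shared cM, then orders the clusters both ways.

```python
from statistics import mean

# Hypothetical clustered matches: (name, kit number, cluster number, shared cM).
matches = [
    ("Ann", "K3", 1, 21.0),
    ("Bob", "K1", 2, 44.0),
    ("Cal", "K2", 1, 19.5),
    ("Dee", "K4", 1, 22.5),
    ("Eve", "K5", 2, 40.0),
]


def summarize(matches):
    """Group shared cM values by cluster and report size and average."""
    clusters = {}
    for _name, _kit, cluster, cm in matches:
        clusters.setdefault(cluster, []).append(cm)
    return {
        cluster: {"size": len(values), "avg_cm": mean(values)}
        for cluster, values in clusters.items()
    }


summary = summarize(matches)

# Biggest cluster first.
by_size = sorted(summary, key=lambda c: summary[c]["size"], reverse=True)

# Cluster with the highest average shared cM first.
by_avg_cm = sorted(summary, key=lambda c: summary[c]["avg_cm"], reverse=True)
```

Notice that the two orderings disagree here: the largest cluster is the one with the lowest average, which is the same pattern as the big 21 cM cluster described above.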
How to Do Genetic Genealogy With Cluster Charts
To start, I like to focus on the clusters that share the most DNA with me. I arrange my clusters by cluster average cM to the reference kit.
Then I can click on one of the matches in a cluster. GEDmatch will take us to the One-to-one Autosomal Comparison Entry Form. We can then make a one-to-one comparison with any one of the matches in the cluster.
When I first create a GEDmatch cluster chart, I use the basic thresholds, which are 15 centimorgans to 50 centimorgans.
We can actually change these numbers. Changing the thresholds can have a dramatic effect on the cluster chart. It can affect how many people show up and the size of the clusters, all because you're looking at a different range of shared centimorgans between these people.
Let's say I raise the lower threshold from 15 centimorgans to 20 centimorgans. I notice that the number of matches can drop, sometimes by nearly a third, just by increasing the threshold by 5 centimorgans.
Where this could be important is for people who have a lot of endogamy in their family history. By raising the lower threshold from 15 to 20, you're going to eliminate many people who are more distantly related than you want to be looking at.
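Here is a tiny Python sketch of why a small change in the lower threshold can remove so many matches. The list of shared cM values is invented, so the size of the drop is only an example and not a prediction for your own kit.

```python
# Hypothetical shared cM values for a handful of matches.
shared_cm = [15.2, 16.8, 17.5, 19.9, 20.0, 23.4, 28.1, 35.0, 47.6]


def count_in_range(values, lower, upper=50.0):
    """Count matches whose shared cM falls within the thresholds."""
    return sum(lower <= cm <= upper for cm in values)


at_15 = count_in_range(shared_cm, lower=15.0)
at_20 = count_in_range(shared_cm, lower=20.0)
at_25 = count_in_range(shared_cm, lower=25.0)
```

Distant matches tend to pile up near the bottom of the range, so each step up in the lower threshold trims more of them away.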
We can also change the upper threshold. Perhaps we want to include some of our cousins. We can increase the upper threshold to 1,000 centimorgans. Such a change will not include my aunts, uncles, or grandparents but should pick up my cousins and second cousins.
What Do the Gray Squares Indicate?
This change has some interesting effects. The higher the upper threshold for centimorgans, the more gray squares I pick up.
These gray squares indicate that the DNA match probably belongs in multiple clusters.
For instance, some matches belong not only to an orange cluster but also to a purple cluster. The more closely related your DNA matches, the more cluster overlap you may find, and the gray squares are what show it.
Use these gray squares as clues to how different clusters are related to you. In essence, you're categorizing people based on the other people they match.
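One way to picture the gray squares is as links between people who were placed in different clusters. The Python sketch below works under that assumption. The cluster colors, kit IDs, and links are invented, and this is not how GEDmatch itself necessarily decides what to shade.

```python
# Hypothetical cluster assignment for each kit.
cluster_of = {
    "A1": "orange",
    "B2": "orange",
    "C3": "purple",
    "D4": "purple",
    "E5": "green",
}

# Hypothetical pairs of kits that match each other within the thresholds.
links = [("A1", "B2"), ("C3", "D4"), ("B2", "C3"), ("B2", "D4")]


def gray_links(cluster_of, links):
    """Links whose two kits sit in different clusters (the 'gray squares')."""
    return [(a, b) for a, b in links if cluster_of[a] != cluster_of[b]]


def clusters_touched(cluster_of, links):
    """For each kit, the set of clusters it is connected to, including its own."""
    touched = {kit: {cluster} for kit, cluster in cluster_of.items()}
    for a, b in links:
        touched[a].add(cluster_of[b])
        touched[b].add(cluster_of[a])
    return touched


gray = gray_links(cluster_of, links)
touched = clusters_touched(cluster_of, links)
```

In this sample, B2 sits in the orange cluster but also matches two people in the purple cluster, so B2 is the kind of match worth a closer look when working out how those two clusters connect.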
No matter what type of clustering you do, you're going to be getting some information that may help you in your genealogy.
Clustering and Endogamy
One of the really tricky things with clustering is families that have endogamy. Because there is so much shared DNA, you may end up with this huge giant cluster covering most of your page.
In the video, I show you an example of what endogamy looks like on the cluster charts. I've also included a photo below of the cluster from the video. We can see this orange cluster has 450 people in it.
From a genealogy standpoint, that is just way too many people to try to work with at one time. From a clustering standpoint, this large cluster is not very useful.
If you have endogamy in your family tree, adjust the thresholds to try to break up a cluster like this.
Also be aware that endogamous clusters take a long time to load. The chart in the video took about seven minutes because of all these connections.
When I changed the lower threshold to 25 centimorgans, that great big giant cluster began to break into smaller ones. Even so, once we get far enough down the chart, there is still one really big cluster that makes up the majority of it.
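To see why raising the threshold breaks up a giant cluster, here is a rough Python sketch. I'm using a simple stand-in for clustering: any chain of people who match each other at or above the threshold ends up in the same group. GEDmatch's real method may work differently, and the kits and cM values are invented.

```python
# Hypothetical shared cM between pairs of matches.
pair_cm = {
    ("A", "B"): 40.0,
    ("A", "C"): 42.0,
    ("B", "C"): 38.0,
    ("C", "D"): 18.0,
    ("D", "E"): 33.0,
    ("E", "F"): 16.0,
}


def clusters_at(threshold, pair_cm):
    """Group kits connected by links of at least `threshold` cM.

    This is plain connected-components grouping, used here only as a
    stand-in for a real clustering algorithm.
    """
    neighbors = {}
    for (a, b), cm in pair_cm.items():
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if cm >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, clusters = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            kit = stack.pop()
            if kit in group:
                continue
            group.add(kit)
            stack.extend(neighbors[kit] - group)
        seen |= group
        clusters.append(sorted(group))
    # Largest cluster first.
    return sorted(clusters, key=len, reverse=True)


low = clusters_at(15.0, pair_cm)
high = clusters_at(25.0, pair_cm)
```

At 15 cM the weak 16 and 18 cM links chain everyone into one cluster of six. At 25 cM those weak links drop out and the group splits into three, although the largest piece still holds half of the kits, much like the big cluster that remains in the endogamy chart.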