Last summer, Shelley Crawford posted directions on her Twigs of Yore blog for using NodeXL and Microsoft Excel to create a cluster diagram of DNA matches [Visualizing DNA Matches – Index]. When I tried this with my data, I had trouble getting past a ‘blob’. According to a Facebook post dated July 16, 2017, I was eventually able to transform my black blob into some smaller clusters.
Not remembering exactly how this process worked, I decided to try again. I used the DNAGedcom client to download my match data from Ancestry. For this trial, I downloaded ALL of my matches. Following the directions, I loaded my matches and ‘in common with’ files into the NodeXL Template. I also imported my ‘Additional Input’ file. In the ‘Additional Input’ file, I told the program to skip DNA data for my brothers and my mother. When I graphed this data, I got a blob.
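As a rough illustration of what ‘skip’ amounts to, here is a minimal Python sketch. It is not how NodeXL or the ‘Additional Input’ file works internally. The names, the pairs and the drop_skipped helper are all made up for the example. The idea is simply that any shared-match pair involving a skipped person is left out before the graph is drawn.

```python
# Hypothetical 'in common with' pairs; none of these are real matches.
edges = [
    ("Mother", "M1"),
    ("Mother", "M2"),
    ("Mother", "M3"),
    ("M1", "M2"),
    ("Brother", "M3"),
    ("Brother", "M1"),
    ("M3", "M4"),
]

# People to leave out of the graph (assumed labels).
skip = {"Mother", "Brother"}

def drop_skipped(edges, skip):
    """Keep only the pairs where neither person is on the skip list."""
    return [(a, b) for a, b in edges if a not in skip and b not in skip]

kept = drop_skipped(edges, skip)
print(kept)  # [('M1', 'M2'), ('M3', 'M4')]
```

In this toy data, the two skipped people account for five of the seven lines, which hints at why very close relatives can make a graph look like a blob.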
The next step was to group. I tried grouping by ‘connected component’ and still had a black glob. Thus, I tried grouping by cluster instead. I picked the Clauset-Newman-Moore cluster algorithm. I also set the layout option to “lay out each of the graph’s groups in its own box.”
Now, I have several colored blobs of varying sizes. The grey background represents all of the lines connecting one ‘blob’ to another.
The above graph contains data for over 50,000 matches. It also does not skip my dad’s first cousin.
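One possible reason that grouping by ‘connected component’ leaves a single blob is that a connected component is every dot that can be reached from another dot by following lines, so one shared match between two otherwise separate groups is enough to merge them. The Python sketch below shows that idea with made-up pairs. The connected_components function is written for this example and is not NodeXL's code.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return a list of sets, one set of names per connected component."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        group = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in group:
                    group.add(nxt)
                    queue.append(nxt)
        seen |= group
        components.append(group)
    return components

# Hypothetical shared-match pairs forming two tight groups.
edges = [
    ("A1", "A2"), ("A2", "A3"), ("A1", "A3"),
    ("B1", "B2"), ("B2", "B3"), ("B1", "B3"),
]

separate = connected_components(edges)
# One extra shared match between the groups joins them.
bridged = connected_components(edges + [("A3", "B1")])

print(len(separate), len(bridged))  # 2 1
```

A cluster algorithm such as Clauset-Newman-Moore looks at how densely the dots are linked rather than only whether they are linked at all, which may be why it was able to split the blob when grouping by connected component could not.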
My next step was to change the visibility for my dad’s first cousin to ‘skip’. Unfortunately, all of the globs are still globs. So, I went back to DNAGedcom to re-download data. This time, I checked ‘Skip Distant Cousin Matches’.
I started the entire process over with these new files. This time I had almost 1,800 matches. I still had a ‘blob’, but a much less dense one.
The next step was to group the results. Knowing that grouping by connected component didn’t work before, I again grouped by cluster. After changing the layout option to ‘lay out each of the graph’s groups in its own box’, I had a graph with the dots arranged by color.
Curious as to whether the various colors could be associated with a specific surname, I used the notes field on the spreadsheet to locate specific dots. When I clicked on a row in the spreadsheet, the graph would show the corresponding dot as the center of lines connecting it to other dots (the shared matches).
The first dot I tried was a match with my mother and a likely MENTZER relative.
That dot started from the red area and branched out. Thinking that the orange area below the red area corresponded to my CRAWFORD line, I clicked on the match directly above the previous one, only to discover that it also originated from the red area.
Both a MENTZER match (my mom’s line) and a CRAWFORD match (my dad’s line) have dots in the red area and not in separate areas. Thus, my theory about the colored areas matching surnames doesn’t stand up to this simple test. I also can’t use geography to explain this. My Mentzer line was in Massachusetts, Illinois and southeast Kansas. My Crawford line was in Kentucky, Ohio, Indiana and southwestern Kansas.
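Clicking on one dot at a time is slow, so the same test could in principle be run on every match that has a surname in the notes field. The Python sketch below assumes the cluster color and the surname for each match have been copied out of the spreadsheet by hand. The rows are invented for illustration and are not my actual results, and surnames_by_cluster is a made-up helper.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (match name, cluster color, surname from notes).
rows = [
    ("match01", "red", "MENTZER"),
    ("match02", "red", "CRAWFORD"),
    ("match03", "red", "MENTZER"),
    ("match04", "orange", "CRAWFORD"),
    ("match05", "orange", ""),  # no surname noted, so it is ignored
]

def surnames_by_cluster(rows):
    """Count how often each noted surname appears in each cluster."""
    tally = defaultdict(Counter)
    for _name, cluster, surname in rows:
        if surname:
            tally[cluster][surname] += 1
    return tally

tally = surnames_by_cluster(rows)

# Clusters holding more than one surname do not map to a single line.
mixed = sorted(c for c, counts in tally.items() if len(counts) > 1)
print(mixed)  # ['red']
```

A cluster that shows up in the mixed list holds matches from more than one surname, which is the same result the two clicks above gave for the red area.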
Thus, I need to learn more about how to interpret this data!