Is there a way to prove that two VCFs came from the same person?
9 months ago
Lee ▴ 10

We have 50 bladder cancer germline samples. One of the researchers we collaborate with has retired, and we cannot get metadata. Therefore, we are unsure whether these 50 samples came from 50 individuals or from 30 individuals. Is there any way to determine this?

Thank you in advance.

vcf identification individual • 665 views

It is highly likely that two samples came from the same individual if almost all of their high-quality SNVs match. Back when I was performing such forensics, we had to caveat these as identical genotypes rather than identical individuals, since they could be from identical twins for all you know. This was with a subset of SNVs picked from across the genome, where 99% of the subset (or something like that) matched.
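
The comparison described above can be sketched in plain Python. This is only an illustration under stated assumptions: each VCF holds a single sample, only biallelic SNVs are compared, and the `min_qual` cutoff of 30 is an arbitrary placeholder rather than a recommended value. The function names `read_genotypes` and `concordance` are made up for this example. Note that per-sample VCFs usually list only variant sites, so a site missing from one file may be homozygous reference or simply uncalled; the sketch sidesteps this by comparing only sites present in both files, which tends to overstate agreement.

```python
BASES = {"A", "C", "G", "T"}


def read_genotypes(vcf_lines, min_qual=30.0):
    """Map (chrom, pos, ref, alt) -> unordered genotype for a single-sample VCF.

    min_qual is an assumed cutoff for "high quality"; tune it for your data.
    """
    genotypes = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual = fields[:6]
        # Keep only biallelic SNVs that pass the quality cutoff.
        if ref not in BASES or alt not in BASES:
            continue
        if qual == "." or float(qual) < min_qual:
            continue
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gt = sample_values[format_keys.index("GT")].replace("|", "/")
        if "." in gt:
            continue  # missing genotype call
        # Sort alleles so that 0/1 and 1|0 compare as equal.
        genotypes[(chrom, int(pos), ref, alt)] = tuple(sorted(gt.split("/")))
    return genotypes


def concordance(gt_a, gt_b):
    """Return (fraction of matching genotypes, number of shared sites)."""
    shared = gt_a.keys() & gt_b.keys()
    if not shared:
        return 0.0, 0
    matches = sum(gt_a[site] == gt_b[site] for site in shared)
    return matches / len(shared), len(shared)
```

In practice you would pass open file handles (for example `read_genotypes(open("sample_a.vcf"))`, where the file name is hypothetical) and look at both the fraction and the number of shared sites, since a high fraction over a handful of sites says little.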

9 months ago
acvill ▴ 340

If you have enough SNVs, you can try SMaSH, which attempts to computationally identify samples that are derived from the same individual. The program takes VCF as input. Here's an excerpt from the GitHub page:

Sample swaps are a real concern in high throughput sequencing studies. SMaSH helps detect such sample swaps by integrating the information from over 6000 carefully selected single nucleotide polymorphism (SNP) sites from across the human genome to identify which samples in a group of sequencing data sets are derived from the same human individual. Importantly, SMaSH is able to verify sample identity between different data types, such as RNA-Seq, exome, and MethylCap-Seq data.

https://github.com/rbundschuh/SMaSH


Thank you for your reply. Unfortunately, SMaSH doesn't seem to take VCF as input.


Use somalier (https://github.com/brentp/somalier); it accepts VCF as input.
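
Whichever tool produces the pairwise comparison, answering "50 individuals or 30?" comes down to grouping samples whose pairwise similarity is high enough. Below is a minimal sketch of that grouping step, assuming you already have one similarity score per sample pair (for example a relatedness or concordance value). The function name `group_samples` and the default `threshold` of 0.9 are hypothetical choices for illustration, not values taken from any tool, and identical twins would still end up in the same group.

```python
def group_samples(samples, pair_scores, threshold=0.9):
    """Group samples into putative individuals.

    samples:     iterable of sample names
    pair_scores: iterable of (sample_a, sample_b, score) tuples
    threshold:   assumed cutoff above which a pair counts as the same person
    """
    samples = list(samples)
    parent = {s: s for s in samples}

    def find(s):
        # Follow parent links to the group representative, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    # Merge the groups of every pair that passes the threshold.
    for a, b, score in pair_scores:
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in groups.values())
```

The number of groups returned is the estimated number of distinct individuals. Because the merging is transitive, a single spurious high score can join two groups, so it is worth inspecting any group whose members do not all match each other pairwise.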
