Validate RNAseq reads against array genotypes?
20 months ago
S • 0

I have RNA-seq data and array genotyping for the same set of samples, but we have evidence that some samples may have been mixed up or contaminated somewhere between DNA purification, genotyping, and RNA sequencing. I therefore need to verify that the RNA-seq reads match the array genotypes. I have tried calling genotypes at exonic SNPs from the reads with bcftools and then analyzing relatedness with KING, but I get poor concordance for all samples. Is there a better tool for this?

rnaseq sequencing contamination array genotyping • 415 views
20 months ago
LChart 3.9k

I don't think a tool exists to do this specifically for RNA-seq data. You could try CalculateFingerprintMetrics from Picard, but the genotype assignments could be distorted by unequal expression of alleles. Luckily, the approach is pretty straightforward:

After bcftools -C, you can extract the genotype log-likelihoods for each .bam from the GL field. For microarray entries that are hom-ref, use the reference log-likelihood (remember to normalize the probabilities! For example, 0,4,8 normalizes to a reference log-likelihood of -0.442, natural log). For entries that are not hom-ref, use log(exp(het) + exp(homvar)), which is the non-reference log-probability; this is more stable than het or hom-var alone, since RNA-seq may not show 50/50 allelic expression. Summing these values across all variants gives a "match likelihood" for each (bam, microarray) pair, and for a given .bam the microarray sample with the highest sum is the best candidate match.
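
The scoring step can be sketched in Python roughly as follows. This is only an illustration, not a tested pipeline: it assumes the per-site triple (ref, het, hom-var) is Phred-scaled, which is the scaling under which the 0,4,8 example above gives -0.442, and that the values have already been parsed out of the VCF into plain dictionaries keyed by site. The function names and the input layout are made up for this sketch.

```python
import math

LN10 = math.log(10)

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a small list of logs."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def normalized_log_probs(phred_triple):
    """Turn a Phred-scaled (ref, het, homvar) triple into normalized
    natural-log probabilities that sum to 1 after exponentiation."""
    logs = [-p / 10 * LN10 for p in phred_triple]
    norm = logsumexp(logs)
    return [x - norm for x in logs]

def site_score(phred_triple, array_is_hom_ref):
    """Log-probability that the RNA-seq data agree with the array call,
    collapsing het and hom-var into a single non-reference class."""
    log_ref, log_het, log_hom = normalized_log_probs(phred_triple)
    if array_is_hom_ref:
        return log_ref
    return logsumexp([log_het, log_hom])

def match_log_likelihood(bam_sites, array_alt_counts):
    """Sum site scores over the sites present in both inputs.

    bam_sites: dict site_id -> (ref, het, homvar) Phred-scaled triple
    array_alt_counts: dict site_id -> alt-allele count (0, 1 or 2)
    """
    shared = bam_sites.keys() & array_alt_counts.keys()
    return sum(site_score(bam_sites[s], array_alt_counts[s] == 0)
               for s in shared)

def best_match(bam_sites, arrays):
    """Return the array sample name with the highest match likelihood."""
    return max(arrays, key=lambda name: match_log_likelihood(bam_sites, arrays[name]))

if __name__ == "__main__":
    # Toy data: three sites from one .bam, two candidate array samples.
    bam = {"rs1": (0, 4, 8), "rs2": (30, 0, 25), "rs3": (60, 20, 0)}
    arrays = {
        "sampleA": {"rs1": 0, "rs2": 1, "rs3": 2},
        "sampleB": {"rs1": 2, "rs2": 0, "rs3": 0},
    }
    for name, calls in arrays.items():
        print(name, round(match_log_likelihood(bam, calls), 3))
    print("best match:", best_match(bam, arrays))
```

Scoring every .bam against every array sample gives a matrix; a swap shows up as a .bam whose best-scoring array sample is not the one it is labelled with, while a .bam that scores poorly against everything is more consistent with contamination or low coverage.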
