Validate RNAseq reads against array genotypes?
6 weeks ago
I have RNAseq data and array genotyping for the same set of samples, but we have evidence that some of the samples may have been mixed up or contaminated somewhere between DNA purification, genotyping and RNA sequencing. I therefore need to verify that the reads from the RNAseq data match the array genotypes. I have tried using bcftools to call genotypes at exonic SNPs from the reads and then KING to analyze relatedness, but I am getting poor concordance for all samples. Is there a better tool for this?

6 weeks ago
LChart

I don't think a tool exists to do this specifically for RNA-seq data. You could try CalculateFingerprintMetrics from Picard, but the issue is that the genotype assignments could be distorted by unequal expression of the two alleles. Luckily, the approach is pretty straightforward:

After bcftools -C, you can extract the genotype log-likelihoods for each .bam from the GL field. For microarray entries that are hom-ref, use the reference likelihood. Remember to normalize the probabilities first: for example, phred-scaled values of 0,4,8 normalize to a reference probability of about 0.642, which is a natural-log likelihood of about -0.442. For entries that are not hom-ref, use log(exp(het)+exp(homvar)), which is the non-reference probability; this is more stable than using het or hom-var alone, as RNA-seq may not show 50/50 allelic expression. Summing these values across all variants gives you a "match likelihood" for each (bam, microarray) pair, and the microarray sample with the highest sum is the most likely match for that .bam.
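
The scoring step can be sketched in Python roughly as follows. This is only an illustration, not a tested pipeline: it assumes you have already parsed, for each biallelic site, a triple of phred-scaled likelihoods (ref, het, hom-var) like the 0,4,8 example, plus a flag saying whether the array call is hom-ref. The function names (`normalized_log_probs`, `site_score`, `match_log_likelihood`) are made up for this sketch, and VCF parsing is left out.

```python
import math

LN10 = math.log(10.0)

def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a small list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def normalized_log_probs(pl):
    """Turn phred-scaled likelihoods (ref, het, hom-var) into normalized
    natural-log probabilities that sum to 1 in probability space.
    Assumption: inputs are phred-scaled, i.e. likelihood = 10 ** (-pl / 10)."""
    ln_lik = [-p / 10.0 * LN10 for p in pl]
    norm = _logsumexp(ln_lik)
    return [v - norm for v in ln_lik]

def site_score(pl, array_is_hom_ref):
    """Log-probability that the reads agree with the array call at one site."""
    log_ref, log_het, log_hom = normalized_log_probs(pl)
    if array_is_hom_ref:
        return log_ref
    # Non-reference probability: het and hom-var are pooled because
    # allelic expression in RNA-seq may be far from 50/50.
    return _logsumexp([log_het, log_hom])

def match_log_likelihood(sites):
    """Sum per-site scores for one (bam, microarray) pair.
    `sites` is an iterable of (pl_triple, array_is_hom_ref) tuples."""
    return sum(site_score(pl, is_ref) for pl, is_ref in sites)

if __name__ == "__main__":
    # The worked example from the text: 0,4,8 at a hom-ref array site.
    print(site_score((0, 4, 8), True))   # roughly -0.4425
    print(site_score((0, 4, 8), False))  # roughly -1.028
```

To use it, compute `match_log_likelihood` for every (bam, microarray) pair and, for each .bam, look at which array sample gives the largest (least negative) total. A correctly labelled sample should score clearly better against its own array genotypes than against anyone else's; restricting to sites with reasonable read coverage is probably sensible, but that threshold is up to you.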