Genotyping sites with N in reference genome
10 weeks ago
shpak.max ▴ 50

I aligned reads from Drosophila simulans (a sister species of D. melanogaster) against a modified Drosophila melanogaster reference genome in which sites that mismatch with simulans were set to N (I had a specific reason for using this alignment).

Afterwards, I used the same modified reference genome for variant calling with GATK's UnifiedGenotyper. It didn't occur to me until after the fact that this might cause GATK not to return genotypes at the sites that I set to N. Does UnifiedGenotyper skip/ignore sites where the reference genotype is N, even if there is strong support for particular nucleotides at that site, or can it still identify genotypes at these sites in the final VCF (i.e. can it treat NN as reference and e.g. AA or AG as "variant")?

GATK UnifiedGenotyper • 380 views

Whether N -> A/T/G/C counts as a variant depends heavily on the tool itself, and as far as I know most tools don't treat it as one. Since you don't see these sites in your VCF, that means UnifiedGenotyper doesn't call them.

A slight digression: UnifiedGenotyper is so old that people switched to HaplotypeCaller more than 10 years ago.

10 weeks ago

Does UnifiedGenotyper skip/ignore sites where the reference genotype is N

Yes. The variant is in YOUR data, not in the reference, where 'N' stands for an unknown base (centromere, etc.).
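One way to check this on your own data is to list the N positions in the modified reference and see which of them have a record in the VCF. The following is only a minimal sketch in plain Python: the function names are hypothetical, it assumes an uncompressed FASTA and an uncompressed tab-delimited VCF, and the toy records are made up for illustration and are not GATK output.

```python
import io

def masked_positions(fasta_lines):
    """Yield (contig, 1-based position) for every N/n in a FASTA stream."""
    contig, pos = None, 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Contig name is the first whitespace-delimited token of the header
            contig, pos = line[1:].split()[0], 0
            continue
        for base in line:
            pos += 1
            if base in "Nn":
                yield contig, pos

def vcf_sites(vcf_lines):
    """Return the set of (CHROM, POS) pairs found on VCF data lines."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t")[:2]
        sites.add((chrom, int(pos)))
    return sites

def split_masked(fasta_lines, vcf_lines):
    """Split masked reference sites into those with and without a VCF record."""
    called = vcf_sites(vcf_lines)
    present, absent = [], []
    for site in masked_positions(fasta_lines):
        (present if site in called else absent).append(site)
    return present, absent

# Toy inputs (hypothetical, for illustration only)
fasta = io.StringIO(">2L\nACNT\nGNAA\n")
vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "2L\t2\t.\tC\tT\t50\tPASS\t.\n"
    "2L\t6\t.\tN\tA\t50\tPASS\t.\n"
)
present, absent = split_masked(fasta, vcf)
print("masked sites with a VCF record:", present)
print("masked sites without a VCF record:", absent)
```

For real files, pass open file handles instead of the `io.StringIO` objects. Note that absence from a variants-only VCF does not by itself tell you why a site was skipped.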


Just to clarify: I have set additional sites in the reference genome to 'N', which means (based on your statement) that these will not be genotyped by GATK, regardless of which bases, and how many, are found in the reads at these sites?


As a follow-up, is it possible to get a variant call with UnifiedGenotyper if, rather than N, I have one of the ambiguity codes (e.g. R, Y, B, etc.) in the reference genome?
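For context on the codes mentioned here, the standard IUPAC ambiguity codes each stand for a set of possible bases. The sketch below (the `ambiguous_sites` helper is hypothetical) just expands them and locates them in a sequence. It does not say anything about whether UnifiedGenotyper will call variants at such sites; that would have to be tested or checked against the GATK documentation.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases they can represent
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def ambiguous_sites(sequence):
    """Return (1-based position, code, possible bases) for each ambiguity code."""
    return [
        (i, base.upper(), IUPAC[base.upper()])
        for i, base in enumerate(sequence, start=1)
        if base.upper() in IUPAC
    ]

# Toy sequence (made up for illustration)
for pos, code, bases in ambiguous_sites("ACRTYGBN"):
    print(pos, code, bases)
```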
