Question

Detect mutations in clonally propagated plants

0

Entering edit mode

11 months ago

RvV ▴ 30

I am analysing Illumina whole-genome resequencing data from two clonally propagated plants aiming to find any potential variants that are unique to either of the two. Note that these would be expected to be somatic mutations and that due to the nature of propagation (by cuttings), I cannot predict the frequency of such variants in a sample. What I want is to detect any strongly-supported unique variants and I am envisaging the following approach:

Map reads to reference genome
Consider regions with expected read depth in both samples
Detect all variants in those regions that occur in at least 2 reads
Find variants that are unique to either sample
Validate variants through visual inspection of read mappings at the corresponding positions

I can perform all steps, except 3. This is because variant callers generally call genotypes assuming diploidy and, therefore, do not report variants that do not conform to expected genotyping ratios (0/0 0/1 1/1). Based on step 5, I find that most (if not all) detected "unique" variants are the result of differences in variant frequencies rather than actual presence/absence.

Is there a way to call any variant that is supported by at least x read mappings? For example, by parsing samtools mpileup output or by applying a specific variant caller?

Any advice or suggestion would be most welcome.

Thanks!

mutations Illumina clonal-propagation variant-calling • 550 views

ADD COMMENT • link 11 months ago by RvV ▴ 30

0

Entering edit mode

Is there a way to call any variant that is supported by at least x read mappings?

before calling, you could use samtools depth to create a bed file where both BAMS have a depth > 'x'

in the VCF, you could FILTER the genotypes having a FORMAT/DP > 'x'

?

ADD REPLY • link 11 months ago by Pierre Lindenbaum 161k

0

Entering edit mode

Thanks but filtering on depth (step 2) is not my issue. Rather, it how genotypes are called. I found that many variants are genotyped as 0/1 in one sample and 0/0 in another, only because the number of supporting reads was lower. Thus these variants are erroneously listed as "unique".

I want to call all variants that are supported by at least x reads.

ADD REPLY • link 11 months ago by RvV ▴ 30

0

Entering edit mode

I want to call _all_ variants that are supported by at least x reads.

sorry, I don't understand your point. Could you please explain, for example if the genotypes in the VCF are:

GT:DP  0/1:3 0/0:100

ADD REPLY • link 11 months ago by Pierre Lindenbaum 161k

0

Entering edit mode

No problem - I am glad you're helping! For example, see the variant below that was genotyped using GATK pipeline:

chr6    7856    .   G   A   10.0546 .   DP=82;VDB=0.258129;SGB=-5.0297;RPBZ=-1.09161;MQBZ=-8.29566;MQSBZ=2.63667;BQBZ=0.117184;SCBZ=0;MQ0F=0;AC=1;AN=4;DP4=39,27,10,0;MQ=56 GT:PL   0/0:0,4,255 0/1:44,0,255

When I examine the read mappings at position 7856, the first sample has 51 mapped reads of which 5 are A. Consequently, only few reads support the variant and the genotype is called as homozygous reference.

The second sample has 34 reads of which 6 are A, a larger fraction of reads support the variant and the genotype is therefore called as heterozygous.

However, I want to call all variants that are uniquely present in one of the samples (with minimum of 2 supporting reads to ignore sequencing errors).

I hope this makes it more clear. Thanks again

ADD REPLY • link 11 months ago by RvV ▴ 30