Question

Deletion detected at less frequency than expected for a plasmid sample

0

Entering edit mode

5.6 years ago

kspata ▴ 80

Hi,

I was performing variant analysis for a plasmid sample sequenced on MiSeq with read length PE300. More than one million reads were generated. The mapping rate of the reads to the reference was 98%. I used samtools to detect variants and generate consensus sequence. 3 variants were detected for the sample as follows:

 6354   ACGC>ACGCGC                                    100%   6108X
10384   CCGGGCGACCTAAGCCCGGGCGTCGGGCGACCT>CCGGGCGACCT   10%   2334X
10469   GACGT>G                                        100%   4875X

The variant at 10384 is present at 10% variant frequency.

The coverage at this position is low. I visualized the variant in IGV and sorted it based on base position.

The restriction analysis at this site indicates that the variant is present at a much higher frequency as the Sma1 is not cutting through this site.

Sanger sequencing performed for this site was inconclusive.

To analyse this deletion I performed de-novo assembly using SPAdes. I got a several contigs, from which I selected the one with more than 10,000bp and 100X coverage. I performed a BLAST of reference for this sample and the contig generated. The Blast output only shows the variant present at position 6354 and not the other two.

Query  6301   CCGATTCGCAGCGCATCGCCTTCTATCGCCTTCTTGACGAGTTCTTCTGAATTACGC--C  6358
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |
Sbjct  4140   CCGATTCGCAGCGCATCGCCTTCTATCGCCTTCTTGACGAGTTCTTCTGAATTACGCGCC  4081

The length of the reference is 12089bp. The length of the contig is 11526bp.

I also performed variant calling again on the merged overlapping R1 and R2 reads. The variant calls are as follows:

6354    ACGC>ACGCGC                 100.0   1057X   
6357    CCAACGCGTTTG>CGGCAACGCGTTTG 94.59   1032X   
10469   GACGT>G                     100.0   608X

I am very confused by these results. How can I troubleshoot this and find the accurate variant frequency? I thought of some possible scenarios why VF % at position 10384 is less.

The low coverage at the desired position.
The length of the INDEL is greater than 50bp resulting in aligner not aligning reads at this position.
There is a presence of secondary structure present at these position in plasmids resulting in inconclusive Sanger sequencing analysis, low number of reads generated at this position or incorrect INDEL call at the position.

Can anyone suggest me any other way to affirm this and how to interpret such results using NGS data?

Thanks in advance!!!

alignment assembly • 815 views

ADD COMMENT • link updated 5.6 years ago by Ram 43k • written 5.6 years ago by kspata ▴ 80