Variant discrepancy between technical replicates
3.2 years ago
kspata ▴ 80

I am aligning reads to a reference and calling variants with freebayes for two technical replicates of the same sample. Both replicates were sequenced on a MiSeq (PE 250). The raw reads were trimmed with Trim Galore (Q > 30, length >= 200) and mapped to the reference sequence with Bowtie2. Variants were then called from the recalibrated, sorted BAM files using freebayes.

Please let me know if you need raw data.

For Sample 1, the following command was used to call variants:

freebayes --ploidy 1 -f LYS-GM101-AAV.fasta -F 0.01 --min-coverage 10 -C 4 756296.sorted.bam


This detected a single 11 bp insertion at 53% variant frequency in the resulting VCF file.
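
For anyone wanting to reproduce the frequency figure, here is a minimal sketch of how it could be recomputed from a VCF record's INFO column. It assumes the frequency is the alternate observation count (AO) divided by the total depth (DP), both of which freebayes writes to INFO. The function name and the INFO string in the usage note are hypothetical and are not taken from my actual output.

```python
def alt_fractions(info):
    """Return AO/DP for each alternate allele in a VCF INFO string.

    Assumption: variant frequency is defined as alternate observations (AO)
    over total read depth (DP). Flag-style INFO entries without '=' are skipped.
    """
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    depth = int(fields["DP"])
    # AO holds one comma-separated count per alternate allele
    alt_counts = [int(count) for count in fields["AO"].split(",")]
    return [count / depth for count in alt_counts]
```

With a made-up record such as `alt_fractions("DP=200;RO=94;AO=106")`, the single alternate allele comes out at roughly 0.53.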

When freebayes was run with -q 30 on the same sample, no variants were detected.

freebayes --ploidy 1 -f LYS-GM101-AAV.fasta -F 0.01 --min-coverage 10 -C 4 -q 30 756296.sorted.bam


For Sample 2, I first filtered on base quality with -q 30:

freebayes --ploidy 1 -f LYS-GM101-AAV.fasta -F 0.01 -C 4 --min-coverage 10 -q 30 757372.sorted.bam


This did not detect any variant.

For Sample 2 again, when I did not use the -q 30 filter, the 11 bp insertion was detected at 48% frequency:

freebayes --ploidy 1 -f LYS-GM101-AAV.fasta -F 0.01 -C 4 --min-coverage 10 757372.sorted.bam

I checked the reads containing the variant in both samples and found that the inserted bases had base call quality > 30. Why, then, were these variants filtered out when freebayes was run with -q 30? Since the base call quality was high, I would expect the variant to be called even with the filter applied.
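
The per-read check described above can be sketched as follows. This is only an illustration of how the qualities of inserted bases can be pulled out of an alignment, not a description of what freebayes does internally. It assumes a standard SAM CIGAR string and a Phred+33 quality string; the function name and the example read in the usage note are hypothetical.

```python
import re

def inserted_base_qualities(cigar, qual, offset=33):
    """Return the Phred scores of read bases that lie inside insertion (I) operations.

    cigar: SAM CIGAR string, e.g. "5M3I4M"
    qual:  SAM QUAL string for the same read (assumed Phred+33 encoded)
    """
    scores = []
    pos = 0  # current index into the quality string
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "I":
            scores.extend(ord(char) - offset for char in qual[pos:pos + length])
        if op in "MIS=X":  # only these operations consume read bases
            pos += length
    return scores
```

For a made-up read with CIGAR `5M3I4M` and qualities `IIIII5?IIIII`, the call `inserted_base_qualities("5M3I4M", "IIIII5?IIIII")` yields the three inserted-base scores, from which both the mean and the minimum can be compared against the -q threshold.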

Can you please explain why this happened? According to the freebayes manual, the mean quality of the bases inside an insertion is what is taken into account when calling it, so why were these reads not considered?

Thanks!

freebayes alignment • 736 views