Hi guys, I am working on NGS data from a yeast species (Candida glabrata) to find mutations related to drug resistance. I am new to bioinformatics, so I am using Galaxy.eu to get used to the algorithms. According to the literature, mutations in certain genes are related to drug resistance. So I decided to download the .gb file of one of these genes (the PDR1 gene, approximately 3.5 kbp) from NCBI and map my FASTQ reads (WGS data) to it with BWA-MEM. Then I followed these steps, in this order:
1) Remove duplicates with MarkDuplicates
2) Left-align indels with BamLeftAlign
3) Filter by mapping quality and remove unpaired reads
4) Call variants with FreeBayes (ploidy set to 1)
After calling variants with FreeBayes, I ran VCFfilter on the VCF file to get rid of low-quality variant calls. However, no variants were left after filtering. So I decided to annotate the variants in the original (unfiltered) VCF file with SnpEff, then use SnpSift to extract fields and look at the quality measurements. I found that QUAL, DP, AO and RO look good (RO was generally 0 and AO was generally above 300), but the SRP, SAP and EPP values are low (mostly below 20, and SRP is 0 for nearly all variants). As far as I know, SRP, SAP and EPP are important for assessing strand and position bias.
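For anyone who wants to look at the same fields outside Galaxy, here is a minimal sketch of the field extraction in plain Python. It is only an illustration of what the SnpSift step does, not the tool itself. The function names and the two example records are made up for this post (they are not my real calls), and it assumes a tab-separated, uncompressed VCF with one ALT allele per line.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=310;AO=305' into a dict."""
    parsed = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        # Flag-type INFO entries have no '=value' part; store them as True.
        parsed[key] = value if sep else True
    return parsed


def extract_fields(vcf_lines, info_keys=("DP", "AO", "RO", "SRP", "SAP", "EPP")):
    """Collect QUAL plus the selected INFO keys for every variant line."""
    rows = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split("\t")
        info = parse_info(cols[7])  # INFO is the 8th VCF column
        row = {
            "CHROM": cols[0],
            "POS": int(cols[1]),
            "REF": cols[3],
            "ALT": cols[4],
            # QUAL may be '.' when it is missing
            "QUAL": None if cols[5] == "." else float(cols[5]),
        }
        for key in info_keys:
            row[key] = info.get(key)  # kept as strings; None if absent
        rows.append(row)
    return rows


# Hypothetical records for illustration only (not real results).
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "PDR1\t812\t.\tA\tG\t9875.3\t.\tDP=310;AO=308;RO=0;SRP=0;SAP=12.4;EPP=8.1",
    "PDR1\t2047\t.\tC\tT\t10432.0\t.\tDP=345;AO=345;RO=0;SRP=0;SAP=3.2;EPP=15.7",
]

rows = extract_fields(EXAMPLE_VCF)
for row in rows:
    print(row)
```

The INFO values are left as strings on purpose, because at multi-allelic sites fields such as AO and SAP hold one comma-separated value per alternate allele and would need to be split before converting them to numbers.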
My initial question is: can working on a small gene (approximately 3.5 kbp) be the reason for such low values? I would be very grateful if you have an explanation.