Determining the presence of alleles based on depth, coverage, divergence, MAF
5.2 years ago

Hello everyone,

I was wondering if you could help me with an analysis I have been doing on some alleles. My project consists of determining the presence or absence of 5 different alleles in malaria, but I have found it hard to set a threshold that separates alleles that are actually present from noise. For my analysis, I took a set of 20 .fastq files and ran them through SRST2 (similar to MLST) along with a database of my allele references. The output looks like this:

Sample #; Allele_name; Coverage; Depth; Diffs; Divergence; Length; maxMAF

17; FC27; 100%; 7743.733; 8snp; 2.439%; 328bp; 0.054

17; MAD20; 100%; 5671.843; 15snp; 4.808%; 312bp; 0.487

17; 3D7; 100%; 647.33; 3snp; 0.488%; 615bp; 0.493

17; R033; 100%; 5.043; no differences; 0%; 162bp; 0.5

As you can see, the results for sample 17 show four potential alleles. However, I am not sure what threshold to use to decide whether, for example, allele 3D7 is present: it has a depth of about 647 reads covering the region, and I don't know whether that is too low to say it is actually there. Some other information that may help:

Sample 17 has an average of 44,424 reads with a mean size of 207 bp. The sizes of the alleles are as follows:

3D7 = 625 bp; FC27 = 611 bp; MAD20 = 316 bp; R033 = 162 bp
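One rough way to put the four depths on a common scale is to express each as a share of the summed depth for sample 17. This is only a sketch: the variable names are made up for illustration, and a small share does not by itself show that an allele is noise.

```python
# Mean depths for sample 17, copied from the SRST2 output above.
depths = {"FC27": 7743.733, "MAD20": 5671.843, "3D7": 647.33, "R033": 5.043}

# Summed depth across the four candidate alleles.
total_depth = sum(depths.values())

# Share of the summed depth attributed to each allele.
share = {name: depth / total_depth for name, depth in depths.items()}

for name, fraction in sorted(share.items(), key=lambda item: -item[1]):
    print(f"{name}: {fraction:.4%} of summed depth")
```

With these numbers, 3D7 accounts for roughly 4.6% of the summed depth and R033 for well under 0.1%.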

Thanks a lot!

NGS Deep sequencing Post genomics next-gen • 859 views

I am not familiar with the program, but the default minimum depth appears to be 5:

--min_depth MIN_DEPTH       Minimum mean depth to flag as dubious allele call (default 5)

Taken from: https://github.com/katholt/srst2
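As a rough illustration of what a mean-depth cut-off like this would do to the sample 17 values in the question, here is a minimal sketch. It does not call SRST2; `flag_low_depth` is a hypothetical helper, and it assumes a call is flagged when its mean depth is strictly below the threshold, which may not match the program's exact rule.

```python
# Mean depths for sample 17, copied from the question.
depths = {"FC27": 7743.733, "MAD20": 5671.843, "3D7": 647.33, "R033": 5.043}

def flag_low_depth(depths, min_depth=5.0):
    """Return allele names whose mean depth is below min_depth.

    Assumption: "below" means strictly less than; SRST2's own rule may differ.
    """
    return [name for name, depth in depths.items() if depth < min_depth]

flagged_default = flag_low_depth(depths)         # documented default of 5
flagged_stricter = flag_low_depth(depths, 10.0)  # arbitrary stricter value

print("flagged at 5: ", flagged_default)
print("flagged at 10:", flagged_stricter)
```

Under that assumption, R033 (mean depth 5.043) sits just above the default of 5, so the result for that allele is very sensitive to the exact cut-off chosen.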


Hi! Thanks for the answer! However, what I am really after is a threshold for how many reads I need in order to state that allele 'x' is present and is not noise or contamination. I'm not sure that getting more than 5 reads (let's say 6) would be significant. Do you know what sort of statistics I could use, or whether hypothesis testing could be useful for this?


Our empirical data from the National Health Service (NHS) England indicated that the minimum per-position read depth at which you should call a variant is 18. This is for germline variants, which would be heterozygous or homozygous. For a heterozygous call at that depth, the reads could be split, for example, 10 and 8 between the two alleles.
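To give a feel for the 10 and 8 example, here is a small binomial sketch. It is an illustration, not the method behind the NHS figure. The 1% per-read error rate is an assumed value, and the 50/50 sampling model applies to a diploid heterozygous site.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

depth = 18         # reads covering the position
minor_reads = 8    # reads supporting the less frequent allele
error_rate = 0.01  # ASSUMED per-read error rate, not taken from the thread

# Probability that sequencing error alone gives 8 or more of 18 reads.
p_noise = binom_tail_ge(minor_reads, depth, error_rate)

# Probability that a true 50/50 heterozygote shows 8 or fewer of 18 reads.
p_het_le = 1 - binom_tail_ge(minor_reads + 1, depth, 0.5)

print(f"P(>= {minor_reads} error reads of {depth}) = {p_noise:.3g}")
print(f"P(<= {minor_reads} reads of {depth} | het) = {p_het_le:.3f}")
```

Under these assumptions a 10 and 8 split is entirely ordinary for a heterozygote, while eight error reads out of 18 would be extremely unlikely.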
