Performance of discoSnp with low-coverage samples
5.3 years ago
shinken123 ▴ 150

Hi All,

I have several hundred paired-end Illumina FASTQ files with very low coverage, about 0.5x (skim-seq). I am wondering whether discoSnp could do a decent job of calling SNPs from these samples, or which software you would recommend instead.

It would be good to be able to call SNPs from even a single read. Because we have several individuals from the same population, we could then filter the SNPs by major and minor allele frequencies and keep only the most reliable ones (those present at higher frequencies in the population), at the cost of probably losing some real low-frequency SNPs.
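To make the population-level filter concrete, here is a minimal sketch in Python. It assumes each candidate SNP has already been reduced to one observed allele per individual (`None` when no read covers the site); the function names and the `min_maf` and `min_called` thresholds are hypothetical and would need tuning on real data.

```python
from collections import Counter

def allele_frequencies(calls):
    """Return {allele: frequency} over individuals that have a call.

    calls: one observed allele per individual, None if the site is uncovered.
    """
    counts = Counter(a for a in calls if a is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}

def passes_maf(calls, min_maf=0.05, min_called=10):
    """Keep biallelic sites with enough calls and a minor allele frequency >= min_maf."""
    n_called = sum(1 for a in calls if a is not None)
    freqs = allele_frequencies(calls)
    if n_called < min_called or len(freqs) != 2:
        return False
    return min(freqs.values()) >= min_maf

# Example: 18 individuals show A, 2 show G, 5 have no read at the site.
site = ["A"] * 18 + ["G"] * 2 + [None] * 5
print(passes_maf(site, min_maf=0.05))
```

With single-read calls, an allele seen in only one individual cannot be told apart from a sequencing error, so in practice one might also require the minor allele to be seen in at least two individuals.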

Also, to avoid SNPs coming from paralogous sequences, I could filter the reads in the mapping files and be stricter about the percent identity and the percentage of the read length that is aligned. And because I am working with a complex genome (maize), it may be a good idea to determine the mappability of the genome and keep only regions with "good" mappability.
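The read-level filter described above could be sketched as follows. This assumes the mapper writes a standard SAM CIGAR string and an `NM` (edit distance) tag for each alignment; the function names and the 0.98 and 0.9 thresholds are hypothetical, not recommended values.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def alignment_stats(cigar, nm):
    """Return (identity, aligned_fraction) for one alignment.

    cigar: SAM CIGAR string, e.g. "95M5S"
    nm:    value of the NM tag (mismatches plus inserted and deleted bases)
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Read bases that take part in the alignment.
    query_aligned = sum(n for n, op in ops if op in "MI=X")
    # Soft- and hard-clipped bases belong to the read but are not aligned.
    clipped = sum(n for n, op in ops if op in "SH")
    # Alignment columns, counting gaps on either side.
    columns = sum(n for n, op in ops if op in "MID=X")
    read_len = query_aligned + clipped
    identity = (columns - nm) / columns if columns else 0.0
    aligned_frac = query_aligned / read_len if read_len else 0.0
    return identity, aligned_frac

def keep_read(cigar, nm, min_identity=0.98, min_aligned_frac=0.9):
    """True if the alignment meets both (hypothetical) thresholds."""
    identity, frac = alignment_stats(cigar, nm)
    return identity >= min_identity and frac >= min_aligned_frac

print(keep_read("100M", 1))    # 99% identity, fully aligned
print(keep_read("60M40S", 0))  # perfect identity, but only 60% of the read aligned
```

Reads that map equally well to several places would still pass this filter, so it is usually combined with a mapping-quality cutoff or the mappability mask mentioned above.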

Now I am wondering which software would allow me to call SNPs even from a single read, without too many filters, at least at the beginning, for an initial round of SNP calling.

Best Wishes,


discosnp low coverage SNP calling • 1.1k views
5.3 years ago

Hi Eric,

You may try discoSnp using all k-mers (-c 1), but in this case you will not be able to tell a real variant from a sequencing error in your data.

You may also try using only k-mers seen at least twice (-c 2). In this case you'll miss all variants with coverage 1. The recall will be poor, but you'll have decent to high precision since, at 0.5x, you expect any given sequencing error to be seen at most once.
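To get a rough feel for that recall cost, here is a back-of-the-envelope sketch in Python. It assumes reads land uniformly at random, so that the depth at a site is roughly Poisson with mean equal to the coverage, and it treats read depth as a stand-in for k-mer count. Both are simplifying assumptions: k-mer coverage is somewhat lower than base coverage, and the effective mean is higher if counts are pooled over many individuals.

```python
import math

def poisson_at_least(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below

coverage = 0.5  # per-sample depth from the question
print(f"P(depth >= 1) = {poisson_at_least(1, coverage):.3f}")
print(f"P(depth >= 2) = {poisson_at_least(2, coverage):.3f}")
```

Under these assumptions, about 39% of sites in a single sample are covered at all and about 9% are covered at least twice, which is why a threshold of 2 applied to a single 0.5x sample keeps only a small fraction of variant sites.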

All the downstream filters you mention seem interesting.

Best, Pierre

