20 hours ago
yesquokkan
Hello, I am a beginner graduate student who has just started learning bioinformatics.
I would like to perform SNP calling on a specific bacterial genome. Is it possible to do this at the assembly level?
From my understanding, using raw read data seems better, since SNP calling can then be based on base quality scores, and factors such as coverage or differences between sequencing platforms can be taken into account. Using assembled genomes might be less reliable.
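To make the quality-score point concrete: per-base qualities from the sequencer are Phred-scaled, Q = -10·log10(P_error), so Q20 means roughly a 1 in 100 chance the base is wrong and Q30 roughly 1 in 1000. A minimal Python sketch of the conversion (the function names are illustrative, not from any particular tool):

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: error probability back to a Phred quality score."""
    return -10 * math.log10(p)


# Print the error probability for a few common quality cutoffs.
for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):g}")
```

A FASTA assembly stores one consensus base per position with no per-read quality attached, so this kind of information is no longer available when you call SNPs from the assembly alone.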
What is the general practice in this case?
If SNP calling is also possible using assembled genomes, which tools are typically used?
By aligning your assembly to a reference genome, you can find the differences between the two.
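In practice this is usually done with a whole-genome aligner (for example MUMmer's `nucmer` followed by `show-snps`, or `minimap2` in assembly-to-reference mode), and those tools report the differences for you. The sketch below only illustrates the comparison step itself. It assumes you already have one pairwise alignment as two gapped strings of equal length; `call_snps` is a hypothetical helper, not part of any tool. Gap columns are skipped because they are indels rather than SNPs, and `N` in the assembly is treated as "no information".

```python
def call_snps(ref_aln: str, qry_aln: str) -> list[tuple[int, str, str]]:
    """List substitutions between two aligned sequences.

    Returns (reference_position, ref_base, query_base) tuples, where
    reference_position is 1-based and counts only non-gap reference bases.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    snps = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_pos += 1  # advance only on real reference bases
        if r == "-" or q == "-":
            continue  # insertion or deletion column, not a SNP
        if r != q and q != "N":
            snps.append((ref_pos, r, q))
    return snps


ref = "ACGT-ACGT"
qry = "ACTTAAC-T"
print(call_snps(ref, qry))  # [(3, 'G', 'T')]
```

Note that every difference found this way looks equally certain: nothing in the alignment tells you whether a mismatch is a real SNP or an assembly error.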
This can be problematic for a couple of reasons. You no longer have the read depth to confirm a particular difference (i.e., support from multiple independent reads) or to assign a confidence to the call. And if the assembly is not done properly, you could end up with spurious SNPs.
Thank you for your kind answer. May I ask a few follow-up questions?
However, I have full access only to assembled genomes, and to raw reads for only some of them. After I find differences between an assembly and the reference, how can I decide which variants are confident without read depth? Is there any way to filter out spurious SNPs when working with assemblies?
I don't think you can, not unless you do additional experiments (e.g., PCR) to verify them.