SNP calling with fasta files
9.1 years ago
joycewang83 ▴ 20

Hi,

I would like to generate a VCF file listing the SNPs between my bacterial species of interest and the reference genome. However, I only have access to the FASTA files available on NCBI.

Does anyone know whether it is possible to call SNPs from assembled genome sequences?

Thanks, Joyce

SNP • 9.3k views
9.1 years ago

If you want to call SNPs in the strict sense, you need several overlapping sequences (such as NGS reads or different EST sequences) along with the base qualities, so that an algorithm can infer whether a base mismatch is a SNP or not.

But if you just want the differences between your reference sequence and any FASTA file you download, I would suggest pairwise-aligning each FASTA sequence with the reference sequence and then looking up how to get a VCF file from a FASTA alignment, for example in the thread "Getting A Vcf File From A Fasta Alignment".
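To make the second option concrete, here is a minimal Python sketch of the "look for differences in a pairwise alignment" step. It assumes you already have the two aligned sequences as equal-length gapped strings (for example, read from an aligned FASTA); the function name `snps_from_alignment` is just illustrative. It only reports single-base mismatches and skips any column containing a gap or an ambiguous base, so indels are not handled.

```python
def snps_from_alignment(ref_aln, qry_aln):
    """Return (ref_pos, ref_base, alt_base) for each mismatch column.

    ref_aln and qry_aln are gapped, equal-length aligned sequences.
    ref_pos is 1-based and counted in ungapped reference coordinates.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    snps = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_pos += 1          # advance only on real reference bases
        if r == "-" or q == "-":
            continue              # gap column: an indel, not a SNP
        if r != q and r in "ACGT" and q in "ACGT":
            snps.append((ref_pos, r, q))
    return snps


# Small example: one substitution at reference position 2
print(snps_from_alignment("ACG-TA", "ATGCT-"))
```

Keep in mind that these are just observed differences between two assemblies, without any quality information behind them.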

9.1 years ago
Felix Francis ▴ 600

If you are dealing with two assembled genomes, you could use a genome aligner such as MAUVE to identify the divergent regions (including SNPs) between them. You can then write a simple script to convert this data to VCF format.
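As a rough idea of what such a conversion script could look like, here is a short Python sketch. It assumes the SNPs have already been parsed from the aligner output into `(position, ref_base, alt_base)` tuples with 1-based reference positions; parsing the aligner's own export format is not shown, and the function name `to_vcf` and the contig name `"contig1"` are placeholders. It writes only the minimal fixed VCF columns, with missing values (`.`) for ID, QUAL, FILTER and INFO.

```python
def to_vcf(snps, chrom):
    """Format (pos, ref, alt) tuples as minimal VCF text.

    snps  : iterable of (1-based position, reference base, alternate base)
    chrom : name of the reference sequence (placeholder chosen by the caller)
    """
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # VCF records are expected to be sorted by position within a contig
    for pos, ref, alt in sorted(snps):
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


# Example: write two SNPs to a file
with open("snps.vcf", "w") as handle:
    handle.write(to_vcf([(150, "A", "G"), (42, "C", "T")], "contig1"))
```

The contig name should match the sequence ID in your reference FASTA so that downstream tools can match the records to the reference.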


Thank you Felix, I have used MAUVE before but am completely inexperienced with script writing!

Jorge, thank you for your explanation and for directing me to the other thread. The person who started that thread actually recommended a GUI platform for generating VCF files, and I think it might work! Thanks again :)

Joyce
