How to calculate the mutation rate and find mutation sites in a genome using FASTA files?
2.1 years ago
Dr.Animo ▴ 10

Hi, I have 6 viral genome sequences of the same virus and 1 reference sequence, all in FASTA format. How can I identify mutations and mutation sites in those genomes from the FASTA sequences? (If I had FASTQ files, I would simply align the reads to the reference and run a variant calling tool to get the mutated sites.) How can I do this with FASTA files? And how can I calculate the mutation rate for one genome?

SNP alignment mutation mutation rate • 893 views
2.1 years ago

As I understand it, you have reads from 6 viral genomes in FASTA format and you want to align them in order to identify mutations? Do you have a QUAL file? If yes, do the conversion below with Biopython:

from Bio import SeqIO
from Bio.SeqIO.QualityIO import PairedFastaQualIterator

# Pair each FASTA record with its QUAL scores and write the result as FASTQ.
with open("YourFastaFile.fasta") as f_handle, open("YourQualfile.qual") as q_handle:
    records = PairedFastaQualIterator(f_handle, q_handle)
    count = SeqIO.write(records, "temp.fastq", "fastq")
print("Converted %i records" % count)


I hope it helps.


@a.alnawfal I don't have FASTQ or QUAL files.


Your reply confuses me. Are you trying to convert to FASTA or to FASTQ? Assuming you want to convert FASTA to FASTQ without a quality score file: FASTA contains only the sequence and no quality information, whereas each FASTQ record carries both, with the sequence on the second line and the quality string on the fourth. In that case you can produce a dummy FASTQ file with the real sequence and a placeholder quality score at every position. Keep in mind that doing so will affect alignment performance.
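To make the idea of a dummy FASTQ concrete, here is a minimal sketch in plain Python (no Biopython needed). The function names `read_fasta` and `fasta_to_dummy_fastq` are made up for this illustration, and the `#` placeholder is only an assumed default. Each output record has the usual four lines: `@` header, sequence, `+`, and a quality string of the same length as the sequence.

```python
def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_dummy_fastq(fasta_handle, fastq_handle, qual_char="#"):
    """Write every FASTA record as FASTQ with a constant placeholder quality.

    The qualities are NOT real: qual_char is simply repeated once per base.
    Returns the number of records written.
    """
    count = 0
    for name, seq in read_fasta(fasta_handle):
        fastq_handle.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
        count += 1
    return count
```

You would call it with two open text files, for example `fasta_to_dummy_fastq(open("in.fa"), open("out.fq", "w"))`, where both file names are placeholders.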

To convert FASTA to FASTQ with seqtk (https://github.com/lh3/seqtk):

seqtk seq -F '#' in.fa > out.fq


Here every quality score will be '#'. If you care about the quality of the mapping and variant calling from this data, I do not recommend this approach; it is better to get the quality score file from the sequencing lab. I hope it helps.
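Coming back to the original question: if the genomes are already assembled, another option is to skip FASTQ entirely and compare each genome with the reference after aligning the two (for example with a multiple sequence aligner). The sketch below assumes you already have two aligned strings of equal length with `-` for gaps; the function names are hypothetical and the handling of `N` is my own choice. Note that the number it reports is only the fraction of compared sites that differ from the reference. A mutation rate in the strict sense also needs a time scale (sampling dates or number of generations), which a FASTA file does not provide.

```python
def mutation_sites(ref_aln, query_aln):
    """List differences between two aligned sequences.

    Returns tuples (ref_pos, ref_base, query_base), where ref_pos is the
    1-based position in the ungapped reference. For an insertion in the
    query (reference shows '-'), ref_pos is the preceding reference base.
    Columns containing N in either sequence are skipped (an assumption).
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    sites = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r != q and "N" not in (r, q):
            sites.append((ref_pos, r, q))
    return sites


def difference_fraction(ref_aln, query_aln):
    """Substitutions divided by the number of ungapped, non-N columns."""
    pairs = list(zip(ref_aln.upper(), query_aln.upper()))
    compared = sum(
        1 for r, q in pairs if "-" not in (r, q) and "N" not in (r, q)
    )
    substitutions = sum(
        1 for _, r, q in mutation_sites(ref_aln, query_aln) if "-" not in (r, q)
    )
    return substitutions / compared if compared else 0.0
```

For example, `mutation_sites("ACGT-ACGT", "ACCTTAC-T")` reports one substitution at reference position 3, one insertion after position 4, and one deletion at position 7.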