Question: How to calculate mutation rate and mutation sites in a genome using FASTA file?
9 months ago by
Dr.Animo10 wrote:

Hi, I have 6 viral genome sequences of the same virus and 1 reference sequence in FASTA format. How can I identify mutations and mutation sites in those genomes from the FASTA sequences? (If I had FASTQ files, I would simply align the reads to the reference and use a variant-calling tool to get the mutated sites.) How can I do this with FASTA files? And how can I calculate the mutation rate for one genome?

modified 9 months ago by a.alnawfal.1992110 • written 9 months ago by Dr.Animo10
9 months ago by
a.alnawfal.1992110 wrote:

As far as I understand, you have reads from 6 viral genomes in FASTA format and you want to align them in order to identify mutations? Do you have a QUAL file? If so, do the conversion below with Biopython:

from Bio import SeqIO
from Bio.SeqIO.QualityIO import PairedFastaQualIterator

# Pair each FASTA record with its quality scores and write the pairs out as FASTQ
with open("YourFastaFile.fasta") as f_handle, open("YourQualfile.qual") as q_handle:
    records = PairedFastaQualIterator(f_handle, q_handle)
    count = SeqIO.write(records, "temp.fastq", "fastq")

print("Converted %i records" % count)

I hope it helps.


@a.alnawfal I don't have a FASTQ or QUAL file.

written 9 months ago by Dr.Animo10

Your answer confuses me! Are you trying to convert to FASTA or to FASTQ? Assuming that you are trying to convert FASTA to FASTQ without a quality score file: first of all, FASTA contains only the sequence information and lacks quality information, unlike FASTQ, where each record carries both (the sequence on the second line and the quality string on the fourth). In this case you could produce a dummy FASTQ file with the real sequence and a dummy quality score for each position. Keep in mind that if you do so, it is going to affect alignment performance.

To convert from FASTA to FASTQ using seqtk:

seqtk seq -F '#' in.fa > out.fq

where every quality score will be '#'. If you care about the quality of the mapping and variant calling from this data, I do not recommend using it; it is better to get the quality score file from the sequencing lab. I hope it helps.
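
If seqtk is not available, the same dummy-quality conversion can be sketched in plain Python. This is only a minimal illustration, not how seqtk itself works: the function names `parse_fasta` and `fasta_to_dummy_fastq` are made up for this example, and it assumes the whole FASTA file fits in memory as a string. The '#' character corresponds to a Phred score of 2 under the usual Phred+33 encoding, so downstream tools will treat every base as very low quality.

```python
def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            # Sequence lines may be wrapped, so collect them all
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_dummy_fastq(text, qual_char="#"):
    """Build FASTQ text in which every base gets the same placeholder quality."""
    records = []
    for name, seq in parse_fasta(text):
        # FASTQ record: @name, sequence, '+' separator, quality string
        records.append(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
    return "".join(records)


if __name__ == "__main__":
    example = ">genome1\nACGT\nAC\n"
    print(fasta_to_dummy_fastq(example), end="")
```

The quality string is always as long as the sequence, which is what aligners expect, but the values carry no real information about base-call confidence.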

written 9 months ago by a.alnawfal.1992110