Question: EDIT:SOLVED:changing the fasta header in a multi-fasta file with biopython.
1
gravatar for skbrimer
3.6 years ago by
skbrimer550
United States
skbrimer550 wrote:

EDIT SKB 17MAR16 Hi Group,

figured out that I was not calling the correct variable in my loop. I made a handle file outside of the loop to append, but then was using the SeqIO.write function without the handle, so it was writing over the same file with just the last record. Please find the correct script below and thanks for the help!

# this script is used to convert fastq files to fasta files 
# then to rename the fasta ID with the sample ID from the lab

from Bio import SeqIO
import sys 

# grabbing the file and the name 
seq_file = sys.argv[1]
labels = seq_file.split(".")

# converting the file from fastq to fasta
SeqIO.convert(seq_file,"fastq",labels[0]+".fasta","fasta")

# taking the converted file and then changing the fasta header
handle = open(labels[0]+".fasta","a")

for seq_record in SeqIO.parse(handle,"fasta"):
    old_header = seq_record.id
    new_header = labels[0]
    seq_record.id = new_header + "_" + old_header # renaming the pseudogene with
                                                  # the lab id and the referance 
                                                  # used
    seq_record.description = "" # this strips the old header out
    SeqIO.write(seq_record, handle,"fasta")

handle.close()

/EDIT

Hi group,

I wrote a script that will take the fastq output from samtools mpileup > vcf2fq and create a fasta file that includes the drops in coverage. When I use this scripted for a bacterial genome it works as expected, however when I use it on a segmented genome it only outputs the last segment in the file. I'm using python3.4 and biopython1.66. Please see my code and troubleshooting below.

# this script is used to convert fastq files to fasta files 
# then to rename the fasta ID with the sample ID from the lab

from Bio import SeqIO
import sys 

# grabbing the file and the name 
seq_file = sys.argv[1]
labels = seq_file.split(".")

# converting the file from fastq to fasta
SeqIO.convert(seq_file,"fastq",labels[0]+".fasta","fasta")

# taking the converted file and then changing the fasta header
handle = open(labels[0]+".fasta","a+")

for seq_record in SeqIO.parse(handle,"fasta"):
    old_header = seq_record.id
    new_header = labels[0]
    seq_record.id = new_header + "_" + old_header # renaming the pseudogene with
                                                  # the lab id and the referance 
                                                  # used
    seq_record.description = "" # this strips the old header out
    SeqIO.write(seq_record, labels[0]+".fasta","fasta")

handle.close()

I think I have an error when writing the new record. I have been playing with this line:

handle = open(labels[0]+".fasta","a+")

When I have it written like this, It runs without error, I get a multi-fasta output without the header change; however when I have the code written like this:

handle = open(labels[0]+".fasta","rU")

like in the biopython tutorial, I get the output header I want, but only the last record in the file. I'm not sure if or why with the file open in "append" is just skipping the next part of the script. I would appreciate any input be appreciated

biopython • 2.4k views
ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by skbrimer550

Its strange, Ideally, in "rU" mode, the file should not be writable. Its strange that you are able to write in "rU" mode.

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by geek_y9.9k

I think its weird as well, also the universal newline 'U' argument is deprecated in python as well, I'm not sure how much longer it will still be an option.

ADD REPLYlink written 3.6 years ago by skbrimer550
1
gravatar for skbrimer
3.6 years ago by
skbrimer550
United States
skbrimer550 wrote:

See original thread for answer

ADD COMMENTlink written 3.6 years ago by skbrimer550
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2099 users visited in the last hour