Question: Adding unique identifiers to FASTA headers
Rose0 wrote (21 months ago):

Hi, I would like to add unique identifiers to the headers in my FASTA files, changing:

>Ricinus_communis_APK1A
>Ricinus_communis_APK1B
>Ricinus_communis_APK1C

To

>1 Ricinus_communis_APK1A
>2 Ricinus_communis_APK1B
>3 Ricinus_communis_APK1C

camachofrancine90 (United States) wrote (21 months ago):

A simple way is to iterate through the FASTA file with Biopython and store each header as a key in a dict. If a header is already in the dict (i.e. it is a duplicated header), append a counter to it as an extra field. Something like this:

import os
from Bio import SeqIO

fastadir = "."                    # directory containing the input file
fastafile = "input.fa"
outfile = "output-editedIDs.fa"

os.chdir(fastadir)
headerName = {}  # header -> number of repeats seen so far

# "w" overwrites the output; "a" would append to the result of a previous run.
# The "rU" mode was dropped: it is deprecated and removed in Python 3.11.
with open(outfile, "w") as newFastaFile:
    for record in SeqIO.parse(fastafile, "fasta"):
        record_id = record.id
        if record_id not in headerName:
            headerName[record_id] = 0
        else:
            # header already seen: duplicated FASTA header, append a counter
            headerName[record_id] += 1
            record_id = record_id + " " + str(headerName[record_id])
        newFastaFile.write(">" + record_id + "\n" + str(record.seq) + "\n")
# no explicit close() needed: the with block closes the file

print("FINISHED WRITING TO FILE")
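
Note that this approach only tags repeated headers; it does not number every record the way the question asks. The dictionary logic can be pulled out into a pure function over header strings, which makes it easy to check without Biopython or input files. This is a sketch; the name dedupe_headers is hypothetical and not part of any library.

```python
def dedupe_headers(headers):
    """Return header IDs with a counter appended to repeats.

    Mirrors the dict logic above: the first occurrence is left
    unchanged, the second gets " 1", the third " 2", and so on.
    """
    seen = {}  # header -> how many repeats seen so far
    result = []
    for h in headers:
        if h not in seen:
            seen[h] = 0
            result.append(h)
        else:
            seen[h] += 1
            result.append(h + " " + str(seen[h]))
    return result


print(dedupe_headers(["APK1A", "APK1B", "APK1A", "APK1A"]))
# ['APK1A', 'APK1B', 'APK1A 1', 'APK1A 2']
```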

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (21 months ago):
awk '/^>/ {printf(">%d %s\n",++N,substr($0,2));next;} {print;}' input.fa
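
For readers who prefer Python, the awk one-liner can be sketched as a generator: every line starting with > gets a running number inserted after the >, and all other lines pass through unchanged. The function name number_fasta_headers and the script name in the usage comment are hypothetical.

```python
import sys


def number_fasta_headers(lines):
    """Yield FASTA lines with a running number prefixed to each header.

    '>Name' becomes '>N Name'; sequence lines are yielded unchanged.
    Lines keep their trailing newline, so output can be written as-is.
    """
    n = 0
    for line in lines:
        if line.startswith(">"):
            n += 1
            yield ">%d %s" % (n, line[1:])
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python number_headers.py input.fa > output.fa
    with open(sys.argv[1]) as handle:
        for out in number_fasta_headers(handle):
            sys.stdout.write(out)
```

As with awk, the result goes to standard output, so it still has to be redirected into a file.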

Blaise0 replied (21 months ago):

It works, but the file is not modified. How can I save the modifications?

Matt Shirley replied (21 months ago):

awk prints its result to standard output and leaves input.fa untouched; redirect the output into a new file with >:

awk '/^>/ {printf(">%d %s\n",++N,substr($0,2));next;} {print;}' input.fa > output.fa
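
Redirecting back onto the input file (... input.fa > input.fa) does not work: the shell truncates input.fa before awk starts reading it, leaving an empty file. If the original file itself must be changed, one sketch, assuming Python 3.3+ for os.replace, is to write to a temporary file in the same directory and then rename it over the original. The function name number_headers_in_place is hypothetical.

```python
import os
import tempfile


def number_headers_in_place(path):
    """Prefix a running number to every FASTA header, overwriting `path`.

    Writes to a temporary file in the same directory first, then swaps it
    in with os.replace, so the input is never truncated while being read.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # open dst first so its descriptor is closed even if opening src fails
        with os.fdopen(fd, "w") as dst, open(path) as src:
            n = 0
            for line in src:
                if line.startswith(">"):
                    n += 1
                    line = ">%d %s" % (n, line[1:])
                dst.write(line)
        os.replace(tmp_path, path)  # atomic rename on POSIX, same filesystem
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temporary file
        raise
```

Writing to a temporary file and renaming also means that a crash partway through leaves the original input intact.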

Pierre Lindenbaum replied (21 months ago):

Learn Linux: http://linuxcommand.org/lts0060.php
Powered by Biostar version 2.3.0