Question: Adding Fasta unique identifiers
23 months ago, Rose wrote:

Hi, I would like to add unique numeric identifiers to the headers in my FASTA files, changing them from:

>Ricinus_communis_APK1A
>Ricinus_communis_APK1B
>Ricinus_communis_APK1C

To

>1 Ricinus_communis_APK1A
>2 Ricinus_communis_APK1B
>3 Ricinus_communis_APK1C
23 months ago, camachofrancine (United States) wrote:

A simple way is to iterate through the FASTA file with Python and store each header in a dict. If a header has already been seen while iterating, you can append a counter to it so that it becomes unique. Something like this:

from Bio import SeqIO
import os

fastadir = ""  # directory containing the FASTA file; leave empty to use the current directory
fastafile = "input.fa"
outfile = "output-editedIDs.fa"

if fastadir:
    os.chdir(fastadir)  # os.chdir("") would raise an error, so only change directory when one is given

headerName = {}  # maps each header to the number of duplicates seen so far
# 'w' rather than 'a', so rerunning the script does not append the records a second time
with open(outfile, 'w') as newFastaFile:
    for record in SeqIO.parse(fastafile, 'fasta'):
        record_id = record.id
        record_seq = str(record.seq)
        if record_id not in headerName:
            headerName[record_id] = 0
        else:
            # the header is already in the dict, so we have duplicated FASTA headers
            headerName[record_id] += 1
            record_id = record_id + " " + str(headerName[record_id])
        row = ">" + record_id + "\n" + record_seq + "\n"
        newFastaFile.write(row)
# the with block closes the file, so no explicit close() is needed
print("FINISHED WRITING TO FILE")
23 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
awk '/^>/ {printf(">%d %s\n",++N,substr($0,2));next;} {print;}' input.fa
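
For readers who would rather stay in Python, the awk one-liner above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name `number_headers` is hypothetical, the sequences in the example are made-up placeholders, and it assumes that every header line starts with `>` and that numbering should start at 1.

```python
def number_headers(lines):
    """Yield FASTA lines with a running number prepended to each header."""
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            # Mirror awk's substr($0, 2): drop the leading '>' and re-add it
            yield ">%d %s" % (count, line[1:])
        else:
            # Sequence lines pass through unchanged
            yield line


# Placeholder records; the sequences are invented for the example
example = [
    ">Ricinus_communis_APK1A\n",
    "ACGT\n",
    ">Ricinus_communis_APK1B\n",
    "TTGA\n",
]
numbered = list(number_headers(example))
print("".join(numbered), end="")
```

Because the function takes any iterable of lines, an open file handle can be passed in directly instead of a list.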

It works, but the file is not modified. How can I save the modifications?

Reply written 23 months ago by Blaise
awk '/^>/ {printf(">%d %s\n",++N,substr($0,2));next;} {print;}' input.fa > output.fa
Reply written 23 months ago by Matt Shirley
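
The redirect writes the numbered records to a new file and leaves input.fa untouched. If the original file itself should end up modified, one option is to write to a temporary file and swap it in once writing has finished. A minimal Python sketch of that idea follows; the function name `number_fasta_in_place` is hypothetical, and it assumes the file is a plain-text FASTA small enough to rewrite in one pass.

```python
import os
import tempfile


def number_fasta_in_place(path):
    """Prepend a running number to every FASTA header in the file at `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    count = 0
    # Write to a temporary file in the same directory so the final
    # replacement stays on one filesystem.
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for line in src:
            if line.startswith(">"):
                count += 1
                line = ">%d %s" % (count, line[1:])
            tmp.write(line)
    # Overwrite the original only after the numbered copy is fully written
    os.replace(tmp.name, path)
    return count
```

Calling `number_fasta_in_place("input.fa")` returns the number of headers it renumbered. Running it twice on the same file would prepend a second number, so keeping a backup of the original is a sensible precaution.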

learn linux: http://linuxcommand.org/lts0060.php

Reply written 23 months ago by Pierre Lindenbaum