How to change the IDs of a number of sequences
6.6 years ago

I have a list of sequences whose IDs I want to change. I have a tab-separated file containing their old IDs and the new IDs (my IDs).

For example, I have sequences like:

>Aquca_005_00546.1
KHMALQFAMNAMDELMKMCQMNEPLWIPNNSGTKEMLNMEEHAKMFPWLTNFKQQHSQVRTEATRDSAVVIMNSITLTDAFLDVNKWMDIFPSIISRAKTVQIISSGIAGHASGSLHLMYAELQVQSPLVPTREAHFLRYCQQNAEEGTWAIVDFPIDSFHDSLQYSFPRYRRRPSGCLIQDMPNGYSRVTWVEHAEVEDKPVHQIFNHFVNSGTAFGAQRWLAVLQQQC
>Aquca_014_00016.1
DGWKVLTFENGVEISKRTSASFHIFRSRWLLKSVSPQQFITVANAIDAAKQWDSDLVEAKYIKDLEDNLSIIRLRFGDGSKPLFKNREFIVYERRETMADGTLVVAVASLPKEIAAGLHPKGNNTIRGLLLQSGWVVEELGDDENSCMVTYVVQLDPAGWLPKFFVNRLNTKLVMIIDNLEKL

I want to replace the original IDs with my IDs, e.g. Aquca_005_00546.1 with RaAc00546A and Aquca_014_00016.1 with RaAc00016E. My tab-separated file contains:

Original id	my id
Aquca_005_00546.1	RaAc00546A
Aquca_014_00016.1	RaAc00016E

The original IDs and my IDs are in a tab-separated file, aligned line by line (Aquca_005_00546.1 = RaAc00546A).

linux perl shell bioinformatics programme • 1.6k views

As there is a perl tag, take a look at the BioPerl module Bio::SeqIO. Also take a look at this (pure solution).

6.6 years ago

Quick (bio)python script tested for your limited sample data:

from Bio import SeqIO
import sys

def changeids(fasfile, iddict):
    # Rename every record; an ID missing from iddict raises KeyError.
    outlist = []
    for seq_record in SeqIO.parse(fasfile, "fasta"):
        # Clear the description, otherwise the old ID is written after the new one.
        seq_record.description = ""
        seq_record.id = iddict[seq_record.id]
        outlist.append(seq_record)
    SeqIO.write(outlist, "adaptedIDs.fa", "fasta")

def extractdict(identifierfile):
    # Build {old_id: new_id} from a two-column, tab-separated file; blank lines are skipped.
    with open(identifierfile) as idfile:
        return {line.split('\t')[0]: line.strip().split('\t')[1] for line in idfile if line.strip()}

iddict = extractdict(sys.argv[2])
changeids(sys.argv[1], iddict)

Save as changeids.py and execute as python changeids.py yourfas.fa yourids.txt. The renamed sequences are written to adaptedIDs.fa.

yourids.txt is expected to have no header line, with the current identifiers in column 1, the identifiers you want in column 2, and nothing else. Requires Biopython.
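If installing Biopython is not an option, the same renaming can be sketched in plain Python. This is only a rough sketch under some assumptions: the function names load_id_map and rename_fasta are made up for illustration, the ID is taken to be everything after ">" up to the first whitespace, and headers whose ID is not in the mapping are left untouched and reported instead of raising an error. The sequences in the demonstration are shortened placeholders, and Unknown_1 is an invented record added to show the unmapped case.

```python
import io

def load_id_map(handle):
    """Read old_id<TAB>new_id pairs from a file-like object; skip blank lines."""
    id_map = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0]:
            id_map[fields[0]] = fields[1].strip()
    return id_map

def rename_fasta(in_handle, out_handle, id_map):
    """Copy FASTA text, replacing header IDs found in id_map.

    Returns the list of IDs that had no entry in id_map (left unchanged).
    """
    unmapped = []
    for line in in_handle:
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            old_id = parts[0] if parts else ""
            if old_id in id_map:
                # Like the Biopython script, drop any description after the ID.
                line = ">" + id_map[old_id] + "\n"
            else:
                unmapped.append(old_id)
        out_handle.write(line)
    return unmapped

# In-memory demonstration with the IDs from the question (sequences shortened).
ids = io.StringIO("Aquca_005_00546.1\tRaAc00546A\nAquca_014_00016.1\tRaAc00016E\n")
fasta = io.StringIO(">Aquca_005_00546.1\nKHMALQ\n>Aquca_014_00016.1\nDGWKVL\n>Unknown_1\nMKV\n")
out = io.StringIO()
missing = rename_fasta(fasta, out, load_id_map(ids))
print(out.getvalue(), end="")
print("unmapped:", missing)
```

For real files, the io.StringIO objects would be replaced by handles from open(), for example open("yourids.txt"), open("yourfas.fa") and open("adaptedIDs.fa", "w"). Because the file is processed line by line, wrapped multi-line sequences are passed through unchanged.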
