Add something to change the transcripts ID
7.0 years ago
Alex ▴ 50

Dear Biostars friends, I am learning to program and have run into a problem I can't solve. I want to append _1, _2, _3, ... to the transcript IDs that belong to the same gene. My original file looks like this:

scaffold_1 transcript 55098 57492 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 exon 55098 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 transcript 55102 57490 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 exon 55102 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 transcript 55102 57480 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 exon 55102 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 transcript 75108 76843 . + . gene_id "Seita.1G000300"; transcript_id "Seita.1G000300"

scaffold_1 exon 75108 76406 . + . gene_id "Seita.1G000300"; transcript_id "Seita.1G000300"

The target file should look like this:

scaffold_1 transcript 55098 57492 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_1"

scaffold_1 exon 55098 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_1"

scaffold_1 transcript 55102 57490 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_2"

scaffold_1 exon 55102 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_2"

scaffold_1 transcript 55102 57480 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_3"

scaffold_1 exon 55102 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_3"

scaffold_1 transcript 75108 76843 . + . gene_id "Seita.1G000300"; transcript_id "Seita.1G000300_1"

scaffold_1 exon 75108 76406 . + . gene_id "Seita.1G000300"; transcript_id "Seita.1G000300_1"

Thanks for the help

sequence gene • 1.6k views

Two remarks:

  1. When you post some tabular file content like in this case, wrap it with the "code sample" option. It's the 5th button from the left of your message editor panel.

  2. This is actually an easy task if you know a little bit of scripting. I would suggest learning some Python, Perl or Bash to get this done quickly. A dictionary keyed by gene would help you, or a list of tuples.

If you'd rather not build a dictionary, you can use a counter that starts at 1 and increments at each new transcript line for as long as the "gene_id" field is the same as on the line before, resetting to 1 when it changes. This requires your file to be sorted.
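To make the dictionary idea concrete, here is a minimal Python sketch, not a definitive solution. The function name `number_transcripts` is made up for illustration. It assumes that every "transcript" line comes before the exon lines that belong to it, and that the feature type ("transcript", "exon") appears as its own whitespace-separated word before the attributes, as in the lines you posted.

```python
import re

GENE_RE = re.compile(r'gene_id "([^"]+)"')
TX_RE = re.compile(r'transcript_id "([^"]+)"')


def number_transcripts(lines):
    """Append _1, _2, ... to transcript_id, counting transcripts per gene.

    Hypothetical helper: assumes each "transcript" line precedes its exons.
    """
    counts = {}  # gene_id -> number of transcript lines seen so far
    out = []
    for line in lines:
        gene = GENE_RE.search(line)
        if gene is None or TX_RE.search(line) is None:
            out.append(line)  # blank lines, comments, etc. pass through
            continue
        gene_id = gene.group(1)
        # Everything before the attributes holds the feature type.
        head = line.split("gene_id", 1)[0]
        if "transcript" in head.split():
            counts[gene_id] = counts.get(gene_id, 0) + 1
        n = counts.get(gene_id, 0)
        if n == 0:
            out.append(line)  # no transcript seen yet for this gene
            continue
        # A function replacement avoids escaping issues in the new text.
        out.append(
            TX_RE.sub(lambda m: f'transcript_id "{m.group(1)}_{n}"', line, count=1)
        )
    return out
```

You could call it on the lines of your file (for example `number_transcripts(open("input.gtf").read().splitlines())`, where `input.gtf` is a placeholder name) and write the returned lines to a new file. Because the counts are stored per gene, the genes do not have to be grouped together, but the transcript-before-exon order still matters.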
