Question: Add something to change the transcripts ID
2.8 years ago by
Alex30
America
Alex30 wrote:

Dear Biostars friends, I am learning to program and have run into a problem I can't solve. I want to add _1, _2, _3, ... to the transcript IDs that belong to the same gene. My original file looks like this:

scaffold_1 transcript 55098 57492 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 exon 55098 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 transcript 55102 57490 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 exon 55102 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 transcript 55102 57480 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 exon 55102 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200"

scaffold_1 transcript 75108 76843 . + . gene_id "Seita.1G000300"; transcript_id "Seita.1G000300"

scaffold_1 exon 75108 76406 . + . gene_id "Seita.1G000300"; transcript_id "Seita.1G000300"

The target file should look like this:

scaffold_1 transcript 55098 57492 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_1"

scaffold_1 exon 55098 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_1"

scaffold_1 transcript 55102 57490 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_2"

scaffold_1 exon 55102 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_2"

scaffold_1 transcript 55102 57480 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_3"

scaffold_1 exon 55102 55372 . + . gene_id "Seita.1G000200"; transcript_id "Seita.1G000200_3"

scaffold_1 transcript 75108 76843 . + . gene_id "Seita.1G000300"; transcript_id "Seita.1G000300_1"

scaffold_1 exon 75108 76406 . + . gene_id "Seita.1G000300"; transcript_id "Seita.1G000300_1"

Thanks for the help

sequence gene • 774 views

Two remarks:

  1. When you post some tabular file content like in this case, wrap it with the "code sample" option. It's the 5th button from the left of your message editor panel.

  2. This is actually an easy task if you know a little bit of scripting. I would suggest learning some Python, Perl or Bash to get this done quickly. A dictionary keyed by gene, or a list of tuples, would help you keep track of how many transcripts each gene has.

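If you don't want to build a dictionary, you can use a single counter that starts at 1, increases by one at each new transcript line, and resets to 1 whenever the "gene_id" field differs from the line before. This requires your file to be sorted so that all lines of a gene are adjacent.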
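As a rough illustration, here is a minimal Python sketch of the counting idea. It keeps one counter per gene in a dictionary, so the lines of a gene do not strictly have to be adjacent. The function name `number_transcripts` is made up for this example. The sketch assumes that every transcript line comes before its own exon lines, that the attributes are written exactly as `gene_id "..."` and `transcript_id "..."`, and that the word `transcript` appears as its own column before the attributes, as in the lines posted above.

```python
import re
from collections import defaultdict

# Attribute patterns, assuming the quoting style shown in the question.
GENE_RE = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')


def number_transcripts(lines):
    """Yield the lines with _1, _2, ... appended to each transcript_id.

    Hypothetical helper: assumes each transcript line precedes its exons.
    """
    seen = defaultdict(int)  # gene_id -> number of transcript lines seen so far
    for line in lines:
        gene = GENE_RE.search(line)
        if gene is None:
            # Comment or header line without a gene_id: pass through unchanged.
            yield line
            continue
        gene_id = gene.group(1)
        # Columns that come before the attribute field.
        columns = line[:gene.start()].split()
        if "transcript" in columns:
            seen[gene_id] += 1  # a new transcript of this gene starts here
        n = seen[gene_id]
        if n == 0:
            # Feature seen before any transcript line of its gene: leave as is.
            yield line
            continue
        yield TRANSCRIPT_RE.sub(
            lambda m: f'transcript_id "{m.group(1)}_{n}"', line
        )
```

To use it on a file, open the input, pass the file handle to `number_transcripts`, and write each yielded line to the output file. Line endings are preserved because only the `transcript_id` value is changed.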

Written 2.8 years ago by Macspider