Question: (Closed) Replacing feature names in a GFF file
fufuyou (110), United States, wrote 3.8 years ago:

Hi, this is my GFF file. I want to change some names. I tried to write some code, but it does not work well. The file is:

Contig11434     EVM     gene    148504  151454  .       +       .       ID=SGH012586-RA;Name=Contig11434.9;
Contig11434     EVM     mRNA    148504  151454  .       +       .       ID=SGH012586-RA;Parent=SGH012586-RA;Name=Contig11434.9;
Contig11434     EVM     exon    148504  148731  .       +       .       ID=Contig11434.9.1;Parent=SGH012586-RA;
Contig11434     EVM     CDS     148504  148731  .       +       0       ID=SGH012586-RA;Parent=SGH012586-RA;
Contig11434     EVM     exon    149332  149538  .       +       .       ID=Contig11434.9.2;Parent=SGH012586-RA;

I want the modified file to look like this:

Contig11434     EVM     gene    148504  151454  .       +       .       ID=SGH012586-RA;Name=SGH012586-RA;
Contig11434     EVM     mRNA    148504  151454  .       +       .       ID=SGH012586-RA;Parent=SGH012586-RA;Name=SGH012586-RA;
Contig11434     EVM     exon    148504  148731  .       +       .       ID=SGH012586-RA.1;Parent=SGH012586-RA;
Contig11434     EVM     CDS     148504  148731  .       +       0       ID=SGH012586-RA;Parent=SGH012586-RA;
Contig11434     EVM     exon    149332  149538  .       +       .       ID=SGH012586-RA.2;Parent=SGH012586-RA;
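A minimal Python sketch of the rewrite shown above. It assumes the file is tab-separated as GFF3 requires (the forum rendering shows spaces), that each exon has exactly one Parent, and that exons should be numbered `<Parent>.1`, `<Parent>.2` and so on in the order they appear in the file. It sets Name to the ID on gene and mRNA lines and leaves CDS lines untouched. The function and script names are illustrative, not taken from the original post.

```python
import sys


def parse_attributes(field):
    """Split GFF3 column 9 ('key=value;key=value;') into an insertion-ordered dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key.strip()] = value
    return attrs


def format_attributes(attrs):
    """Rebuild column 9, keeping the trailing ';' used in the example file."""
    return "".join(f"{key}={value};" for key, value in attrs.items())


def rename_features(lines):
    """Yield GFF lines with Name := ID on gene/mRNA and exon ID := '<Parent>.<n>'."""
    exon_count = {}  # Parent ID -> number of exons seen so far for that Parent
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            yield line  # keep headers, comments and blank lines unchanged
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            yield line  # not a 9-column feature line: leave it alone
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] in ("gene", "mRNA") and "ID" in attrs and "Name" in attrs:
            attrs["Name"] = attrs["ID"]
        elif cols[2] == "exon" and "Parent" in attrs and "ID" in attrs:
            parent = attrs["Parent"]
            exon_count[parent] = exon_count.get(parent, 0) + 1
            attrs["ID"] = f"{parent}.{exon_count[parent]}"
        cols[8] = format_attributes(attrs)
        yield "\t".join(cols)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python rename_gff.py input.gff3 > renamed.gff3
    with open(sys.argv[1]) as handle:
        for out_line in rename_features(handle):
            print(out_line)
```

Counting exons per Parent, instead of reusing the old `.1`/`.2` suffix, avoids depending on the `Contig11434.9.N` naming, so the numbering simply follows file order. As the replies below point out, this does not fix the fact that the gene, mRNA and CDS share one ID.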
Tag: assembly
Modified 3.7 years ago by Devon Ryan (97k) • written 3.8 years ago by fufuyou (110)

Could you please format your text?

Just to clarify before I write an answer: do you only want to change Name=Contig11434.9 to Name=SGH012586-RA for genes and mRNAs, or anything else as well?

(Changes to a GFF file might affect the way other programs interpret it.)

Written 3.8 years ago by Petr Ponomarenko (2.6k)

Hi Petr, thanks. My GFF file is: Contig11434 EVM gene 1449 5723 . - . ID=SGH012578-RA;Name=Contig11434.1; Contig11434 EVM mRNA 1449 5723 . - . ID=SGH012578-RA;Parent=SGH012578-RA;Name=Contig11434.1; Contig11434 EVM exon 1449 5723 . - . ID=Contig11434.1.1;Parent=SGH012578-RA; Contig11434 EVM CDS 1449 5723 . - 0 ID=SGH012578-RA;Parent=SGH012578-RA;

Contig11434 EVM gene 9081 10379 . - . ID=SGH012579-RA;Name=Contig11434.2; Contig11434 EVM mRNA 9081 10379 . - . ID=SGH012579-RA;Parent=SGH012579-RA;Name=Contig11434.2; Contig11434 EVM exon 9081 10379 . - . ID=Contig11434.2.1;Parent=SGH012579-RA; Contig11434 EVM CDS 9081 10379 . - 0 ID=SGH012579-RA;Parent=SGH012579-RA; I want to change it to: Contig11434 EVM gene 1449 5723 . - . ID=SGH012578-RA;Name=SGH012578-RA; Contig11434 EVM mRNA 1449 5723 . - . ID=SGH012578-RA;Parent=SGH012578-RA;Name=SGH012578-RA; Contig11434 EVM exon 1449 5723 . - . ID=SGH012578-RA.1;Parent=SGH012578-RA; Contig11434 EVM CDS 1449 5723 . - 0 ID=SGH012578-RA;Parent=SGH012578-RA;

Contig11434 EVM gene 9081 10379 . - . ID=SGH012579-RA;Name=SGH012578-RA; Contig11434 EVM mRNA 9081 10379 . - . ID=SGH012579-RA;Parent=SGH012579-RA;Name=SGH012578-RA; Contig11434 EVM exon 9081 10379 . - . ID=SGH012578-RA.1;Parent=SGH012579-RA; Contig11434 EVM CDS 9081 10379 . - 0 ID=SGH012579-RA;Parent=SGH012579-RA; I have replaced some IDs, but my code cannot replace all features.

Written 3.8 years ago by fufuyou (110)

Hi Petr, I do not know why my file was not formatted. I submitted the same file before, so it should have been formatted. Fuyou

Written 3.8 years ago by fufuyou (110)

There are buttons above the area where you type your answer; these are for formatting. One of the buttons has 1s and 0s on it. It is for code samples (and is also good for showing output printed to the terminal).

Here is your text, formatted:

"Hi Petr, thanks. My GFF file is:

Contig11434 EVM gene 1449 5723 . - . ID=SGH012578-RA;Name=Contig11434.1; 
Contig11434 EVM mRNA 1449 5723 . - . ID=SGH012578-RA;Parent=SGH012578-RA;Name=Contig11434.1; 
Contig11434 EVM exon 1449 5723 . - . ID=Contig11434.1.1;Parent=SGH012578-RA; 
Contig11434 EVM CDS 1449 5723 . - 0 ID=SGH012578-RA;Parent=SGH012578-RA;
Contig11434 EVM gene 9081 10379 . - . ID=SGH012579-RA;Name=Contig11434.2; 
Contig11434 EVM mRNA 9081 10379 . - . ID=SGH012579-RA;Parent=SGH012579-RA;Name=Contig11434.2;
Contig11434 EVM exon 9081 10379 . - . ID=Contig11434.2.1;Parent=SGH012579-RA; 
Contig11434 EVM CDS 9081 10379 . - 0 ID=SGH012579-RA;Parent=SGH012579-RA;

I want to change it to:

Contig11434 EVM gene 1449 5723 . - . ID=SGH012578-RA;Name=SGH012578-RA; 
Contig11434 EVM mRNA 1449 5723 . - . ID=SGH012578-RA;Parent=SGH012578-RA;Name=SGH012578-RA;
Contig11434 EVM exon 1449 5723 . - . ID=SGH012578-RA.1;Parent=SGH012578-RA;
Contig11434 EVM CDS 1449 5723 . - 0 ID=SGH012578-RA;Parent=SGH012578-RA;
Contig11434 EVM gene 9081 10379 . - . ID=SGH012579-RA;Name=SGH012578-RA; 
Contig11434 EVM mRNA 9081 10379 . - . ID=SGH012579-RA;Parent=SGH012579-RA;Name=SGH012578-RA;
Contig11434 EVM exon 9081 10379 . - . ID=SGH012578-RA.1;Parent=SGH012579-RA; 
Contig11434 EVM CDS 9081 10379 . - 0 ID=SGH012579-RA;Parent=SGH012579-RA;

I have replaced some IDs, but my code cannot replace all features."

Now my question is: what exactly do you want to do? Your desired output is confusing to me. Why do both exons have ID=SGH012578-RA.1? Why does the second gene in your snippet have ID=SGH012579-RA;Name=SGH012578-RA;?
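Petr's questions point at lines where the desired output contradicts itself. A small Python check can flag such lines automatically. It assumes tab-separated input and assumes the intended scheme is "Name equals ID on gene/mRNA lines" and "exon IDs look like `<Parent>.<n>`"; that scheme is a reading of the first example, not something the poster confirmed, and the function names are hypothetical.

```python
def parse_attributes(field):
    """Split GFF3 column 9 ('key=value;key=value;') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key.strip()] = value
    return attrs


def find_inconsistencies(lines):
    """Return (line_number, message) pairs for lines that break the assumed scheme."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((lineno, "expected 9 tab-separated columns"))
            continue
        ftype, attrs = cols[2], parse_attributes(cols[8])
        # Assumed rule 1: a gene/mRNA Name should equal its own ID.
        if ftype in ("gene", "mRNA") and "Name" in attrs and attrs["Name"] != attrs.get("ID"):
            problems.append((lineno, f"{ftype}: Name={attrs['Name']} but ID={attrs.get('ID')}"))
        # Assumed rule 2: an exon ID should be '<Parent>.<n>'.
        if ftype == "exon":
            parent = attrs.get("Parent", "")
            if not attrs.get("ID", "").startswith(parent + "."):
                problems.append((lineno, f"exon: ID={attrs.get('ID')} does not match Parent={parent}"))
    return problems
```

Run on the four lines of the second gene in the desired output, it reports lines 1, 2 and 3: the gene and mRNA carry Name=SGH012578-RA on ID SGH012579-RA, and the exon has ID SGH012578-RA.1 under Parent SGH012579-RA. Those are the same lines Petr asks about.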

Written 3.8 years ago by Petr Ponomarenko (2.6k)

The example is invalid because the gene, mRNA, and CDS features share the same ID (e.g. SGH012578-RA), which GFF3 does not allow. The best way to deal with this is at the point where the original GFF file is generated, by writing correct GFF3 right away. There is no guarantee that a parser will cope with invalid IDs. I understand that you wish to set Name := ID for all features, but that might make things worse if the input is already invalid.

  • How was the input file generated, and which software was used?
  • If the file was given to you, ask the person who provided it for a file in valid format.
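GFF3 lets one ID appear on several lines only when those lines describe a single discontinuous feature (for example, the pieces of one CDS). The rough Python sketch below, assuming tab-separated input, lists IDs that are reused across different feature types, which is the problem described above. It is not a full GFF3 validator, and the function name is illustrative.

```python
from collections import defaultdict


def shared_ids(lines):
    """Map each ID that is used by more than one feature type to those types."""
    types_by_id = defaultdict(set)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line; skipped in this rough check
        for item in cols[8].split(";"):
            key, _, value = item.partition("=")
            if key.strip() == "ID":
                types_by_id[value].add(cols[2])
    # Several lines of ONE type may legitimately share an ID (e.g. CDS parts);
    # an ID spread over different types cannot name a single feature.
    return {fid: sorted(types) for fid, types in types_by_id.items() if len(types) > 1}
```

For the first gene in the poster's file this returns `{'SGH012578-RA': ['CDS', 'gene', 'mRNA']}`, while the exon ID Contig11434.1.1 is not reported.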
Written 3.7 years ago by Michael Dondrup (48k)

Hi fufuyou, it is unclear what you are asking, because neither the input nor the desired output example is valid GFF. I would therefore like to put this question on hold. To fix this, please provide part of your real input file as an example and explain exactly how and why you want to change each identifier.

Written 3.7 years ago by Michael Dondrup (48k)

Hello fufuyou!

We believe that this post does not fit the main topic of this site.

See my comment above: it is unclear what is being asked.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

Written 3.7 years ago by Michael Dondrup (48k)
The thread is closed. No new answers may be added.