How do I remove an unwanted "-T" suffix from IDs in a GFF3 file?
4.3 years ago
eennadi ▴ 30

I would like to know how to remove the "-T" suffix from the IDs in my GFF3 file.

I have pasted a sample below.

For example, I would like ID=C1_00010W_A-T to become ID=C1_00010W_A; in other words, the ID value should match the Parent value.

CaO19.6115,IPF21113.1,IPF27828.1,orf19.13534,orf19.6115
Ca22chr1A_C_albicans_SC5314 CGD mRNA    4059    4397    .   +   .   ID=C1_00010W_A-T;Parent=C1_00010W_A;Name=C1_00010W_A;Note=%28orf19.6115%29%20Dubious%20open%20reading%20frame;orf_classification=Dubious;Alias=C1_00010W,C1_00010W_B,CaO19.11880,CaO19.13534,CaO19.4402,CaO19.6115,IPF21113.1,IPF27828.1,orf19.13534,orf19.6115
Ca22chr1A_C_albicans_SC5314 CGD exon    4059    4397    .   +   .   ID=C1_00010W_A-T-E1;Parent=C1_00010W_A-T
Ca22chr1A_C_albicans_SC5314 CGD CDS 4059    4397    .   +   0   ID=C1_00010W_A-P;Parent=C1_00010W_A-T;orf_classification=Dubious;parent_feature_type=ORF
Ca22chr1B_C_albicans_SC5314 CGD gene    4059    4397    .   +   .   ID=C1_00010W_B;Name=C1_00010W_B;Note=%28orf19.6115%29%20Dubious%20open%20reading%20frame;orf_classification=Dubious
Ca22chr1B_C_albicans_SC5314 CGD mRNA    4059    4397    .   +   .   ID=C1_00010W_B-T;Parent=C1_00010W_B;Name=C1_00010W_B;Note=%28orf19.6115%29%20Dubious%20open%20reading%20frame;orf_classification=Dubious
Ca22chr1B_C_albicans_SC5314 CGD exon    4059    4397    .   +   .   ID=C1_00010W_B-T-E1;Parent=C1_00010W_B-T
Ca22chr1A_C_albicans_SC5314 CGD mRNA    4409    4720    .   -   .   ID=C1_00020C_A-T;Parent=C1_00020C_A;Name=C1_00020C_A;Note=%28orf19.6114%29%20Protein%20of%20unknown%20function%3B%20transcript%20detected%20on%20high-resolution%20tiling%20arrays;orf_classification=Uncharacterized;Alias=C1_00020C,C1_00020C_B,CAWG_03102,CaO19.13533,CaO19.6114,IPF21135.1,IPF27840.1,orf19.13533,orf19.6114,orf6.6227
Ca22chr1A_C_albicans_SC5314 CGD exon    4409    4720    .   -   .   ID=C1_00020C_A-T-E1;Parent=C1_00020C_A-T
Ca22chr1A_C_albicans_SC5314 CGD CDS 4409    4720    .   -   0   ID=C1_00020C_A-P;Parent=C1_00020C_A-T;orf_classification=Uncharacterized;parent_feature_type=ORF
Ca22chr1B_C_albicans_SC5314 CGD gene    4409    4720    .   -   .   ID=C1_00020C_B;Name=C1_00020C_B;Note=%28orf19.6114%29%20Protein%20of%20unknown%20function%3B%20transcript%20detected%20on%20high-resolution%20tiling%20arrays;orf_classification=Uncharacterized
Ca22chr1B_C_albicans_SC5314 CGD mRNA    4409    4720    .   -   .   ID=C1_00020C_B-T;Parent=C1_00020C_B;Name=C1_00020C_B;Note=%28orf19.6114%29%20Protein%20of%20unknown%20function%3B%20transcript%20detected%20on%20high-resolution%20tiling%20arrays;orf_classification=Uncharacterized
Ca22chr1B_C_albicans_SC5314 CGD exon    4409    4720    .   -   .   ID=C1_00020C_B-T-E1;Parent=C1_00020C_B-T
Ca22chr1B_C_albicans_SC5314 CGD CDS 4409    4720    .   -   0   ID=C1_00020C_B-P;Parent=C1_00020C_B-T;orf_classification=Uncharacterized;parent_feature_type=ORF
Ca22chr1A_C_albicans_SC5314 CGD gene    8597    8908
genome • 1.5k views

I'm not sure that would be valid according to the GFF3 spec. Shouldn't each ID be unique?
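
One way to check this concern after any edit is to list ID values that occur more than once. The following is only a sketch: `duplicate_ids` is a made-up helper name, and it assumes a tab-separated file with the attributes in column 9. Note that the GFF3 spec does allow several lines to share an ID when they describe one discontinuous feature, so a hit is something to inspect rather than an automatic error.

```shell
# Hypothetical helper: print every ID value that appears more than once
# in column 9 of a tab-separated GFF3 file.
# Usage: duplicate_ids my_file.gff3
duplicate_ids() {
    awk -F'\t' '
        /^#/ { next }                          # skip comments and directives
        NF >= 9 {
            n = split($9, attrs, ";")          # attributes are ";"-separated
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) print substr(attrs[i], 4)
        }' "$1" | sort | uniq -d               # keep only repeated values
}
```

If the "-T" suffix were stripped from the sample above, the mRNA ID C1_00010W_B would be expected to show up here, because the gene line already uses that ID.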


Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

4.3 years ago
mito ▴ 10
sed 's:\([ACGT]\)-[ACGT]:\1:g' my_file.gff3

This searches for all occurrences of nucleotide1-nucleotide2 and reduces each one to nucleotide1.

You can use the -i flag for sed to do the replacement in-place.

Edit: it appears that the characters you want to remove are not always nucleotide characters, but they do seem to always be upper-case. The following replaces all occurrences of upper_case1-upper_case2 with upper_case1:

sed 's/\([[:upper:]]\)-[[:upper:]]/\1/g' my_file.gff3
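
Be aware that the command above rewrites every upper-case/hyphen/upper-case pair on the line, so Parent= values and IDs such as C1_00010W_A-P change as well. If only the ID attribute should lose a trailing "-T", a narrower pattern may be safer. This is a sketch, not a tested recipe for your whole file: `strip_t_suffix` is a hypothetical helper name, and it assumes sed supports -E (GNU and BSD sed both do).

```shell
# Hypothetical helper: strip a trailing "-T" from the ID attribute only.
# Parent=, Name= and exon IDs ending in "-T-E1" are left untouched.
# Reads GFF3 on stdin and writes the result to stdout.
# Usage: strip_t_suffix < my_file.gff3 > my_file.noT.gff3
strip_t_suffix() {
    # Group 1: start of line, ";" or whitespace before "ID="
    # Group 2: the ID value up to the final "-T"
    # Group 3: the ";" (or end of line) that terminates the attribute
    sed -E 's/(^|[;[:space:]])ID=([^;]*)-T(;|$)/\1ID=\2\3/'
}
```

Child features keep Parent=C1_00010W_A-T, which no longer matches any ID after this change, so the Parent values may need a similar substitution depending on what the downstream tool expects.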

It's not referring to nucleotides but to gene/transcript names (e.g. the second one is _B-T), which already would not fit your regex.

Also: always be careful when using the -i flag!
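
One way to be careful is to give -i a backup suffix, so the original file survives next to the edited one. A minimal sketch follows; `safe_inplace` is a hypothetical wrapper name, and the attached-suffix form `-i.bak` is used because both GNU and BSD sed accept it.

```shell
# Hypothetical wrapper: apply a sed expression in place, keep the original
# as FILE.bak, and show what changed.
# Usage: safe_inplace 's/([[:upper:]])-[[:upper:]]/\1/g' my_file.gff3
safe_inplace() {
    script=$1
    file=$2
    sed -i.bak -E "$script" "$file"
    # diff exits non-zero when the files differ, which is expected here.
    diff "$file.bak" "$file" || true
}
```

If the diff shows unintended changes, moving FILE.bak back over FILE restores the original.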
