Extract specific gene id from an annotation file
3.5 years ago

Below is part of my annotation file (GTF). I want to write a small script that takes gene IDs such as TEA_012962 as input and, for each one, outputs the associated ID found on the same line: the accession inside the transcript_id field (for example TEA014503 from "gnl|WGS:SDRB|TEA014503.1"). Any gaps between the lines are only there to separate records; the original file has no blank lines. Example input and output are given below.

SDRB02000004.1 Genbank gene 6018 10396 . + . gene_id "TEA_012962"; transcript_id ""; gbkey "Gene"; gene_biotype "protein_coding"; locus_tag "TEA_012962";
SDRB02000004.1 Genbank transcript 6018 10396 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; gbkey "mRNA"; locus_tag "TEA_012962";orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA";
SDRB02000004.1 Genbank exon 6018 6864 . + . gene_id "TEA_012963"; transcript_id "gnl|WGS:SDRB|TEA014504.1"; locus_tag "TEA_012963"; orig_protein_id "gnl|WGS:SDRB|TEA014504.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014504.1"; product "hypothetical protein"; transcript_biotype "mRNA"; exon_number "1";
SDRB02000004.1 Genbank exon 7548 7685 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; locus_tag "TEA_012962"; orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA"; exon_number "2";
SDRB02000004.1 Genbank exon 7802 7923 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; locus_tag "TEA_012962"; orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA"; exon_number "3";
Input:  TEA_012962 TEA_012963 ...
Output: TEA014503 TEA014504 ...
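One possible approach, sketched in shell/awk under a few assumptions: the accession is always the last |-separated part of transcript_id followed by a version suffix such as .1, gene lines (with an empty transcript_id) can be skipped, and the first non-empty transcript_id seen for a gene is the one wanted. The function name gene_to_transcript is hypothetical, and genename1.txt (one gene ID per line) and yourfile.gtf are placeholder file names.

```shell
# Hypothetical helper: print "gene_id<TAB>accession" for each gene ID listed
# in IDS_FILE (one per line), in the order given, using the first non-empty
# transcript_id seen for that gene in GTF_FILE.
# Assumes transcript_id looks like "gnl|WGS:SDRB|TEA014503.1".
gene_to_transcript() {
    awk '
        # First file (IDS_FILE): remember the requested IDs and their order.
        NR == FNR {
            if ($1 != "") { order[++n] = $1; want[$1] = 1 }
            next
        }
        # Second file (GTF_FILE): take the leftmost gene_id and transcript_id
        # attributes (assumes transcript_id appears before orig_transcript_id).
        {
            gid = ""; tid = ""
            if (match($0, /gene_id "[^"]*"/))
                gid = substr($0, RSTART + 9, RLENGTH - 10)
            if (match($0, /transcript_id "[^"]*"/))
                tid = substr($0, RSTART + 15, RLENGTH - 16)
            # Skip unrequested genes, empty transcript_id (gene lines),
            # and genes that already have an accession.
            if (!(gid in want) || tid == "" || (gid in hit)) next
            sub(/.*[|]/, "", tid)      # drop the "gnl|WGS:SDRB|" prefix
            sub(/\.[0-9]+$/, "", tid)  # drop the ".1" version suffix
            hit[gid] = tid
        }
        END {
            for (i = 1; i <= n; i++)
                if (order[i] in hit) print order[i] "\t" hit[order[i]]
        }
    ' "$1" "$2"
}

# Example: gene_to_transcript genename1.txt yourfile.gtf | cut -f2
```

Printing the gene ID next to its accession keeps the mapping visible; piping through cut -f2 gives just the accession column, matching the requested output.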

I find the question really hard to understand. Could you please elaborate on your input/output example?


Your question is a little confusing, but I think you are asking for something like this:

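# print every line of yourfile.gtf that contains, as a whole word, any ID listed in genename1.txt (one per line)
grep -w -F -f genename1.txt yourfile.gtf > output.txt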
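The grep -w -F -f command returns whole matching lines rather than the IDs themselves. If only the accessions are needed, it could be extended along these lines. This is a sketch assuming the same transcript_id layout ("gnl|WGS:SDRB|TEA014503.1"); the function name extract_accessions is hypothetical.

```shell
# Hypothetical helper: list the accessions (e.g. TEA014503) for the gene IDs
# in IDS_FILE, in the order they first appear in GTF_FILE.
extract_accessions() {
    # 1. keep only lines mentioning a listed gene ID as a whole word
    # 2. pull out every transcript_id "..." attribute (orig_transcript_id
    #    also matches, which only produces duplicates)
    # 3. keep the part after the last "|" and drop the ".1" version suffix;
    #    empty transcript_id "" values are not printed
    # 4. remove duplicates while preserving first-seen order
    grep -w -F -f "$1" "$2" \
        | grep -o 'transcript_id "[^"]*"' \
        | sed -n 's/.*|\([A-Za-z0-9_]*\)\.[0-9]*"$/\1/p' \
        | awk '!seen[$0]++'
}

# Example: extract_accessions genename1.txt yourfile.gtf > output.txt
```

awk '!seen[$0]++' is used instead of sort -u so the IDs stay in file order. Note that this lists accessions only; it does not show which gene each one came from.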