Extraction of information from stringtie output for selected gene ids
3.2 years ago
kuntalasb ▴ 10

I have stringtie abundance estimation output (.gtf and .tab) files for a set of genes and the gtf looks something like this:

MKYO02000001.1  StringTie   transcript  29543   31126   1000    -   .   gene_id "XLOC_000054"; transcript_id "TCONS_00000371"; cov "0.744585"; FPKM "0.133896"; TPM "0.234692";
MKYO02000001.1  StringTie   exon    29543   30439   1000    -   .   gene_id "XLOC_000054"; transcript_id "TCONS_00000371"; exon_number "1"; cov "0.716216";
MKYO02000001.1  StringTie   exon    30816   31126   1000    -   .   gene_id "XLOC_000054"; transcript_id "TCONS_00000371"; exon_number "2"; cov "0.826408";
MKYO02000001.1  StringTie   transcript  29543   30277   1000    -   .   gene_id "XLOC_000054"; transcript_id "TCONS_00001565"; cov "0.037965"; FPKM "0.006827"; TPM "0.011966";
MKYO02000001.1  StringTie   exon    29543   30277   1000    -   .   gene_id "XLOC_000054"; transcript_id "TCONS_00001565"; exon_number "1"; cov "0.037965";
MKYO02000001.1  StringTie   transcript  29547   33508   1000    -   .   gene_id "XLOC_000054"; transcript_id "TCONS_00000108"; cov "1.023863"; FPKM "0.184118"; TPM "0.322719";
MKYO02000001.1  StringTie   exon    29547   30439   1000    -   .   gene_id "XLOC_000054"; transcript_id "TCONS_00000108"; exon_number "1"; cov "1.049466";
MKYO02000001.1  StringTie   exon    30816   31078   1000    -   .   gene_id "XLOC_000054"; transcript_id "TCONS_00000108"; exon_number "2"; cov "1.306955";
MKYO02000001.1  StringTie   exon    31235   31320   1000    -   .   gene_id "XLOC_000054"; transcript_id "TCONS_00000108"; exon_number "3"; cov "1.382869";

How do I extract the information for a selected set of transcript ids from this file? Manually pulling out 10000 transcripts from a file of 30000 is not feasible, so any command-line solution would be very helpful. I have a text file with all the selected transcript ids, and it looks like this:

TCONS_00000101
TCONS_00000102
TCONS_00000103
. . .

Is there a grep/cut command that extracts the records for the ids listed in that text file? Any response would be of great help. Thank you.

RNA-Seq next-gen gene • 823 views

What if you just

grep -f file_TCON_ids.txt stringtie.gtf

That should give you all the info about your TCONS of interest.
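
A slightly stricter variant of the same idea, as a sketch: by default grep treats every line of the ID file as a regular expression and also accepts partial matches, so adding -F (fixed strings) and -w (whole words only) keeps an id from matching a longer id that merely contains it. The function name and the file names below are placeholders, and the ID file is assumed to hold one id per line.

```shell
# Hypothetical helper: print every GTF line that mentions one of the listed ids.
#   $1 = text file with one transcript id per line (assumed layout)
#   $2 = StringTie GTF file
extract_transcripts() {
    ids_file=$1
    gtf_file=$2
    # -F: treat each id as a literal string, not a regular expression
    # -w: match whole words only, so an id is not matched inside a longer one
    # -f: read the patterns from the id file
    grep -F -w -f "$ids_file" "$gtf_file"
}

# Example call (placeholder file names):
# extract_transcripts file_TCON_ids.txt stringtie.gtf > selected_transcripts.gtf
```

Because the ids appear in both the transcript and the exon lines, this keeps the exon records of each selected transcript as well.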


Thank you so much. It worked for me!
