Using awk to filter .gtf by FPKM
3.9 years ago

Hi,

I'm sorry if this is incredibly basic, but I'm very new to working in the terminal. I have a .gtf that I would like to filter based on an FPKM threshold. However, the column that contains the FPKM values stores them inside quoted expressions rather than as plain numbers, so when I try

awk '{if($16>0.3) print }' < file.gtf > new.file.gtf

I get a blank file as output.

Is there a way around this? I've also tried quoting the threshold in my filtering step, just in case:

awk '{if($16>"0.3") print }'

but I still get an empty file. Any tips would be greatly appreciated!!

RNA-Seq assembly • 1.5k views
Hi, you will definitely need to show some sample input and desired output. Currently, we cannot see how your data is arranged.

Thanks for letting me know Kevin!

Here's an example of my data:

chr1    StringTie   exon    3011039 3011386 1000    -   .   gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "1"; cov "2.586207";
chr1    StringTie   transcript  3026805 3027018 1000    -   .   gene_id "STRG.3"; transcript_id "STRG.3.1"; cov "2.803738"; FPKM "0.239292"; TPM "0.462935";
chr1    StringTie   exon    3026805 3027018 1000    -   .   gene_id "STRG.3"; transcript_id "STRG.3.1"; exon_number "1"; cov "2.803738";
chr1    StringTie   transcript  3047584 3047862 1000    -   .   gene_id "STRG.4"; transcript_id "STRG.4.1"; cov "3.225806"; FPKM "0.275315"; TPM "0.532624";
chr1    StringTie   exon    3047584 3047862 1000    -   .   gene_id "STRG.4"; transcript_id "STRG.4.1"; exon_number "1"; cov "3.225806";
chr1    StringTie   transcript  3057653 3057894 1000    +   .   gene_id "STRG.5"; transcript_id "STRG.5.1"; cov "2.789256"; FPKM "0.238056"; TPM "0.460544";
chr1    StringTie   exon    3057653 3057894 1000    +   .   gene_id "STRG.5"; transcript_id "STRG.5.1"; exon_number "1"; cov "2.789256";
chr1    StringTie   transcript  3072676 3072886 1000    +   .   gene_id "STRG.6"; transcript_id "STRG.6.1"; cov "3.317536"; FPKM "0.283144"; TPM "0.547770";
chr1    StringTie   exon    3072676 3072886 1000    +   .   gene_id "STRG.6"; transcript_id "STRG.6.1"; exon_number "1"; cov "3.317536";
chr1    StringTie   transcript  3080222 3080714 1000    -   .   gene_id "STRG.7"; transcript_id "STRG.7.1"; cov "2.835700"; FPKM "0.242020"; TPM "0.468212";
If you just want an expression matrix, I would try Ballgown. See the answer "Generating FPKM matrix accross all samples after stringtie".

3.9 years ago

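With your example input saved as test.txt and a cutoff of 0.25, split each line on double quotes so that the FPKM value becomes field 8: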

$ awk -F '"' '/FPKM/ { if ($8 > 0.25) print}' test.txt 
chr1    StringTie   transcript  3047584 3047862 1000    -   .   gene_id "STRG.4"; transcript_id "STRG.4.1"; cov "3.225806"; FPKM "0.275315"; TPM "0.532624";
chr1    StringTie   transcript  3072676 3072886 1000    +   .   gene_id "STRG.6"; transcript_id "STRG.6.1"; cov "3.317536"; FPKM "0.283144"; TPM "0.547770";
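
For what it's worth, the original attempt returns nothing because, with awk's default whitespace splitting, $16 on these lines is the literal text "0.239292"; including the quotes and the semicolon. That is not a number, so awk falls back to a string comparison, and a leading double quote sorts before the digit 0, which makes the test false on every line. Below is a rough sketch of a variant that looks the FPKM attribute up by name instead of by field position, so it should keep working if the attribute order differs. The function name filter_fpkm is made up for illustration, and it assumes every transcript record carries an attribute written as FPKM "number".

```shell
# Hypothetical helper, not from the thread: keep transcript records whose
# FPKM attribute is above a threshold, finding FPKM by name, not position.
filter_fpkm() {
    # $1 = minimum FPKM (exclusive), $2 = path to a StringTie GTF
    awk -v min="$1" '
        $3 == "transcript" && match($0, /FPKM "[^"]+"/) {
            # The match covers FPKM, a space, and the quoted number;
            # skip the 6 leading characters and drop the closing quote.
            fpkm = substr($0, RSTART + 6, RLENGTH - 7)
            if (fpkm + 0 > min + 0) print
        }
    ' "$2"
}
```

Usage would look like filter_fpkm 0.3 file.gtf > new.file.gtf. Like the one-liner above, this prints only the transcript lines; the exon lines of the transcripts that pass are not carried over, so that would need an extra step if you want them.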