Question: Filtering a tab-delimited file with awk
7 weeks ago by
yaghoub.amraei wrote:

Hello everyone. I have the output of a Cufflinks assembly and I want to delete the transcripts that have FPKM < 0.5. How can I do this using awk? Thanks for your help.

1   Cufflinks   transcript  11218   12435   1   +   .   "gene_id ""Os01g0100200""; transcript_id ""Os01t0100200-01""; FPKM ""0.0000000000""; frac ""0.000000""; conf_lo ""0.000000""; conf_hi ""0.129298""; cov ""0.000000""; full_read_support ""no"";"
1   Cufflinks   exon    11218   12060   1   +   .   "gene_id ""Os01g0100200""; transcript_id ""Os01t0100200-01""; exon_number ""1""; FPKM ""0.0000000000""; frac ""0.000000""; conf_lo ""0.000000""; conf_hi ""0.129298""; cov ""0.000000"";"
1   Cufflinks   exon    12152   12435   1   +   .   "gene_id ""Os01g0100200""; transcript_id ""Os01t0100200-01""; exon_number ""2""; FPKM ""0.0000000000""; frac ""0.000000""; conf_lo ""0.000000""; conf_hi ""0.129298""; cov ""0.000000"";"
1   Cufflinks   transcript  11372   12284   1000    -   .   "gene_id ""Os01g0100300""; transcript_id ""Os01t0100300-00""; FPKM ""0.1303951660""; frac ""1.000000""; conf_lo ""0.000000""; conf_hi ""0.502776""; cov ""0.268453""; full_read_support ""no"";"
1   Cufflinks   exon    11372   12042   1000    -   .   "gene_id ""Os01g0100300""; transcript_id ""Os01t0100300-00""; exon_number ""1""; FPKM ""0.1303951660""; frac ""1.000000""; conf_lo ""0.000000""; conf_hi ""0.502776""; cov ""0.268453"";"
1   Cufflinks   exon    12146   12284   1000    -   .   "gene_id ""Os01g0100300""; transcript_id ""Os01t0100300-00""; exon_number ""2""; FPKM ""0.1303951660""; frac ""1.000000""; conf_lo ""0.000000""; conf_hi ""0.502776""; cov ""0.268453"";"
1   Cufflinks   transcript  12721   15685   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427""; full_read_support ""yes"";"
1   Cufflinks   exon    12721   13813   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; exon_number ""1""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427"";"
1   Cufflinks   exon    13906   14271   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; exon_number ""2""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427"";"
1   Cufflinks   exon    14359   14437   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; exon_number ""3""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427"";"
1   Cufflinks   exon    14969   15171   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; exon_number ""4""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427"";"
1   Cufflinks   exon    15266   15685   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; exon_number ""5""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427"";"
1   Cufflinks   transcript  12808   13978   1000    -   .   "gene_id ""Os01g0100466""; transcript_id ""Os01t0100466-00""; FPKM ""5.8063471120""; frac ""0.283843""; conf_lo ""4.329164""; conf_hi ""7.283530""; cov ""23.286784""; full_read_support ""no"";"
1   Cufflinks   exon    12808   13782   1000    -   .   "gene_id ""Os01g0100466""; transcript_id ""Os01t0100466-00""; exon_number ""1""; FPKM ""5.8063471120""; frac ""0.283843""; conf_lo ""4.329164""; conf_hi ""7.283530""; cov ""23.286784"";"
1   Cufflinks   exon    13880   13978   1000    -   .   "gene_id ""Os01g0100466""; transcript_id ""Os01t0100466-00""; exon_number ""2""; FPKM ""5.8063471120""; frac ""0.283843""; conf_lo ""4.329164""; conf_hi ""7.283530""; cov ""23.286784"";"
1   Cufflinks   transcript  2905    10815   1000    +   .   "gene_id ""CUFF.1""; transcript_id ""CUFF.1.1""; FPKM ""4.7439672851""; frac ""0.518769""; conf_lo ""3.876114""; conf_hi ""5.611820""; cov ""18.968843""; full_read_support ""yes"";"
1   Cufflinks   exon    2905    3255    1000    +   .   "gene_id ""CUFF.1""; transcript_id ""CUFF.1.1""; exon_number ""1""; FPKM ""4.7439672851""; frac ""0.518769""; conf_lo ""3.876114""; conf_hi ""5.611820""; cov ""18.968843"";"
file filtering text assembly • 232 views

I guess FPKM is the 13th column. If so, you can print the transcripts that have FPKM values of at least 0.5 as below.

 cat yourfile | awk '{if($13>=0.5) print}' > output
written 7 weeks ago by Mehmet

A couple of questions:

  1. This looks like a GTF-like format: 9 tab-delimited columns, where the 9th column is a ;-delimited list of space-separated key-value pairs. How does your awk account for that?
  2. Even if everything were tab-delimited, there is no way FPKM is column 13. And even if it were, the value is quoted (doubly, for some odd reason). How do you compare that directly to a number?
written 7 weeks ago by RamRS
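
The two questions above can be checked directly on the file. The following is a minimal sketch, not anyone's posted answer: it assumes the real file has 9 tab-separated columns and that column 9 holds `;`-separated `key "value"` pairs (single or doubled quotes). The function name `list_fpkm` is hypothetical. It splits on tabs only, parses the attribute column explicitly, strips the quotes, and adds 0 to the value so that awk compares it as a number.

```shell
# list_fpkm: hypothetical helper, shown only to illustrate the parsing steps.
# Assumption: columns are tab-separated; column 9 is a list of key "value"; pairs.
list_fpkm() {
  awk -F '\t' '
    $3 == "transcript" {
      id = ""; fpkm = ""
      n = split($9, pairs, ";")                     # one element per key-value pair
      for (i = 1; i <= n; i++) {
        gsub(/"/, "", pairs[i])                     # drop single or doubled quotes
        if (split(pairs[i], kv, " ") < 2) continue  # skip empty leftover pieces
        if (kv[1] == "transcript_id") id = kv[2]
        if (kv[1] == "FPKM") fpkm = kv[2]
      }
      # fpkm + 0 forces a numeric comparison on a value that was parsed from text
      print id "\t" fpkm "\t" ((fpkm + 0 >= 0.5) ? "keep" : "drop")
    }' "$@"
}
```

Running `list_fpkm yourfile` prints one line per transcript with its ID, its FPKM as written in the file, and whether a 0.5 cutoff would keep it, which makes it easy to confirm where the value really sits before filtering.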
7 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum wrote:
awk -F '[ "\t]+' '{for(i=9;i+1<=NF;i++) if($i=="FPKM" && $(i+1)>=0.5) {print;break;}}' yourfile
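
The one-liner above can be unrolled as follows. This is a commented sketch of the same idea, with two additions that are assumptions rather than part of the answer: a hypothetical wrapper function `filter_fpkm`, and a threshold passed in through an awk variable named `min`. Because the field separator treats any run of spaces, quotes and tabs as one delimiter, the `FPKM` key and its bare number end up in adjacent fields regardless of how the value is quoted.

```shell
# filter_fpkm: hypothetical wrapper around the one-liner above.
# Usage: filter_fpkm THRESHOLD FILE...
filter_fpkm() {
  min=$1
  shift
  awk -F '[ "\t]+' -v min="$min" '
  {
    # Field 9 is the first attribute key; i + 1 <= NF ensures a value field follows.
    for (i = 9; i + 1 <= NF; i++) {
      # "+ 0" makes the numeric comparison explicit on both sides.
      if ($i == "FPKM" && $(i + 1) + 0 >= min + 0) {
        print    # emit the original line unchanged
        break    # one match per line is enough
      }
    }
  }' "$@"
}
```

For example, `filter_fpkm 0.5 yourfile > filtered.gtf` keeps every line whose FPKM is at least 0.5. In the sample shown in the question, exon lines carry the same FPKM as their transcript, so they are kept or dropped together with it.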

Amazing, as usual.

written 7 weeks ago by geek_y

It was great. Thank you.

written 6 weeks ago by yaghoub.amraei