Question: extract transcript_id and gene_id from cuffcompare output
5 months ago by
yaghoub.amraei10 wrote:

Hi all. I have a cuffcompare output file and I want to extract the transcript_id and gene_id from column 9, which is a string, using grep or AWK. Thank you for your guidance.

1   Cufflinks   exon    2899    3255    .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "1"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";

1   Cufflinks   exon    3354    3616    .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "2"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1   Cufflinks   exon    4357    4455    .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "3"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1   Cufflinks   exon    5457    5560    .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "4"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1   Cufflinks   exon    7136    7944    .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "5"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1   Cufflinks   exon    8028    8150    .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "6"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1   Cufflinks   exon    8408    8608    .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "7"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1   Cufflinks   exon    9210    9615    .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "8"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1   Cufflinks   exon    10102   10187   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "9"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1   Cufflinks   exon    10274   10430   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "10"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";
1   Cufflinks   exon    10504   10817   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "11"; gene_name "Os01g0100100"; oId "CUFF.7.2"; nearest_ref "Os01t0100100-01"; class_code "j"; tss_id "TSS1";



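It's hard to tell the format of your file, but if it's in GTF format you can use this Perl one-liner. For every line where gene_id appears before transcript_id, it prints the two values separated by a tab; lines that do not match pass through unchanged.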

perl -pe 's/.+gene_id\s\"(\w+)\".+transcript_id\s\"(\w+)\".+/$1\t$2/' file.txt
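Since the question asked for grep or AWK, the same extraction can be sketched in AWK. This is only an illustration, not a tested recipe for every cuffcompare file: the function name extract_ids is made up for this example, and it assumes each record sits on its own line with attributes written as key "value"; as in the sample above. It uses only POSIX awk features (match, RSTART, RLENGTH), searches the whole line rather than column 9 so it does not depend on tab versus space separators, and skips lines that lack either attribute.

```shell
#!/bin/sh
# extract_ids (hypothetical helper): print "gene_id<TAB>transcript_id"
# for each input line that carries both attributes.
# Reads the files given as arguments, or standard input if none are given.
extract_ids() {
  awk '
    {
      gene = ""; tx = ""
      # The prefix gene_id " is 9 characters long; the match also
      # includes the closing quote, so the value length is RLENGTH - 10.
      if (match($0, /gene_id "[^"]+"/))
        gene = substr($0, RSTART + 9, RLENGTH - 10)
      # The prefix transcript_id " is 15 characters long.
      if (match($0, /transcript_id "[^"]+"/))
        tx = substr($0, RSTART + 15, RLENGTH - 16)
      if (gene != "" && tx != "")
        print gene "\t" tx
    }
  ' "$@"
}

# Demo on one record shaped like the sample in the question.
printf '1\tCufflinks\texon\t2899\t3255\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "1";\n' | extract_ids
```

Because the file has one line per exon, the same pair is printed many times; something like extract_ids file.txt | sort -u should collapse the output to unique gene/transcript pairs.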
modified 5 months ago • written 5 months ago by rpolicastro3.3k

Hello. Amazing, as usual.

written 5 months ago by yaghoub.amraei10