How to prepare a table of last exons
7.1 years ago

I downloaded a BED file of coding exons from NCBI Table Browser.

chr1 67000041 67000051 NM_001308203_cds_1_0_chr1_67000042_f 0 +
chr1 67091529 67091593 NM_001308203_cds_2_0_chr1_67091530_f 0 +
chr1 67098752 67098777 NM_001308203_cds_3_0_chr1_67098753_f 0 +
chr1 67105459 67105516 NM_001308203_cds_4_0_chr1_67105460_f 0 +
chr1 67108492 67108547 NM_001308203_cds_5_0_chr1_67108493_f 0 +
chr1 67109226 67109402 NM_001308203_cds_6_0_chr1_67109227_f 0 +
chr1 67136677 67136702 NM_001308203_cds_7_0_chr1_67136678_f 0 +
chr1 67137626 67137678 NM_001308203_cds_8_0_chr1_67137627_f 0 +
chr1 67138963 67139049 NM_001308203_cds_9_0_chr1_67138964_f 0 +
chr1 67142686 67142779 NM_001308203_cds_10_0_chr1_67142687_f 0 +
chr1 67145360 67145435 NM_001308203_cds_11_0_chr1_67145361_f 0 +
chr1 67154830 67154958 NM_001308203_cds_12_0_chr1_67154831_f 0 +
chr1 67155872 67155999 NM_001308203_cds_13_0_chr1_67155873_f 0 +
chr1 67160121 67160187 NM_001308203_cds_14_0_chr1_67160122_f 0 +
chr1 67184976 67185088 NM_001308203_cds_15_0_chr1_67184977_f 0 +
chr1 67194946 67195102 NM_001308203_cds_16_0_chr1_67194947_f 0 +
chr1 67199430 67199563 NM_001308203_cds_17_0_chr1_67199431_f 0 +
chr1 67205017 67205220 NM_001308203_cds_18_0_chr1_67205018_f 0 +
chr1 67206340 67206405 NM_001308203_cds_19_0_chr1_67206341_f 0 +
chr1 67206954 67207119 NM_001308203_cds_20_0_chr1_67206955_f 0 +
chr1 67208755 67208778 NM_001308203_cds_21_0_chr1_67208756_f 0 +
chr1 67000041 67000051 NM_032291_cds_0_0_chr1_67000042_f 0 +
chr1 67091529 67091593 NM_032291_cds_1_0_chr1_67091530_f 0 +
chr1 67098752 67098777 NM_032291_cds_2_0_chr1_67098753_f 0 +
chr1 67101626 67101698 NM_032291_cds_3_0_chr1_67101627_f 0 +
chr1 67105459 67105516 NM_032291_cds_4_0_chr1_67105460_f 0 +
chr1 67108492 67108547 NM_032291_cds_5_0_chr1_67108493_f 0 +
chr1 67109226 67109402 NM_032291_cds_6_0_chr1_67109227_f 0 +
chr1 67126195 67126207 NM_032291_cds_7_0_chr1_67126196_f 0 +

What I am interested in is a BED file containing only the last exon for each gene, along with the gene identifier information. For example:

chr1 67208755 67208778 NM_001308203 0 +
chr1 67126195 67126207 NM_032291 0 +

I'm pretty inexperienced with awk, but I'm hoping there is a straightforward command to do this.

Thanks in advance,

Lauren

RNA-Seq awk R • 969 views
7.1 years ago

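For exons on the '+' strand, sort each transcript's exons by start in descending order so that the last exon comes first, then keep the first line per transcript:

grep '+$' input.tsv | sed 's/_cds/\t/' | sort -t $'\t' -k4,4 -k1,1 -k2,2rn | cut -f 1-4 | sort -t $'\t' -k4,4 -k1,1 --stable --uniq

For exons on the '-' strand, the last exon is the one with the lowest start, so sort in ascending order instead:

grep -- '-$' input.tsv | sed 's/_cds/\t/' | sort -t $'\t' -k4,4 -k1,1 -k2,2n | cut -f 1-4 | sort -t $'\t' -k4,4 -k1,1 --stable --uniq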

(Very similar to "how to remove rows based on certain characters".)
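Since the question mentions awk, here is a minimal single-pass sketch that handles both strands at once. Note that the pipelines above keep only the first four columns, whereas this version also carries the score and strand through, as in the requested output. It assumes a tab-delimited 6-column BED file whose name field looks like `NM_032291_cds_7_0_chr1_67126196_f`; the function name `last_exons` and the file names in the usage line are just placeholders.

```shell
#!/bin/sh
# Sketch: print the last coding exon per transcript (both strands).
# Assumption: input is tab-delimited BED6 with names of the form <ID>_cds_<n>_...
last_exons() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    id = $4
    sub(/_cds_.*/, "", id)            # strip the "_cds_..." suffix, leaving the transcript ID
    key = id SUBSEP $1 SUBSEP $6      # one record per transcript, chromosome and strand
    # Last exon: largest start on "+", smallest start on "-".
    if (!(key in start) ||
        ($6 == "+" && $2 + 0 > start[key]) ||
        ($6 == "-" && $2 + 0 < start[key])) {
      start[key] = $2 + 0
      line[key] = $1 OFS $2 OFS $3 OFS id OFS $5 OFS $6
    }
  }
  END { for (k in line) print line[k] }' "$@" | sort -k1,1 -k2,2n
}

# Usage (hypothetical file names):
#   last_exons input.bed > last_exons.bed
```

The final `sort` is only there to give a stable output order (by chromosome, then start), because awk does not guarantee the order of `for (k in line)`. On the excerpt in the question this should give the two requested records, ordered by start position.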
