How to get only the first introns in bed file format from UCSC Gene data?
6.3 years ago

Hi,

I am interested in the first introns of UCSC Genes or RefSeq Genes in the UCSC Genome Browser. However, there doesn't seem to be a way to extract only the first introns in BED format. I understand that I can get all exons or all introns using the Table Browser, but it does not seem to have an option to select only the first intron of each gene. Has anyone figured out a way to do this? Thanks a lot.

Best,

genome • 1.4k views
6.3 years ago

If I'm not wrong, this should do the job:

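wget -O - -q "http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene.txt.gz" | \
gunzip -c | \
awk -F '\t' '{ nEx=int($9); if(nEx==1) next;   # single-exon transcripts have no intron
  split($10,S,/,/); split($11,E,/,/);          # exonStarts (0-based), exonEnds (end-exclusive)
  # columns: chrom, intron start, intron end, transcript name; on "-" the first intron is the last one in genomic order
  printf("%s\t%d\t%d\t%s\n",$3,($4=="-"?E[nEx-1]:E[1]),($4=="-"?S[nEx]:S[2]),$2); }'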
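
To check the strand logic without downloading anything, here is a small sketch that runs the same idea on a made-up record. The function name first_introns and the transcript NM_TOY1 are hypothetical, and the column layout (name, chrom, strand, exonCount, exonStarts, exonEnds in columns 2, 3, 4, 9, 10, 11) is assumed to match refGene.txt. It writes BED6 so the strand is kept alongside each intron.

```shell
# Sketch: read refGene-format lines on stdin, write first introns as BED6.
# first_introns is a hypothetical helper name, not a UCSC tool.
first_introns() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
  {
    n = int($9)                               # exonCount
    if (n < 2) next                           # no intron without a second exon
    split($10, S, ",")                        # exonStarts, 0-based
    split($11, E, ",")                        # exonEnds, end-exclusive
    if ($4 == "-") { s = E[n-1]; e = S[n] }   # last intron in genomic order
    else           { s = E[1];   e = S[2] }   # first intron in genomic order
    print $3, s, e, $2, 0, $4                 # chrom, start, end, name, score, strand
  }'
}

# One made-up plus-strand transcript with exons 100-200, 400-500, 800-1000.
printf '0\tNM_TOY1\tchr1\t+\t100\t1000\t100\t1000\t3\t100,400,800,\t200,500,1000,\n' | first_introns
```

With the same exons on the minus strand, the reported interval should be the one between the last two exons (500 to 800) instead of 200 to 400. For the real data, pipe the gunzip output into first_introns in place of the awk command.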

Hi Pierre,

That totally makes sense. Thank you so much.

Best

If it works for you, please close and validate this post by clicking the green check mark on the left.
