Question: How to get only the first introns in bed file format from UCSC Gene data?
7 months ago, beichenw909 wrote:

Hi,

I am interested in the first introns of UCSC Genes or RefSeq Genes in the UCSC Genome Browser. However, there doesn't seem to be a way to extract only the first introns in BED format. I understand that I can get exons or introns using the Table Browser, but it doesn't seem to have an option to return only the first intron of each gene. Has anyone figured out a way to approach this? Thanks a lot.

Best,

7 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

If I'm not wrong, this should do the job:

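# refGene columns used: $2=name, $3=chrom, $4=strand, $9=exonCount, $10=exonStarts, $11=exonEnds
# On the minus strand the first intron lies between the last two exons in genomic order.
wget -O - -q "http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene.txt.gz" | \
gunzip -c | \
awk -F '\t' '{nEx=int($9);if(nEx==1)next;split($10,S,/,/);split($11,E,/,/);printf("%s\t%d\t%d\t%s\n",$3,($4=="-"?E[nEx-1]:E[1]),($4=="-"?S[nEx]:S[2]),$2);}'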
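
To sanity-check the coordinate logic without downloading the whole table, the same idea can be tried on a toy record. This is only a sketch under stated assumptions: the function name `first_introns` and the record `NM_TOY1` are made up for illustration, the input is assumed to follow the refGene column layout (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...), and exon coordinates are assumed to be 0-based, half-open, so an intron runs from one exon's end to the next exon's start. It also writes six BED columns (adding a placeholder score and the strand) instead of four.

```shell
# Hypothetical helper: reads refGene-style tab-separated rows on stdin and
# writes one BED6 line per multi-exon transcript for its first intron.
first_introns() {
  awk -F '\t' -v OFS='\t' '
    {
      n = int($9)                  # exonCount
      if (n < 2) next              # single-exon transcripts have no intron
      split($10, S, ",")           # exonStarts (assumed 0-based)
      split($11, E, ",")           # exonEnds (assumed exclusive)
      if ($4 == "-") {             # minus strand: last intron in genomic order
        iStart = E[n-1]; iEnd = S[n]
      } else {                     # plus strand: first intron in genomic order
        iStart = E[1]; iEnd = S[2]
      }
      print $3, iStart, iEnd, $2, 0, $4
    }'
}

# Toy input with made-up coordinates (not a real annotation):
# three exons at 100-200, 400-500 and 700-900 on the plus strand.
printf '0\tNM_TOY1\tchr1\t+\t100\t900\t100\t900\t3\t100,400,700,\t200,500,900,\t0\tTOY1\n' | first_introns
```

For the toy record this should print `chr1 200 400 NM_TOY1 0 +` (tab-separated); flipping the strand to `-` would instead give the interval 500 to 700. The real table can be piped in the same way, e.g. `gunzip -c refGene.txt.gz | first_introns`.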

Hi Pierre,

That totally makes sense. Thank you so much.

Best

Reply written 7 months ago by beichenw909

If it works for you, please close and validate this post by clicking the green mark on the left.

Reply written 7 months ago by Pierre Lindenbaum