12.9 years ago
Anima Mundi
Hello,
I would like to know how to download the FASTA sequences of all the introns from a given UCSC genome assembly.
The following command line prints all the introns of the ucsc/knownGene table.
$ curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" |\
  gunzip -c |\
  awk -F ' ' '{
    exonCount=int($8);
    split($9,exonStarts,"[,]");
    split($10,exonEnds,"[,]");
    for(i=1;i<exonCount;i++) {
      printf("%s\t%s\t%s\t%s\t%s\tIntron_%d\n",$1,$2,$3,exonEnds[i],exonStarts[i+1],($3=="+"?i:exonCount-i));
    }
  }'

uc001aaa.3  chr1  +  12227  12612  Intron_1
uc001aaa.3  chr1  +  12721  13220  Intron_2
uc010nxq.1  chr1  +  12227  12594  Intron_1
uc010nxq.1  chr1  +  12721  13402  Intron_2
uc010nxr.1  chr1  +  12227  12645  Intron_1
uc010nxr.1  chr1  +  12697  13220  Intron_2
uc009vis.2  chr1  -  14829  14969  Intron_3
uc009vis.2  chr1  -  15038  15795  Intron_2
uc009vis.2  chr1  -  15942  16606  Intron_1
uc009vit.2  chr1  -  14829  14969  Intron_8
(...)
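The command above prints intron coordinates, not sequences. As a rough sketch of how those coordinates could be turned into FASTA records, here is the same logic in Python. The function names (`introns_from_row`, `intron_sequence`, `to_fasta`), the toy chromosome `chrT` and the transcript `tx1` are all made up for illustration. The sketch assumes the chromosome sequences are already loaded into a dict, and it relies on UCSC tables using 0-based, half-open coordinates, so each intron is the slice from one exon's end to the next exon's start.

```python
# Complement table for reverse-complementing minus-strand introns.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def introns_from_row(line):
    """Yield (name, chrom, strand, start, end, number) for one knownGene row.

    Columns used (1-based, as in the awk command): 1 name, 2 chrom,
    3 strand, 8 exonCount, 9 exonStarts, 10 exonEnds.
    """
    fields = line.rstrip("\n").split("\t")
    name, chrom, strand = fields[0], fields[1], fields[2]
    exon_count = int(fields[7])
    # exonStarts / exonEnds are comma-separated and end with a trailing comma.
    exon_starts = [int(x) for x in fields[8].split(",") if x]
    exon_ends = [int(x) for x in fields[9].split(",") if x]
    for i in range(exon_count - 1):
        # Number introns in transcription order, like the awk version.
        number = i + 1 if strand == "+" else exon_count - 1 - i
        yield (name, chrom, strand, exon_ends[i], exon_starts[i + 1], number)


def intron_sequence(genome, chrom, strand, start, end):
    """Return the intron sequence, reverse-complemented on the minus strand."""
    seq = genome[chrom][start:end]  # 0-based, half-open slice
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


def to_fasta(genome, line):
    """Format every intron of one knownGene row as a FASTA record."""
    records = []
    for name, chrom, strand, start, end, number in introns_from_row(line):
        header = f">{name}_Intron_{number} {chrom}:{start}-{end}({strand})"
        records.append(header + "\n" + intron_sequence(genome, chrom, strand, start, end))
    return "\n".join(records)


# Hypothetical toy data: a 20 bp "chromosome" and a 3-exon minus-strand transcript.
TOY_GENOME = {"chrT": "AAAACCGTAAAAGGGTAAAA"}
TOY_ROW = "tx1\tchrT\t-\t0\t20\t0\t0\t3\t0,8,16,\t4,12,20,\tP\ttx1"

print(to_fasta(TOY_GENOME, TOY_ROW))
```

For a real assembly the `genome` dict would have to be filled from the assembly's FASTA file, and for whole genomes an indexed reader is likely a better fit than holding every chromosome in memory.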
In the UCSC Table Browser, select the appropriate track (e.g. a gene annotation track), set the output format to "sequence", and click "get output". Then select "genomic"; the options that follow include downloading "introns".
Very useful answer, thanks. I accepted Wen's solution because it is "ready to go".