                    13.8 years ago
        Anima Mundi
        
    
    Hello,
I would like to know how to download the FASTA sequences of all the introns from a given UCSC genome assembly.
The following command line prints all the introns of the ucsc/knownGene table.
$ curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" |\
gunzip -c  |\
awk -F '\t' '{ exonCount=int($8);split($9,exonStarts,"[,]"); split($10,exonEnds,"[,]"); for(i=1;i<exonCount;i++) {printf("%s\t%s\t%s\t%s\t%s\tIntron_%d\n",$1,$2,$3,exonEnds[i],exonStarts[i+1],($3=="+"?i:exonCount-i));}}'
uc001aaa.3    chr1    +    12227    12612    Intron_1
uc001aaa.3    chr1    +    12721    13220    Intron_2
uc010nxq.1    chr1    +    12227    12594    Intron_1
uc010nxq.1    chr1    +    12721    13402    Intron_2
uc010nxr.1    chr1    +    12227    12645    Intron_1
uc010nxr.1    chr1    +    12697    13220    Intron_2
uc009vis.2    chr1    -    14829    14969    Intron_3
uc009vis.2    chr1    -    15038    15795    Intron_2
uc009vis.2    chr1    -    15942    16606    Intron_1
uc009vit.2    chr1    -    14829    14969    Intron_8
(...)
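The command above prints coordinates only, not sequences. One possible way to get to FASTA (not part of the original answer) is to write the introns as BED6 and pass that file to a sequence extractor such as bedtools getfasta. The sketch below assumes that bedtools is installed and that a genome FASTA named hg19.fa is available locally; the function name introns_to_bed and the file names are hypothetical.

```shell
# Hypothetical helper: read knownGene-style rows on stdin, write introns as BED6.
# knownGene uses 0-based, half-open coordinates, so exonEnds[i]..exonStarts[i+1]
# can be used directly as the BED start and end of intron i.
introns_to_bed() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
  {
    exonCount = int($8)
    split($9, exonStarts, ",")
    split($10, exonEnds, ",")
    for (i = 1; i < exonCount; i++) {
      # Number introns in transcript order, as in the command above.
      n = ($3 == "+" ? i : exonCount - i)
      print $2, exonEnds[i], exonStarts[i+1], $1 "_Intron_" n, 0, $3
    }
  }'
}

# Assumed usage (needs network access, bedtools, and a local hg19.fa):
# curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" \
#   | gunzip -c | introns_to_bed > introns.bed
# bedtools getfasta -fi hg19.fa -bed introns.bed -name -s -fo introns.fa
```

With -s, bedtools reverse-complements the sequence of minus-strand features, so each intron should come out in transcript orientation. The exact FASTA header format produced by -name differs between bedtools versions.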
                    
                
                Use the UCSC Table Browser: select the appropriate track (e.g. a gene annotation track), set the output format to "sequence", click "get output", then select "genomic". The next page offers options for downloading the introns.
Very useful answer, thanks. I accepted Wen's solution because it is "ready to go".