8.8 years ago
c1toscano
Hi, how can I download the coordinates of all the stop codons in the human genome? Thanks!
As a BED file (0-based, half-open intervals, so each stop codon spans 3 bases), using UCSC known genes:
curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" | gunzip -c | awk -F '\t' '{if(int($6)<int($7)) printf("%s\t%d\t%d\n",$2,($3 == "+"?int($7)-3:int($6)),($3=="+"?int($7):int($6)+3));}' | sort | uniq
chr10 100003912 100003915
chr10 100008677 100008680
chr10 100143554 100143557
chr10 100154948 100154951
chr10 100177320 100177323
chr10 100183361 100183364
chr10 100186935 100186938
chr10 100189291 100189294
chr10 100219330 100219333
chr10 100221542 100221545
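For readability, the same idea can be written as a small shell function that also keeps the strand. This is only a sketch under a few assumptions: the function name `knowngene_stop_codons_to_bed` is made up for illustration, the input is knownGene.txt rows on stdin (chrom in column 2, strand in column 3, cdsStart in column 6, cdsEnd in column 7), every interval is written as 3 bases in half-open BED coordinates, and the stop codon is assumed to be complete and not split by an intron.

```shell
# Hypothetical helper: knownGene.txt rows on stdin, BED6 stop codons on stdout.
knowngene_stop_codons_to_bed() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    # cdsStart == cdsEnd marks a non-coding transcript, so only keep cdsStart < cdsEnd
    ($6 + 0) < ($7 + 0) {
      if ($3 == "+") { start = $7 - 3; end = $7 + 0 }  # last 3 bases of the CDS
      else           { start = $6 + 0; end = $6 + 3 }  # first 3 genomic bases of the CDS
      print $2, start, end, "stop_codon", 0, $3
    }' | sort -k1,1 -k2,2n | uniq
}

# Possible usage (the output file name is arbitrary):
# curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" \
#   | gunzip -c | knowngene_stop_codons_to_bed > stop_codons.bed
```

Because the name column is a constant, transcripts that share a stop codon on the same strand collapse into a single line after `uniq`.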
Surprisingly, this isn't available via BioMart. You can, however, just grep "stop_codon" from the GTF file if you download it from Ensembl or GENCODE.
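The GTF route could look roughly like this. It is a minimal sketch, assuming an uncompressed GTF on stdin with the feature type in column 3; the function name `gtf_stop_codons_to_bed` is hypothetical. GTF coordinates are 1-based and inclusive, so the start is shifted down by one to get BED coordinates.

```shell
# Hypothetical helper: GTF on stdin, BED6 stop codons on stdout.
gtf_stop_codons_to_bed() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    # skip header/comment lines and keep only stop_codon features
    !/^#/ && $3 == "stop_codon" {
      # GTF start is 1-based inclusive; BED start is 0-based, end is unchanged
      print $1, $4 - 1, $5, "stop_codon", 0, $7
    }' | sort -k1,1 -k2,2n | uniq
}

# Possible usage (annotation.gtf.gz stands for whatever GTF you downloaded):
# gunzip -c annotation.gtf.gz | gtf_stop_codons_to_bed > stop_codons.bed
```

A stop codon that is split across a splice junction may show up as more than one stop_codon line in the GTF, in which case each piece is written as its own BED interval.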
BTW, I'm posting this as a comment to not dissuade someone from posting the mysql/UCSC one liner.
We all know who the "someone" is :-)