8.6 years ago
bharata1803
Hello, I want to download human 3' UTR sequences in FASTA format. I searched beforehand and found two ways to download them: 1. Easy Way To Get 3' Utr Lengths Of A List Of Genes 2. http://utrdb.ba.itb.cnr.it/home/download
Both failed for me, and I don't know why. Ensembl BioMart throws an error while generating the FASTA result, and the UTRdb download fails as well; its FTP server seems to be down. Is there any other way to download these sequences? Thank you for your help.
Could you perhaps use the UCSC Table Browser to get a BED file of 3' UTR Exons and then use bedtools getfasta to transform the BED file into a FASTA file?
Thank you, but I don't know which option gives the 3' UTR exons. Can you tell me which one I should select?
Choose 'Track' = Ensembl Genes, set the 'Output Format' drop-down menu to BED, then choose the option 3' UTR Exons.
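Once the BED file is downloaded, the getfasta step suggested above could be scripted roughly like this. This is only a sketch: the file names (hg38.fa, utr3_exons.bed, utr3_exons.fa) are placeholders I made up, and it assumes bedtools is installed and on your PATH. The -fi, -bed, -fo, -name and -s options are standard bedtools getfasta options; -s reverse-complements features on the minus strand, and -name puts the BED name column in the FASTA header.

```python
import subprocess


def build_getfasta_cmd(genome_fa, bed_path, out_fa, stranded=True):
    """Return the argument list for a `bedtools getfasta` call."""
    cmd = [
        "bedtools", "getfasta",
        "-fi", genome_fa,   # reference genome in FASTA format
        "-bed", bed_path,   # 3' UTR exon intervals from the Table Browser
        "-fo", out_fa,      # output FASTA file
        "-name",            # use the BED name column in the FASTA header
    ]
    if stranded:
        cmd.append("-s")    # reverse-complement minus-strand features
    return cmd


def run_getfasta(genome_fa, bed_path, out_fa, stranded=True):
    """Run bedtools; raises CalledProcessError if bedtools reports a failure."""
    subprocess.run(build_getfasta_cmd(genome_fa, bed_path, out_fa, stranded),
                   check=True)


if __name__ == "__main__":
    # Placeholder file names; only prints the command so it can be inspected.
    print(" ".join(build_getfasta_cmd("hg38.fa", "utr3_exons.bed",
                                      "utr3_exons.fa")))
```

Note that the chromosome names in the BED file must match the names in the genome FASTA, and that each 3' UTR exon comes out as a separate FASTA record, so multi-exon UTRs would still need to be joined per transcript.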
Thank you very much! I could download it. I just found another problem: the reference I use is from Ensembl, so the chromosome names do not match. I also need the gene name for each UTR; currently, the BED file only contains transcript IDs. Do you have any suggestions?
Ensembl BioMart works just fine. If you need to stay with the Ensembl annotation, it is better to get the FASTA sequences from BioMart. BioMart lets you choose which fields appear in the header (see snapshot), so you can have gene names in the header as well.
I note that BioMart took a long time to generate the result file, but it is working.
Besides, if you want to go with the UCSC Table Browser BED file, just strip the 'chr' prefix from the chromosome names and replace 'chrM' with 'MT'. Also, in the GRCh37 release, Ensembl and UCSC differed in the mitochondrial sequence. I am not sure what the status is in the latest release.
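The renaming could be done with a few lines of Python, for example. This is a minimal sketch that assumes a tab-separated BED file and only touches the primary chromosomes (1-22, X, Y, M); unplaced scaffolds and alternate contigs are named differently in the two resources, so they are left unchanged here and would need a proper lookup table. The function names are my own.

```python
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ucsc_to_ensembl_chrom(name):
    """Map a UCSC chromosome name to the Ensembl style (chr1 -> 1, chrM -> MT)."""
    if name == "chrM":
        return "MT"
    if name.startswith("chr") and name[3:] in PRIMARY:
        return name[3:]
    # Scaffolds, alt contigs, or already-converted names: leave unchanged.
    return name


def convert_bed_line(line):
    """Rewrite the chromosome column of one BED line; pass header lines through."""
    if not line.strip() or line.startswith(("track", "browser", "#")):
        return line
    fields = line.rstrip("\n").split("\t")
    fields[0] = ucsc_to_ensembl_chrom(fields[0])
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    example = "chrM\t100\t200\texample_utr3\t0\t+\n"  # made-up interval
    print(convert_bed_line(example), end="")
```

Renaming chrM to MT only fixes the name; it does not help if the mitochondrial sequences themselves differ between the two assemblies.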
Hi, my bad, apologies. The FASTA file didn't download; an 'Ensembl is down' message appeared instead. I would suggest using Carlos Guzman's solution, and you can replace the 'chr' prefix. Just confirm whether the mitochondrial sequence is the same in UCSC and Ensembl (in GRCh37 it was not).
About gene names: the only solution that comes to mind is another round of parsing. From the UCSC Table Browser, using the "All Gencode v22", "All Gencode v23" or "Gencode v20" tracks, you can get the gene names associated with the ENST IDs. This is for the GRCh38 release. For the older release, there is a separate track for Ensembl genes (under the group 'Genes & Gene predictions').
Yeah, it seems the Ensembl website is down only for that function. I tried downloading other data and there was no problem. I have already started parsing the UCSC file. I hope I don't make any mistakes while parsing it. Even better, I hope Ensembl will return to normal.
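That extra round of parsing might look something like the sketch below. It rests on two assumptions that you should check against your own files: that the BED name column starts with the transcript ID followed by a "_utr3..." suffix, and that you have exported a two-column, tab-separated table (transcript ID, then gene name) from the Table Browser. The IDs and gene names in the example are made up.

```python
import csv
import io


def transcript_id_from_bed_name(bed_name):
    """Strip the assumed '_utr3...' suffix and any '.N' version from a BED name."""
    base = bed_name.split("_utr3", 1)[0]
    return base.split(".", 1)[0]


def load_gene_names(table_handle):
    """Read a 2-column TSV (transcript ID, gene name) into a dict."""
    mapping = {}
    for row in csv.reader(table_handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip header/comment lines and malformed rows
        mapping[row[0].split(".", 1)[0]] = row[1]
    return mapping


def add_gene_names(bed_lines, mapping, missing="NA"):
    """Yield BED lines with 'GENE|' prepended to the name column."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            gene = mapping.get(transcript_id_from_bed_name(fields[3]), missing)
            fields[3] = f"{gene}|{fields[3]}"
        yield "\t".join(fields)


if __name__ == "__main__":
    # Made-up table and BED line, for illustration only.
    table = io.StringIO("#name\tname2\nENST00000000001.2\tGENE_A\n")
    bed = ["chr1\t100\t200\tENST00000000001.2_utr3_0_0_chr1_101_f\t0\t+\n"]
    for out in add_gene_names(bed, load_gene_names(table)):
        print(out)
```

Transcripts without a match get "NA" rather than being dropped, so it is easy to see afterwards how many IDs failed to map.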