I have a list of about 3,000 genes and I want to retrieve the genomic locus of each gene's 3' UTR. My goal is to screen all the sequence that could be part of a gene's 3' UTR, even if some of that sequence isn't always in the 3' UTR of every transcript.
I've tried the methods in a few previous answers and, for about 80% of the genes in my list, I can find the predicted start and end of the 3' UTR using BioMart or the UCSC Table Browser. The problem is that each gene returns multiple results (one per alternative transcript), and the 3' UTR starts and ends at a different place in each one. What I would like, for a given gene, is the most upstream predicted 3' UTR start and the most downstream predicted 3' UTR end.
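To make the goal concrete, here is a minimal sketch of the collapse I have in mind, assuming the per-transcript results have already been exported as (gene, 3' UTR start, 3' UTR end) rows in genomic coordinates. The function name and the row layout are hypothetical, not something BioMart or UCSC provides.

```python
def collapse_utr_spans(rows):
    """Collapse per-transcript 3' UTR intervals into one span per gene.

    rows: iterable of (gene_id, utr_start, utr_end) in genomic coordinates
    (hypothetical layout). Returns {gene_id: (min_start, max_end)}.
    """
    spans = {}
    for gene, start, end in rows:
        # Order the pair so the result does not depend on how the export
        # reports coordinates for minus-strand genes.
        lo, hi = min(start, end), max(start, end)
        if gene in spans:
            cur_lo, cur_hi = spans[gene]
            spans[gene] = (min(cur_lo, lo), max(cur_hi, hi))
        else:
            spans[gene] = (lo, hi)
    return spans


example_rows = [
    ("G1", 100, 200),
    ("G1", 150, 400),
    ("G2", 50, 60),
]
utr_spans = collapse_utr_spans(example_rows)
```

Taking the smallest and largest genomic coordinate gives the same outer span on either strand; only which end counts as "upstream" differs.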
Does anyone know a straightforward way to get these from UCSC or BioMart? For example, could I take the smallest predicted cdsEnd and the largest predicted txEnd across all transcripts of each gene?
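A rough sketch of that cdsEnd/txEnd idea, assuming a UCSC gene prediction table exported with the strand, txStart, txEnd, cdsStart and cdsEnd columns plus a gene identifier. As far as I understand, on the minus strand the 3' UTR lies between txStart and cdsStart instead, and non-coding transcripts are reported with cdsStart equal to cdsEnd, so both cases are handled here. The function name and tuple layout are hypothetical.

```python
def utr3_span_from_genepred(transcripts):
    """Outermost 3' UTR span per gene from genePred-style fields.

    transcripts: iterable of
        (gene_id, strand, tx_start, tx_end, cds_start, cds_end)
    (hypothetical layout; coordinates as exported by UCSC).
    Returns {gene_id: (span_start, span_end)}.
    """
    spans = {}
    for gene, strand, tx_start, tx_end, cds_start, cds_end in transcripts:
        if cds_start == cds_end:
            continue  # assumed non-coding: no CDS, so no 3' UTR
        if strand == "+":
            lo, hi = cds_end, tx_end      # 3' UTR follows the CDS
        else:
            lo, hi = tx_start, cds_start  # minus strand: 3' UTR precedes the CDS
        if lo == hi:
            continue  # coding transcript with no annotated 3' UTR
        if gene in spans:
            cur_lo, cur_hi = spans[gene]
            spans[gene] = (min(cur_lo, lo), max(cur_hi, hi))
        else:
            spans[gene] = (lo, hi)
    return spans


example_transcripts = [
    ("G1", "+", 1000, 5000, 1200, 4000),
    ("G1", "+", 1000, 5500, 1200, 4200),
    ("G2", "-", 2000, 9000, 2600, 8800),
    ("G2", "-", 1800, 9000, 2500, 8800),
    ("G3", "+", 10, 90, 90, 90),  # non-coding, skipped
]
gene_spans = utr3_span_from_genepred(example_transcripts)
```

The resulting span runs from the earliest stop codon to the furthest transcript end, so it would also cover introns inside the 3' UTR and any sequence that is coding in other transcripts of the same gene. For screening everything that could be 3' UTR that seems acceptable, but it is not the spliced UTR sequence.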
Thanks for your help!