As an alternative, web-based solution, try the Ensembl BioMart. The following will give you 3' UTR information for each transcript of a gene, but will require a little subtraction to get the lengths, e.g.:
Choose Database -> Ensembl Genes 60
Choose Dataset -> Homo sapiens genes
Click "Filters" on left hand menubar
Expand "Gene" section by clicking "+"
Select "ID list limit" check box.
Select Entrez gene IDs from ID list limit drop down menu
Paste in list of Entrez gene IDs
Click "Attributes" on left hand menubar
Click "Sequences" radio button
Expand "Sequences" section by clicking "+"
Check "3' UTR" under "sequences" header
Expand "Header information" section by clicking "+"
Check "3' UTR start", "3' UTR end" and "Transcript name" under "Transcript information" header
Click "Results" button at top left.
This will give you a FASTA file of 3' UTR sequences for all transcripts of your set of Entrez gene IDs, with headers that contain the start and end of each 3' UTR in genome coordinates. I believe this solution has the same problem of not accounting for introns in 3' UTRs, but because of the gene<->transcript<->UTR mapping, it will account for alternative 3' UTRs.
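The "little subtraction" mentioned above can be sketched in Python. This is only a sketch under assumptions: the header layout (transcript name, then 3' UTR start, then 3' UTR end, separated by "|") is hypothetical and depends on the order in which you ticked the attributes, and the function name is made up for illustration. It also assumes that a UTR spanning several exons might be exported as semicolon-separated lists of starts and ends; if your export does not do that, the loop simply runs once. Ensembl coordinates are 1-based and inclusive, hence the +1.

```python
def utr3_length_from_header(header):
    """Return (transcript_name, 3' UTR length) from one FASTA header line.

    Assumed, hypothetical layout: >transcript_name|utr_start|utr_end
    Starts and ends may be semicolon-separated lists (also an assumption).
    """
    name, starts, ends = header.lstrip(">").strip().split("|")
    total = 0
    for start, end in zip(starts.split(";"), ends.split(";")):
        # 1-based inclusive coordinates: length is end - start + 1
        total += int(end) - int(start) + 1
    return name, total


if __name__ == "__main__":
    # Placeholder header, not real data
    print(utr3_length_from_header(">TX-001|1000|1499"))
```

If the semicolon assumption holds for your export, summing the pieces this way would also give the spliced length rather than the genomic span.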
The table knownGene in the UCSC database contains all the information you want about the structure of the gene (the positions of the introns and exons, cdsStart/cdsEnd, txStart/txEnd).
The table kgXref contains the NCBI id and is linked to knownGene.
For the genes on the '+' strand the query would be as follows (for speed, I won't take into account any splicing between the last codon and the end of transcription; that would need more code than a simple SQL query):
mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18
mysql> select distinct X.geneSymbol, K.txEnd-K.cdsEnd
    from
    kgXref as X,
    knownGene as K
    where
    K.strand='+'
    and K.name=X.kgId;
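+------------+------------------+
| geneSymbol | K.txEnd-K.cdsEnd |
+------------+------------------+
| BC032353   |             3006 |
| AX748260   |             3157 |
| BC048429   |             1540 |
| OR4F5      |                0 |
| OR4F5      |                1 |
| DQ599874   |               31 |
| DQ599768   |               78 |
+------------+------------------+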
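The query above only covers the '+' strand and ignores introns inside the 3' UTR. As a rough sketch of the extra code that would be needed, the following Python function takes the knownGene columns for one transcript (strand, txStart, txEnd, cdsStart, cdsEnd, exonStarts, exonEnds) and sums only the exonic bases downstream of the CDS. The function name is hypothetical. It assumes UCSC's 0-based, half-open coordinates, that exonStarts and exonEnds arrive as comma-separated strings with a trailing comma, and that noncoding transcripts are flagged by cdsStart equal to cdsEnd.

```python
def spliced_utr3_length(strand, tx_start, tx_end, cds_start, cds_end,
                        exon_starts, exon_ends):
    """Exonic length of the 3' UTR for one knownGene row (hypothetical helper)."""
    if cds_start == cds_end:
        # Assumed convention for noncoding transcripts: no CDS, so no 3' UTR
        return 0
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    if strand == "+":
        lo, hi = cds_end, tx_end      # 3' UTR lies after the CDS
    else:
        lo, hi = tx_start, cds_start  # on '-' it lies before cdsStart
    # Sum the overlap of each exon with the UTR interval [lo, hi)
    return sum(max(0, min(e, hi) - max(s, lo)) for s, e in zip(starts, ends))


if __name__ == "__main__":
    # Made-up coordinates, not a real transcript
    print(spliced_utr3_length("+", 100, 1000, 200, 600,
                              "100,500,900,", "300,700,1000,"))
```

For the made-up transcript in the usage line, txEnd-cdsEnd would be 400, while the spliced length excludes the intron between the last two exons.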
Hmmm, alternative splicing within 3'-UTRs is not typical, but it can happen, and there certainly are lots of examples of alternate terminal exons. So I would not want to link gene symbol directly to 3'-UTR length, but rather gene symbol to mRNA identifier to that mRNA's 3'-UTR length. Perhaps Pierre's table above shows this for gene OR4F5, but a length of zero is not a good test case for one gene with two mRNA isoforms and hence two 3'-UTR lengths, which may or may not differ.
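To illustrate the gene symbol -> mRNA identifier -> 3'-UTR length linkage, here is a minimal Python sketch. The function name and the row layout (gene symbol, mRNA identifier, length) are hypothetical, and the identifiers in the usage lines are placeholders rather than real data.

```python
from collections import defaultdict


def utr_lengths_by_gene(rows):
    """Group (gene_symbol, mrna_id, utr3_length) rows as symbol -> {mrna_id: length}."""
    by_gene = defaultdict(dict)
    for symbol, mrna_id, length in rows:
        # Keying on the mRNA identifier keeps isoforms of one gene apart
        by_gene[symbol][mrna_id] = length
    return dict(by_gene)


if __name__ == "__main__":
    # Placeholder rows, not real data
    rows = [("GENE_A", "mrna_1", 0), ("GENE_A", "mrna_2", 1), ("GENE_B", "mrna_3", 31)]
    print(utr_lengths_by_gene(rows))
```

Keeping the mRNA identifier as the inner key means two isoforms with the same length still show up as two entries, which a "select distinct" on symbol and length alone would collapse.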