I am using R to retrieve the 3' UTR start and end sites for about 48,000 mouse genes. After omitting the NAs that are returned and averaging the duplicates, I am left with only about 20,000 results. I looked up some of the missing genes using the biomaRt web application, and the result comes back as "sequence unavailable". I am not sure where to go from here to get the missing data. I am primarily interested in the length of the 3' UTRs, not the sequence. Any help would be greatly appreciated.
Have you checked in Ensembl (www.ensembl.org) whether the transcripts of the omitted genes have annotated UTRs?
Given the large number of genes in your list, I suspect that many of those missing a UTR sequence are non-coding genes, which have no UTRs to report.
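One way to test that suspicion is to tabulate the biotype of every gene that comes back with no 3' UTR coordinates at all. The snippet below is only a minimal sketch in base R: the data frame `utr` and its column names (`gene`, `transcript`, `biotype`, `utr3_start`, `utr3_end`) are made up for illustration and stand in for whatever your own query returns. It also assumes the coordinates come back one row per exon, with NA for exons that carry no 3' UTR, so it sums the segments per transcript before averaging per gene instead of averaging the start and end values directly.

```r
# Hypothetical table standing in for a query result: one row per exon,
# NA where the exon carries no 3' UTR. All names and values are invented.
utr <- data.frame(
  gene       = c("geneA", "geneA", "geneA", "geneA", "geneB", "geneC"),
  transcript = c("txA1", "txA1", "txA1", "txA2", "txB1", "txC1"),
  biotype    = c("protein_coding", "protein_coding", "protein_coding",
                 "protein_coding", "lncRNA", "protein_coding"),
  utr3_start = c(NA, 1001, 2001, 1001, NA, NA),
  utr3_end   = c(NA, 1100, 2050, 1200, NA, NA),
  stringsAsFactors = FALSE
)

# Keep only rows that actually carry 3' UTR coordinates
has_utr   <- !is.na(utr$utr3_start) & !is.na(utr$utr3_end)
annotated <- utr[has_utr, ]

# Length of each UTR segment (coordinates are inclusive)
annotated$seg_len <- annotated$utr3_end - annotated$utr3_start + 1

# Sum the segments of each transcript, then average transcripts per gene
tx_len   <- aggregate(seg_len ~ gene + transcript, data = annotated, FUN = sum)
gene_len <- aggregate(seg_len ~ gene, data = tx_len, FUN = mean)

# Genes with no annotated 3' UTR on any transcript, counted by biotype
missing <- unique(utr[!utr$gene %in% annotated$gene, c("gene", "biotype")])
missing_by_biotype <- table(missing$biotype)

print(gene_len)
print(missing_by_biotype)
```

If most of the missing genes turn out to have a non-coding biotype, there is nothing more to retrieve for them. Any protein-coding genes left in that table are the ones worth looking up individually in Ensembl.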