I am using R to retrieve the 3' UTR start and end positions for about 48,000 mouse genes. After omitting the NAs that are returned and averaging the duplicates, I am left with only about 20,000 results. I looked up some of the missing genes in the BioMart web interface, and the result comes back as "Sequence unavailable". I am not sure where to go from here to get the missing data. I am primarily interested in the length of the 3' UTRs, not the sequence. Any help would be greatly appreciated.
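For reference, the length calculation described above could be sketched roughly as follows in base R. This is only an illustration under stated assumptions: the data frame, its column names (`gene_id`, `transcript_id`, `utr3_start`, `utr3_end`), the helper `utr3_length_per_gene`, and all IDs and coordinates are made up and do not come from a real query. It also assumes that several rows for the same transcript are separate segments of one 3' UTR, so they are summed before the transcripts of a gene are averaged.

```r
# Toy input standing in for a query result; all IDs and coordinates are invented.
utr <- data.frame(
  gene_id       = c("geneA", "geneA", "geneA", "geneB", "geneC"),
  transcript_id = c("txA1",  "txA1",  "txA2",  "txB1",  "txC1"),
  utr3_start    = c(100, 300, 100, NA, 500),
  utr3_end      = c(199, 349, 249, NA, 1499),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean 3' UTR length per gene.
utr3_length_per_gene <- function(df) {
  # Drop rows that have no 3' UTR coordinates.
  df <- df[!is.na(df$utr3_start) & !is.na(df$utr3_end), ]
  # Coordinates are treated as inclusive, so a segment spans end - start + 1 bases.
  df$seg_len <- df$utr3_end - df$utr3_start + 1
  # Assumption: multiple rows per transcript are segments of the same UTR, so sum them.
  per_tx <- aggregate(seg_len ~ gene_id + transcript_id, data = df, FUN = sum)
  # Average the transcript-level lengths within each gene.
  per_gene <- aggregate(seg_len ~ gene_id, data = per_tx, FUN = mean)
  names(per_gene)[names(per_gene) == "seg_len"] <- "utr3_length"
  per_gene
}

utr_lengths <- utr3_length_per_gene(utr)

# Genes that had only NA coordinates and therefore dropped out of the result.
genes_without_utr <- setdiff(unique(utr$gene_id), utr_lengths$gene_id)
```

Listing the genes that drop out, as in `genes_without_utr`, makes it easier to check them one by one instead of only seeing the total shrink.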
It does look like that is the problem. Thanks!