The most comprehensive way to get a full list of RefSeq IDs and sequences is to download the large release files from the RefSeq FTP site. Beware: the file structure is complex, and you will need to do some background reading to figure it out. You will also need to decide things like:
- do you want all sequences, or just one species, e.g. human?
- do you care which version of RefSeq you are looking at?
A simpler (but less comprehensive) method is to download a file from the UCSC Table Browser. For example, select "RefSeq Genes" in the track field and "refGene" in the table field, then click "get output". This will generate a file listing about 40,000 human NM and NR sequences. (You can get other species by selecting different options.) This list will be a subset of the full RefSeq release, but should be good enough for most purposes.
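To turn the downloaded table into a usable ID list, something like the following might work. It is only a sketch: it assumes the output is tab-separated with a header line starting with `#`, and that the accession and gene symbol columns are called `name` and `name2`, as in the refGene table schema. The function name `read_refgene` and the file name in the usage note are made up for illustration.

```python
import csv


def read_refgene(handle):
    """Map each gene symbol to a sorted list of its RefSeq accessions.

    Assumes a tab-separated table whose first line is a header
    (optionally prefixed with '#') containing 'name' and 'name2' columns.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # the header line starts with '#bin'
    acc_idx = header.index("name")     # RefSeq accession, e.g. NM_... or NR_...
    sym_idx = header.index("name2")    # gene symbol
    genes = {}
    for row in reader:
        if not row:                    # skip blank lines
            continue
        genes.setdefault(row[sym_idx], set()).add(row[acc_idx])
    return {symbol: sorted(accs) for symbol, accs in genes.items()}
```

Usage would be along the lines of `with open("refGene.txt") as fh: genes = read_refgene(fh)`. Grouping by gene symbol collapses multiple transcripts of the same gene, which may or may not be what you want.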
Once you have the list of IDs, you can look up the corresponding sequences via the RefSeq interface.
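If looking up IDs by hand is impractical, one alternative (not the RefSeq web interface itself) is NCBI's E-utilities `efetch` endpoint, which returns FASTA records for nucleotide accessions. The sketch below assumes the accessions are nucleotide IDs (NM/NR) and that small batches are requested; the helper names `efetch_url` and `fetch_fasta` are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions):
    """Build an efetch URL asking for FASTA text for the given accessions."""
    params = {
        "db": "nuccore",               # nucleotide database (assumed: NM/NR IDs)
        "id": ",".join(accessions),    # comma-separated accession list
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accessions):
    """Download the FASTA records as one string (needs network access)."""
    with urlopen(efetch_url(accessions)) as response:
        return response.read().decode("utf-8")
```

For long lists it is probably safer to request the IDs in modest batches and pause between requests, since NCBI limits how fast you may query without an API key.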
Yes, I want to retrieve random sequences. By random, I mean sequences chosen from a group of RefSeq genes in as unbiased a way as possible. I am interested in generating lists of randomly grouped genes to use as a control.
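A rough sketch of what I have in mind for the random groups, assuming I already have a list of gene identifiers: draw each control group uniformly at random, without replacement within a group. The function name `random_gene_groups` and its parameters are only for illustration.

```python
import random


def random_gene_groups(genes, group_size, n_groups, seed=None):
    """Draw n_groups control groups of group_size genes each.

    Genes are de-duplicated first, so a gene listed several times
    (e.g. once per transcript) is not more likely to be picked.
    Groups are drawn independently, so they may overlap each other.
    """
    pool = sorted(set(genes))          # sorted so a given seed is reproducible
    if group_size > len(pool):
        raise ValueError("group_size is larger than the number of unique genes")
    rng = random.Random(seed)          # private generator, optional fixed seed
    return [rng.sample(pool, group_size) for _ in range(n_groups)]
```

Passing a fixed `seed` makes the groups repeatable. This only removes bias from the drawing step; if the control should match the test set on properties such as transcript length, that would need extra stratification.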
The Table Browser method helps, thank you. Indeed, I am interested only in species-specific sequences, and I find it easy to repeat the download periodically.