Get sequence by genomic coordinates in R
2.0 years ago
a11msp ▴ 110

After many years of using R, I've realised that I still don't know a good way to extract a sequence by genomic coordinates. I tried biomaRt, but getSequence() doesn't seem to accept bare coordinates; it asks for an anchor such as a gene name. I would appreciate your advice!

R sequence • 2.3k views
2.0 years ago
ATpoint 65k

In R, given a BSgenome object, you can use getSeq() from the Biostrings package. Here is an example for chr1:3000000-3000100 in mouse (mm10):

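library(BSgenome.Mmusculus.UCSC.mm10)  # genome package from Bioconductor; must be installed first
my.dnastring <- as.character(Biostrings::getSeq(BSgenome.Mmusculus.UCSC.mm10, "chr1", start = 3000000, end = 3000100))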

my.dnastring
> NTTCTGTTTCTATTTTGTGGTTACTTTGAGGAGAGTTGGAATTAGGTCTTCTTTGAAGGTCTGGTAGAACTCTGCATTAAACCCATCTGGTCCTGGGCTTT


This is great - thanks very much!


Can I somehow use this with a file that contains genomic coordinates? I have a data.frame and a GRanges object containing the coordinates of all the sequences I would like to retrieve.
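
One possible approach, as a minimal sketch: getSeq() also accepts a GRanges object in place of the chromosome/start/end arguments, so a data.frame of coordinates can be converted with makeGRangesFromDataFrame() from GenomicRanges and passed in directly. The `coords` table below and its column names (chr, start, end) are made up for illustration, and the coordinates are assumed to be 1-based and inclusive on the mm10 assembly.

```r
library(GenomicRanges)
library(BSgenome.Mmusculus.UCSC.mm10)

# Hypothetical coordinate table; in practice this could come from read.table()
# or similar. Column names are an assumption for this example.
coords <- data.frame(
  chr   = c("chr1", "chr1"),
  start = c(3000000, 3000050),
  end   = c(3000100, 3000060)
)

# Convert the data.frame to GRanges; the chr/start/end columns are detected
# automatically. No strand column is given, so the strand defaults to "*".
gr <- makeGRangesFromDataFrame(coords)

# getSeq() returns a DNAStringSet with one sequence per range
# (ranges with strand "*" are read from the plus strand).
seqs <- Biostrings::getSeq(BSgenome.Mmusculus.UCSC.mm10, gr)

# Store the sequences as plain character strings next to the coordinates.
coords$sequence <- as.character(seqs)
```

If you already have a GRanges object, the conversion step can be skipped and the object passed to getSeq() as is, provided its chromosome names match those of the BSgenome object (UCSC style, e.g. "chr1"). If the coordinates come from a BED-like file with 0-based starts, makeGRangesFromDataFrame() has a starts.in.df.are.0based argument that may be useful.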