Question: how to retrieve the nucleotide/base in a certain position using any programming language like R
0
gravatar for xrao
3.2 years ago by
xrao10
United States
xrao10 wrote:

Hello,

I have a long list of genome positions like below, and I would like to extract the corresponding nucleotide/base (A, T, G, or C) in those positions using R or bioconductor or other tools. I know that using UCSC genome browser can do that, but I cannot do that individually for each position. If you have a good way to achieve the goal, please let me know. Many thanks!

    chr start

    1  114242392

    1  114242

    2  7485484

    ............. thousands of lines

 

Thanks,

genome gene • 3.1k views
ADD COMMENTlink modified 3.2 years ago by geek_y9.3k • written 3.2 years ago by xrao10
1
gravatar for h.mon
3.2 years ago by
h.mon24k
Brazil
h.mon24k wrote:

See GenomicRanges. Its documentation is really helpful.
 

ADD COMMENTlink written 3.2 years ago by h.mon24k
1
gravatar for komal.rathi
3.2 years ago by
komal.rathi3.4k
Children's Hospital of Philadelphia, Philadelphia, PA
komal.rathi3.4k wrote:

I think there is a shorter way of doing this but I use the below:

library(Rsamtools)
library(BSgenome)

# read data
dat <- read.table(text = 'chr start
                  chr1  114242
                  chr1  114242
                  chr2  7485484', header=T,stringsAsFactors=F)

# if you have a genome fasta file you can import it using FaFile
# the format of chromosomes should match that in the fasta file
# i.e. 1, 2, 3 in both fasta and dat or chr1, chr2, chr3 in both
fasta_file <- FaFile(file='hg19.fasta')
gr1 <- GRanges(dat$chr,IRanges(start=as.numeric(dat$start), end=as.numeric(dat$start)))
refbase <- getSeq(fasta_file, gr1)
refbase <- as.data.frame(refbase)$x
dat$REF <- refbase
ADD COMMENTlink modified 3.2 years ago • written 3.2 years ago by komal.rathi3.4k
1
gravatar for geek_y
3.2 years ago by
geek_y9.3k
Barcelona/CRG/London/Imperial
geek_y9.3k wrote:

Something like:

samtools faidx hg19.fasta "1:20000-20000"
samtools faidx hg19.fasta `cat test.txt | grep -v "^chr" | awk '{ print $1":"$2"-"$2 }'`

I don't know if it works with stdin. You can explore more.

ADD COMMENTlink modified 3.2 years ago • written 3.2 years ago by geek_y9.3k
0
gravatar for xrao
3.2 years ago by
xrao10
United States
xrao10 wrote:

very helpful. Thank you both so much!

ADD COMMENTlink written 3.2 years ago by xrao10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1813 users visited in the last hour