Question: how to retrieve the nucleotide/base in a certain position using any programming language like R
0
gravatar for xrao
3.7 years ago by
xrao10
United States
xrao10 wrote:

Hello,

I have a long list of genome positions like below, and I would like to extract the corresponding nucleotide/base (A, T, G, or C) in those positions using R or bioconductor or other tools. I know that using UCSC genome browser can do that, but I cannot do that individually for each position. If you have a good way to achieve the goal, please let me know. Many thanks!

    chr start

    1  114242392

    1  114242

    2  7485484

    ............. thousands of lines

 

Thanks,

genome gene • 3.6k views
ADD COMMENTlink modified 3.7 years ago by geek_y9.8k • written 3.7 years ago by xrao10

very helpful. Thank you both so much!

ADD REPLYlink written 3.7 years ago by xrao10
2
gravatar for komal.rathi
3.7 years ago by
komal.rathi3.4k
Children's Hospital of Philadelphia, Philadelphia, PA
komal.rathi3.4k wrote:

I think there is a shorter way of doing this but I use the below:

library(Rsamtools)
library(BSgenome)

# read data
dat <- read.table(text = 'chr start
                  chr1  114242
                  chr1  114242
                  chr2  7485484', header=T,stringsAsFactors=F)

# if you have a genome fasta file you can import it using FaFile
# the format of chromosomes should match that in the fasta file
# i.e. 1, 2, 3 in both fasta and dat or chr1, chr2, chr3 in both
fasta_file <- FaFile(file='hg19.fasta')
gr1 <- GRanges(dat$chr,IRanges(start=as.numeric(dat$start), end=as.numeric(dat$start)))
refbase <- getSeq(fasta_file, gr1)
refbase <- as.data.frame(refbase)$x
dat$REF <- refbase
ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by komal.rathi3.4k
2
gravatar for geek_y
3.7 years ago by
geek_y9.8k
Barcelona
geek_y9.8k wrote:

Something like:

samtools faidx hg19.fasta "1:20000-20000"
samtools faidx hg19.fasta `cat test.txt | grep -v "^chr" | awk '{ print $1":"$2"-"$2 }'`

I don't know if it works with stdin. You can explore more.

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by geek_y9.8k
1
gravatar for h.mon
3.7 years ago by
h.mon27k
Brazil
h.mon27k wrote:

See GenomicRanges. Its documentation is really helpful.
 

ADD COMMENTlink written 3.7 years ago by h.mon27k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1065 users visited in the last hour