Question: extract gene subsequences that have supporting RNASeq evidence
0
gravatar for mamillerpa
12 months ago by
mamillerpa40
United States
mamillerpa40 wrote:

I'd like to report, as nucleotide sequences, the portions of some genes that are covered by RNAseq data. I've written the following, but it seems slow and awkward. I have a feeling I shouldn't have to convert form one kind of range to a dataframe and then to another kind of range, or convert the sequence view to a dataframe in order to access the sequences themselves.

I'm not asking for a code review, but I would be grateful if someone could suggest how to work with the various range objects more elegantly.

These (genomic) BAMs were constructed with bowtie2 and rsem using Ensembl 74, so I've stuck with those gene models. I also have wig and bigwig files.

thanks, Mark

library(GenomicAlignments)
library(GenomicFeatures)
library(BSgenome.Mmusculus.UCSC.mm10)

# first check gene's distribution of coverage first and set to mean - 2 * SD?
coverage.thresh <- 3

# make this a list and wrap much of the script in lapply
current.gene <- "ENSG00000081237"

setwd("/bam/path/")

#have five files with three experimental conditions... pool or union coverage?  take max?
bamfile <- "pe_rnaseq.bam"

txdb <- makeTxDbFromBiomart(biomart = "ENSEMBL_MART_ENSEMBL",
                                host = "dec2013.archive.ensembl.org",
                                dataset = "mmusculus_gene_ensembl")

rnaseq.dat <- readGAlignmentPairs(bamfile)

cvg1 <- coverage(rnaseq.dat)

#  slow steps above OK, wouldn't include in loop

current.chrom <- select(txdb,
                        keys = current.gene,
                        columns = "TXCHROM",
                        keytype = "GENEID")

current.chrom <- current.chrom$TXCHROM

peaks <- slice(cvg1[[current.chrom]], lower = coverage.thresh)

# too reliant on dataframes form here on down?
genedf <- as.data.frame(genes(txdb))

ir <- IRanges(start = genedf$start[genedf$gene_id == current.gene],
              end = genedf$end[genedf$gene_id == current.gene])

intersect.frame <- as.data.frame(intersect(peaks, ir))

intersect.frame$seqnames <- paste0("chr", current.chrom)

coverage.gr <-
  makeGRangesFromDataFrame(df = intersect.frame, seqnames.field = "seqnames")

seq.cov.view <- Views(Mmusculus, coverage.gr)

seq.cov.frame <- as.data.frame(seq.cov.view)

# still need to ensure correct orientation (rev/comp?)
# orientation available in txdb but lost upon converion to irange?
print(seq.cov.frame$dna)
rna-seq • 264 views
ADD COMMENTlink modified 12 months ago • written 12 months ago by mamillerpa40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1132 users visited in the last hour