Question

Using Genomicranges

1

Entering edit mode

10.5 years ago

robjohn7000 ▴ 110

Hi,

I was trying to obtain the gene names from the column "Parent" in the GRangesList (from gff3) with 1 range and 23 as follows:

countDF <- data.frame(row.names=names(gffsub))

GRangesList with 1 range and 23 metadata shortened as follows:

> gffsub
GRangesList of length 4449:
[[1]] 
GRanges with 1 range and 23 metadata columns:
     seqnames     ranges strand |   ....................Parent

But countDF returned null. How can I modify the code to give me a row of the gene names?

Thanks

r rna-seq bioconductor • 4.5k views

ADD COMMENT • link updated 10.1 years ago by Biostar 20 • written 10.5 years ago by robjohn7000 ▴ 110

1

Entering edit mode

Have you tried just doing parents <- elementMetadata(gffsub)$Parent). Anyway, you're not creating an actual dataframe, which would seem to be the root of your problem.

ADD REPLY • link 10.5 years ago by Devon Ryan 104k

0

Entering edit mode

as a sidenote, mcols() will do the same thing as elementMetadata()

ADD REPLY • link 10.5 years ago by Ying W ★ 4.2k

0

Entering edit mode

Thanks guys!

I have tried mcols and Parents <- elementMetadata(gffsub)$Parent;

Going further using the following code threw errors (obviously because countDF = NULL):

samples <- as.character(targets$Fastq)
samplespath <- paste("./results/", samples, ".bam", sep="")
names(samplespath) <- samples
countDF <- data.frame(row.names=(names(gffsub)))
#countDF <- data.frame(row.names=Parents)
for(i in samplespath) {
  aligns <- readBamGappedAlignments(i) # Substitute next two lines with this one.
  #counts <- countOverlaps(gffsub, aligns, ignore.strand=TRUE)
  counts <- summarizeOverlaps(gffsub, aligns, ignore.strand=TRUE)
  countDF <- cbind(countDF, counts)
}
colnames(countDF) <- samples

ADD REPLY • link updated 10.5 years ago by Devon Ryan 104k • written 10.5 years ago by robjohn7000 ▴ 110

0

Entering edit mode

cross posted on SE: http://seqanswers.com/forums/showthread.php?t=35329

ADD REPLY • link 10.5 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Post is no longer on SE: http://seqanswers.com/forums/showthread.php?t=35329

ADD REPLY • link 10.5 years ago by robjohn7000 ▴ 110

score 0 · Answer 1 · 2013-11-14

0

Entering edit mode

10.5 years ago

Ying W ★ 4.2k

Give this a try

countDF <- data.frame(matrix(0, nrow=length(gffsub), ncol=length(samples))) #preallocate data frame
colnames(countDF) = samplespath
for(i in samplespath) {
  aligns <- readBamGappedAlignments(i) 
  counts <- summarizeOverlaps(gffsub, aligns, ignore.strand=TRUE)
  countDF[, i] <- count
}
colnames(countDF) <- samples
rownames(countDF) <- mcols(gffsub)$Parent #REPLACE w/something here

ADD COMMENT • link 10.5 years ago by Ying W ★ 4.2k

0

Entering edit mode

Thanks for the code modification Ying. After running it, I got the following warning messages (five times - presumably one for each of my 5 samples):

Warning messages: 1: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': Chromosome - in 'y': NC_016810.1 Make sure to always combine/compare objects based on the same reference genome (use suppressWarnings() to suppress this warning).

ADD REPLY • link 10.5 years ago by robjohn7000 ▴ 110

0

Entering edit mode

That's just a warning so you can ignore it if you want. what it says is that your aligns might not all be mapped to the same reference because one of your files has reads in chromosome NC_016810.1 while another does not.

ADD REPLY • link 10.5 years ago by Ying W ★ 4.2k

0

Entering edit mode

I'm investigating the gff3file and will post ASAP. Thanks!

ADD REPLY • link 10.4 years ago by robjohn7000 ▴ 110

0

Entering edit mode

I tried to find where the unequal length problem started now added 3 plasmids my previous gff3 file (attached: http://wikisend.com/download/942278/gff_files.tar.gz) and now a new error:

gff compossition:

NC_016810 - Chromosome
NC_017718 - Plasmid1
NC_017719 - Plasmid2
NC_017720 - Plasmid3

code:

library(rtracklayer);
library(GenomicRanges);
library(Rsamtools)
library(Biostrings)

gff <- import.gff3("./data/NC_016810.gff3", asRangedData=FALSE)
seqlengths(gff) <- c(4878012,86908,86209,93842)
#seqlengths(gff) <- end(ranges(gff[which(elementMetadata(gff)[,"type"]=="region 1"),]))
#seqlengths(gff) <- end(ranges(gff[which(elementMetadata(gff)[,"type"]=="region"),]))
#seqlengths(gff) <- end(ranges(gff[which(elementMetadata(gff)[,"type"]== ("4878012")),]))
subgene_index <- which(elementMetadata(gff)[,"type"] == "exon")
gffsub <- gff[subgene_index,] # Returns only gene ranges
#gffsub[1:4, c(2,5)]

#Parents <- elementMetadata(gffsub)$Parent
#ids <- gsub("Parent=|\\..*", "", elementMetadata(gffsub)$group)
#ids <- gsub("Parent=|\\..*", "", elementMetadata(gffsub)$group)
ids <- gsub("Parent=\\S*", "", elementMetadata(gffsub)$Parent)
gffsub <- split(gffsub, ids) # Coerce to GRangesList

Seqlength(gff) returns:

NC_016810 NC_017718 NC_017719 NC_017720
 4878012     86908     86209     93842

gffsub returns error:

Error in relist(x, f) : 
shape of 'skeleton' is not compatible with 'NROW(flesh)'

My guess is that I'm doing something wrong in the code above.

ADD REPLY • link 10.4 years ago by robjohn7000 ▴ 110

0

Entering edit mode

uhh, I actually have no clue what you are trying to do here, are you sure there is something in ids? the warning earlier had something to do with either seqlevels in your bams but not gff or the other way around. what you needed to do was check to make sure that the bams were all aligned to the same reference and the chromosomes in that reference were the same format as the chromosomes in the gff file.

From your code it almost seems like you are just trying to do a bunch of things without understanding why things work or don't.. You should put more thinking into what you are trying to do and what the machine is doing and where things break down so you get the error. From the first post your errors/problems all seem like not understanding what your R code is doing

ADD REPLY • link 10.4 years ago by Ying W ★ 4.2k