Using Genomicranges
1
1
Entering edit mode
10.8 years ago
robjohn7000 ▴ 110

Hi,

I was trying to obtain the gene names from the column "Parent" in the GRangesList (from gff3) with 1 range and 23 as follows:

countDF <- data.frame(row.names=names(gffsub))

GRangesList with 1 range and 23 metadata shortened as follows:

> gffsub
GRangesList of length 4449:
[[1]] 
GRanges with 1 range and 23 metadata columns:
     seqnames     ranges strand |   ....................Parent

But countDF returned null. How can I modify the code to give me a row of the gene names?

Thanks

r rna-seq bioconductor • 4.7k views
ADD COMMENT
1
Entering edit mode

Have you tried just doing parents <- elementMetadata(gffsub)$Parent). Anyway, you're not creating an actual dataframe, which would seem to be the root of your problem.

ADD REPLY
0
Entering edit mode

as a sidenote, mcols() will do the same thing as elementMetadata()

ADD REPLY
0
Entering edit mode

Thanks guys!

I have tried mcols and Parents <- elementMetadata(gffsub)$Parent;

Going further using the following code threw errors (obviously because countDF = NULL):

samples <- as.character(targets$Fastq)
samplespath <- paste("./results/", samples, ".bam", sep="")
names(samplespath) <- samples
countDF <- data.frame(row.names=(names(gffsub)))
#countDF <- data.frame(row.names=Parents)
for(i in samplespath) {
  aligns <- readBamGappedAlignments(i) # Substitute next two lines with this one.
  #counts <- countOverlaps(gffsub, aligns, ignore.strand=TRUE)
  counts <- summarizeOverlaps(gffsub, aligns, ignore.strand=TRUE)
  countDF <- cbind(countDF, counts)
}
colnames(countDF) <- samples
ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode
10.8 years ago
Ying W ★ 4.3k

Give this a try

countDF <- data.frame(matrix(0, nrow=length(gffsub), ncol=length(samples))) #preallocate data frame
colnames(countDF) = samplespath
for(i in samplespath) {
  aligns <- readBamGappedAlignments(i) 
  counts <- summarizeOverlaps(gffsub, aligns, ignore.strand=TRUE)
  countDF[, i] <- count
}
colnames(countDF) <- samples
rownames(countDF) <- mcols(gffsub)$Parent #REPLACE w/something here
ADD COMMENT
0
Entering edit mode

Thanks for the code modification Ying. After running it, I got the following warning messages (five times - presumably one for each of my 5 samples):

Warning messages: 1: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': Chromosome - in 'y': NC_016810.1 Make sure to always combine/compare objects based on the same reference genome (use suppressWarnings() to suppress this warning).

ADD REPLY
0
Entering edit mode

That's just a warning so you can ignore it if you want. what it says is that your aligns might not all be mapped to the same reference because one of your files has reads in chromosome NC_016810.1 while another does not.

ADD REPLY
0
Entering edit mode

I'm investigating the gff3file and will post ASAP. Thanks!

ADD REPLY
0
Entering edit mode

I tried to find where the unequal length problem started now added 3 plasmids my previous gff3 file (attached: http://wikisend.com/download/942278/gff_files.tar.gz) and now a new error:

gff compossition:

NC_016810 - Chromosome
NC_017718 - Plasmid1
NC_017719 - Plasmid2
NC_017720 - Plasmid3

code:

library(rtracklayer);
library(GenomicRanges);
library(Rsamtools)
library(Biostrings)

gff <- import.gff3("./data/NC_016810.gff3", asRangedData=FALSE)
seqlengths(gff) <- c(4878012,86908,86209,93842)
#seqlengths(gff) <- end(ranges(gff[which(elementMetadata(gff)[,"type"]=="region 1"),]))
#seqlengths(gff) <- end(ranges(gff[which(elementMetadata(gff)[,"type"]=="region"),]))
#seqlengths(gff) <- end(ranges(gff[which(elementMetadata(gff)[,"type"]== ("4878012")),]))
subgene_index <- which(elementMetadata(gff)[,"type"] == "exon")
gffsub <- gff[subgene_index,] # Returns only gene ranges
#gffsub[1:4, c(2,5)]

#Parents <- elementMetadata(gffsub)$Parent
#ids <- gsub("Parent=|\\..*", "", elementMetadata(gffsub)$group)
#ids <- gsub("Parent=|\\..*", "", elementMetadata(gffsub)$group)
ids <- gsub("Parent=\\S*", "", elementMetadata(gffsub)$Parent)
gffsub <- split(gffsub, ids) # Coerce to GRangesList

Seqlength(gff) returns:

NC_016810 NC_017718 NC_017719 NC_017720
 4878012     86908     86209     93842

gffsub returns error:

Error in relist(x, f) : 
shape of 'skeleton' is not compatible with 'NROW(flesh)'

My guess is that I'm doing something wrong in the code above.

ADD REPLY
0
Entering edit mode

uhh, I actually have no clue what you are trying to do here, are you sure there is something in ids? the warning earlier had something to do with either seqlevels in your bams but not gff or the other way around. what you needed to do was check to make sure that the bams were all aligned to the same reference and the chromosomes in that reference were the same format as the chromosomes in the gff file.

From your code it almost seems like you are just trying to do a bunch of things without understanding why things work or don't.. You should put more thinking into what you are trying to do and what the machine is doing and where things break down so you get the error. From the first post your errors/problems all seem like not understanding what your R code is doing

ADD REPLY

Login before adding your answer.

Traffic: 779 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6