Question: (edgeR-related) DGElist error message 'Negative counts not allowed'.
0
gravatar for fmerkal
9 days ago by
fmerkal10
USA/Lowell/University of Massachusetts Lowell
fmerkal10 wrote:

Hello,

After counting genomic features with featureCounts

featureCounts -F GTF -t exon -g gene_id -f -O --minOverlap 2 -M --fraction -J -a annotation.gtf -o out.txt 1_sorted.bam 2_sorted.bam 3_sorted.bam 4_sorted.bam

I decided to use the output file as an input for edgeR to create the DGElist. First I used the read.delim function in R to read the text file and create a table. The command I used is shown below.

counts <- read.delim("~/Desktop/thesis_bioinformatics/chick_fl_hl_comparison/counts_fcounts.txt", sep="\t", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

After getting the table, I tried using it to create the DGElist by using the commands below.

group<-c(1,1,2,2)
y <- DGEList(counts=counts, group=group)

This outputs an error message that says Error: Negative counts not allowed. I followed someones advice from this post Question: Error when running edgeR, where they suggest using is.na(counts) || counts < 0 to check if NA or negative counts are present. I checked all the columns of my counts file and the only column where the output was true was the Strand column, which describes the directionality. I removed this column from the table and tried running DGElist again with a counts file that did not have the Strands column. The same error message appeared. I'm pretty confused as to why I get this error message when none of my counts are negative. Any help would be appreciated.

Cheers, Fjodor

The start of my counts table looks something like this:

  Geneid              Chr  Start   End   Strand   Length   1   2   3   4

1 ENSGALG00000054818   1    9774  10061     -       288    #   #   #   #
rna-seq R software error • 133 views
ADD COMMENTlink modified 8 days ago by Gordon Smyth670 • written 9 days ago by fmerkal10

Did you manipulate out.txt prior to reading it with the read.delim command? With this command you should not be able to read a featureCounts output. It should complain about the header line (starting with # Program....) not matching the number of columns. Please show head -n 10 counts_fcounts.txt

ADD REPLYlink modified 9 days ago • written 9 days ago by ATpoint14k

Yes, I did remove the header and gave the counts columns easier names.

Geneid  Chr Start   End Strand  Length  FL1 FL2 HL1 HL2

ENSGALG00000054818  1   9774    10061   -   288 0   0   0   0

ENSGALG00000054818  1   9314    9364    -   51  0   0   0   0

ENSGALG00000054818  1   8674    8904    -   231 2.10    1.67    1.33    2.20

ENSGALG00000054818  1   6755    6829    -   75  2.90    1.52    2.46    2.75
ADD REPLYlink modified 9 days ago • written 9 days ago by fmerkal10
2
gravatar for h.mon
9 days ago by
h.mon24k
Brazil
h.mon24k wrote:

After reading the output of featureCounts, you have to subset the data frame to include just the counts:

counts <- read.delim("~/Desktop/thesis_bioinformatics/chick_fl_hl_comparison/counts_fcounts.txt", sep="\t", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)
rownames( counts ) <- counts$Geneid
counts <- counts[ , -c(1:6) ]
ADD COMMENTlink written 9 days ago by h.mon24k

Hello, Thank you for your reply! This is a great idea and I tried it. The only issue is that there appear to be non-unique values in the Geneid column and this doesn't allow me to set the Geneid column values as row names. Could I still keep the R assigned row names and only delete columns 2:6?

ADD REPLYlink written 9 days ago by fmerkal10

Take everything I say bellow with a grain of salt, as I always performed summarization over genes, never over exons, and I don't know the recommended settings for exon-level summarization.

I suppose there are non-unique Geneids because you are summarizing over exons (flag -f) instead of over genes, and the exons don't have unique identifiers - I am really not sure, because I never used featureCounts with all these options (`-M -O --fraction -f -J), and it has been a long time since I last used featureCounts. Could you post a snippet of you GTF annotation?

Also, you may want to remove multi-mapped reads from your bam files, as those will be counted (flag -M) and featureCounts (with --fraction) use the naive method of just splitting the counts evenly between all features overlapped by the multi-mappers.

ADD REPLYlink written 8 days ago by h.mon24k

Hi, I was thinking about summarizing over genes instead of exons, after reading your comment, but I am interested in looking at alternative splicing as well as differential gene expression so I think that wouldn't be a good idea for my situation.

I looked more into this problem and found out that there is an option in DGElist called genes where you can specify the Geneids and other information related chromosome number and library size. Maybe this is a well-known thing but I am very new to RNA-seq analysis so I didn't know anything about it. Regardless, I tried the following with the counts table described earlier in mind and it seemed to work just fine.

library(edgeR)
counts <- read.delim("~/Desktop/thesis_bioinformatics/chick_fl_hl_comparison/counts_fcounts.txt", sep="\t", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)
View(counts)
genes <- counts[1:6]
group<-c(1,1,2,2)
y<-DGEList(counts = cbind(counts$FL1, counts$FL2, counts$HL1, counts$HL2), genes=genes, group=group)

Thank you for your comments, they definitely lead me to the right direction :)

ADD REPLYlink written 8 days ago by fmerkal10
2
gravatar for Gordon Smyth
8 days ago by
Gordon Smyth670
Australia
Gordon Smyth670 wrote:

This is how I would do it:

library(edgeR)
fc <- read.delim("counts_fcounts.txt", stringsAsFactors = FALSE)
genes <- fc[,1:6]
counts <- data.matrix(fc[,7:11])
row.names(counts) <- paste(genes$Geneid, genes$Start, genes$End, sep=".")
group <- factor( c("FL","FL","HL","HL") )
y <- DGEList(counts, genes=genes, group=group)
ADD COMMENTlink modified 8 days ago • written 8 days ago by Gordon Smyth670
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1945 users visited in the last hour