Question: Using featureCounts Output for DE analysis in edgeR
markm014 wrote:

I get the following error in RStudio when I try to import featureCounts count matrices into edgeR for analysis. Do I have to manually modify the featureCounts output before using it in edgeR? The documentation I have read makes it seem as if I can load the featureCounts output directly into edgeR. If the fix is as simple as switching negative values to zero (I think featureCounts outputs '-1' in the case of no overlaps), I can handle that, but it seems to me like a good way to distort the statistics.

The error is:

Error: Negative counts not allowed

The command I am running to generate this error is:

group = c(0,1,2,3,4,5,3,4,5,3,4,5)
dge = DGEList(counts = 'file_path', group = group)

where 'file_path' is the output of featureCounts run on 12 BAM files. featureCounts ran without issue using the following command, and about 60% of my reads overlapped features:

featureCounts -a 'Mus_musculus.GRCm38.95.gtf' -o features_count_all/total_file.count Sample1-1 Sample2-1 Sample3-1 Sample4-1 Sample4-2 Sample4-3 Sample5-1 Sample5-2 Sample5-3 Sample6-1 Sample6-2 Sample6-3

My matrix file looks like this. Do I need to reduce it to a single raw count per cell, rather than keeping the semicolon-separated fields?

ENSMUSG00000088159  1   15019040    15019159    -   120 0   0   0   0   0   0   0   0   0   0   0   0
ENSMUSG00000073737  1   15268802    15269797    -   996 2   1   2   4   1   3   2   2   2   6   2   1
ENSMUSG00000092083  1;1;1;1;1   15287254;15312363;15312452;15709485;15709485    15287484;15313030;15313030;15712548;15723750    +;+;+;+;+   15165   3   6   3   0   0   0   2   0   0   1   0   0
ENSMUSG00000102937  1   15364302    15365834    +   1533    0   0   0   0   0   0   0   0   0   0   0   0
ENSMUSG00000104149  1   15556249    15558337    +   2089    0   0   0   0   0   0   0   0   0   0   0   0
ENSMUSG00000088829  1   15685935    15686046    -   112 0   0   0   0   1   0   0   0   0   0   0   0
ENSMUSG00000077377  1   15757832    15757963    +   132 0   0   0   0   0   0   0   0   0   0   0   0
ENSMUSG00000101652  1;1 15760122;15760560   15760263;15760668   -;- 251 0   1   1   2   1   1   2   3   1   3   4
Friederike wrote:

featureCounts adds several columns of gene annotation to the beginning of the matrix, so the file does not contain only counts. You need to remove the columns holding the chromosome, gene position, strand and length (e.g. 1 15019040 15019159 - 120), and the gene IDs should be assigned to the row names. If you read in the featureCounts results first, this will become clear:

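# read in the results (not tested, you may need to play around with read.table parameters)
fc_res <- read.table('file_path', header = TRUE, sep = '\t')

# assign row.names (featureCounts names the first column Geneid; check colnames(fc_res) if unsure)
row.names(fc_res) <- fc_res$Geneid

# exclude superfluous columns (Geneid, Chr, Start, End, Strand, Length)
fc_res <- fc_res[, -c(1:6)]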
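
To make the steps above concrete, here is a minimal, self-contained sketch of how they might fit together end to end. It is an illustration under assumptions, not a tested pipeline: the three-sample table, the sample names S1 to S3 and the grouping are made up, and the edgeR call only runs if the package happens to be installed.

```r
# Hypothetical miniature featureCounts table (tab-separated).
# Sample names S1..S3 and the values are invented for illustration.
fc_lines <- c(
  "# Program:featureCounts (comment line written at the top of the file)",
  "Geneid\tChr\tStart\tEnd\tStrand\tLength\tS1\tS2\tS3",
  "ENSMUSG00000088159\t1\t15019040\t15019159\t-\t120\t0\t0\t0",
  "ENSMUSG00000073737\t1\t15268802\t15269797\t-\t996\t2\t1\t2",
  "ENSMUSG00000101652\t1;1\t15760122;15760560\t15760263;15760668\t-;-\t251\t0\t1\t1"
)
tmp <- tempfile(fileext = ".count")
writeLines(fc_lines, tmp)

# The leading '#' line is skipped because read.table's comment.char defaults to "#".
fc_res <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Keep only the count columns as a matrix, with gene IDs as row names.
counts <- as.matrix(fc_res[, -(1:6)])
rownames(counts) <- fc_res$Geneid

# Hypothetical grouping: one label per sample column.
group <- factor(c("a", "b", "b"))

# Build the DGEList only if edgeR is available (assumption: it may not be installed).
if (requireNamespace("edgeR", quietly = TRUE)) {
  dge <- edgeR::DGEList(counts = counts, group = group)
}
```

The point of the sketch is that DGEList receives a numeric matrix of counts rather than a file path or the full table with annotation columns.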

Thank you a bunch, this worked like a charm.

For anyone trying to replicate this solution, note that you should not use quotes when dropping columns using '-c'.
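
A small sketch of the difference, assuming the quoted form looked something like the second call below (that is a guess; the toy data frame and its values are made up). Negative numeric indices drop columns by position, while a minus sign in front of quoted names is an error in R. Dropping by name needs a different idiom, such as setdiff().

```r
# Toy data frame with the featureCounts column layout (values invented).
fc_res <- data.frame(
  Geneid = c("g1", "g2"), Chr = c("1", "1"),
  Start = c(10L, 50L), End = c(20L, 90L),
  Strand = c("-", "+"), Length = c(11L, 41L),
  S1 = c(0L, 2L), S2 = c(3L, 1L)
)

# Works: negative numeric indices drop columns by position.
by_position <- fc_res[, -c(1:6)]

# Fails: unary minus is not defined for character vectors.
quoted <- try(fc_res[, -c("Chr", "Start")], silent = TRUE)

# Name-based alternative: keep every column that is not an annotation column.
annot_cols <- c("Geneid", "Chr", "Start", "End", "Strand", "Length")
by_name <- fc_res[, setdiff(names(fc_res), annot_cols)]
```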

Reply written 12 weeks ago by markm014

lol, I hardly use data.frames these days so I tend to forget the syntax details, thanks for hanging in there -- I've updated the code snippet

Reply written 12 weeks ago by Friederike