Question: DESeq2 and integers values
0
gravatar for Morris_Chair
7 months ago by
Morris_Chair140
Morris_Chair140 wrote:

Dear all,

I have an experiment with two different conditions, two treated and two untreated samples from where I performed RNA sequencing. here is the code that I'm using for the expression analysis

countdata <- ( as.matrix(read.table("/RNAseq/counts/DESeq2_STAR/count.table", header=TRUE,row.names=1))) 
condition <- factor(c(rep("treated",2), rep("Untreated",2)))
coldata <-data.frame(row.names=colnames(countdata),condition)
library("DESeq2")

everything is ok until I use DESeq2

    dds <- DESeqDataSetFromMatrix(countdata, coldata, design=~condition)

I can't find a way to make the integer number, and I don't know if I have to fix something with featureCounts or I can do this in R.

please help,

thank you

here the result

head(countdata)
mapping.star.Treated_1.bam mapping.star.Treated_2.bam mapping.star.Untreated_1.bam mapping.star.Untreated_2.bam
ENSG00000223972                       7.12                      58.49                       119.48                        27.45
ENSG00000227232                    2793.80                    2960.74                      2410.36                      1630.14
ENSG00000243485                      30.00                      16.32                         2.93                        20.52
ENSG00000237613                       0.00                       1.00                         0.00                         1.83
ENSG00000268020                       1.00                       0.00                         0.00                         0.50
ENSG00000240361                       0.00                       0.00                         0.00                         0.00
  
condition
[1] treated   treated   Untreated Untreated
Levels: treated Untreated
  
head(coldata)
condition
 mapping.star.Treated_1.bam     treated
 mapping.star.Treated_2.bam     treated
 mapping.star.Untreated_1.bam Untreated
 mapping.star.Untreated_2.bam Untreated
  
library("DESeq2")
dds <- DESeqDataSetFromMatrix(countdata, coldata, design=~condition)
**Error in DESeqDataSet(se, design = design, ignoreRank) : 
  some values in assay are not integers**
  
rna-seq deseq2 • 773 views
ADD COMMENTlink modified 7 months ago • written 7 months ago by Morris_Chair140
1

1) This is not a forum, please remove this tag.

2) Normally, you should have integers going into DESeq2. This is likely a featurecounts "issue". Please share how you generated your counts from featurecounts. Is this gene-level data?

ADD REPLYlink written 7 months ago by lshepard370

Hi Ishepard, I don't know what tag you mean, I'm sorry if I did something wrong.

Here it is what I used to generate the count file with featureCounts and yes, this is a gene-level data

featureCounts -M --fraction -g gene_id -T 4 -s 0 -a annotation/Homo_sapiens.GRCh37.75.gtf -o counts/T-ALL.star.featureCounts files.bam

thanks

ADD REPLYlink written 7 months ago by Morris_Chair140
3

Here is your problem: --fraction will generate fractional counts. featureCounts is not appropriate for counting multi-mapped reads, you shouldn't use -M --fraction. If you want to count multi-mapped reads, use Salmon, RSEM or something similar.

ADD REPLYlink written 7 months ago by h.mon27k

Ideally if I remove this command -M --fraction I should have integers value, I will loose some information but I can still go further with the analysis, isn't it?

thank you

ADD REPLYlink modified 7 months ago • written 7 months ago by Morris_Chair140
1

Yes indeed.

But you won't loose any information, because featureCounts use the naive method of splitting the counts by the number of positions a read mapped. Salmon, kallisto and RSEM, on the other hand, use a expectation-maximization maximum likelihood algorithm, which will apportion multi-mapped reads based on the proportion of uniquely mapped reads mapping to the same feature.

Which is just a long way of stating the algorithm featureCounts use for counting multi-mappers is bad and should be avoided.

ADD REPLYlink modified 7 months ago • written 7 months ago by h.mon27k

Or, use --fraction, but round your counts before giving them to DESeq.

ADD REPLYlink written 7 months ago by swbarnes26.7k

Thank you to all for the precious advices, I got my first heatmap :)

Now I have to find a way to associate the gene name instead of the gene ID

ADD REPLYlink modified 7 months ago by ATpoint24k • written 7 months ago by Morris_Chair140
1

biomaRt package is a good annotation package as an fyi. You can add gene ID, descriptions, coordinates etc...

ADD REPLYlink written 7 months ago by lshepard370
2
gravatar for h.mon
7 months ago by
h.mon27k
Brazil
h.mon27k wrote:

The error message is very clear: the counts are not integers:

ENSG00000223972          7.12          58.49           119.48            27.45
  

DESeq2 expect integers, which is what you get from e.g. featureCounts. If you counted over transcripts then summarized over genes, you have to round up the counts, but the recommended approach is to use the tximport R/Bioconductor package to perform the transcripts to genes summarization.

If these values are TPMs, FPKMs, or the like, do not use them with DESeq2. DESeq2 expects raw counts as input.

ADD COMMENTlink modified 7 months ago • written 7 months ago by h.mon27k

Hi h.mon, as far as I know bam files can't be TPM or FPKMs/RPKM because are not processed to be

thanks

ADD REPLYlink written 7 months ago by Morris_Chair140

Bam files can't be TPMs, FPKMs or counts. Bam files store sequence reads and their mapping positions to a reference genome (or transcriptome). Getting TPMs from bams is more or less the same as getting counts from bams: one has to iterate over the reads and record where they mapped in relation to a given annotation.

ADD REPLYlink written 7 months ago by h.mon27k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2068 users visited in the last hour