How to input data for DESeq2 from individual HTSeq count?
3.0 years ago
sudu87 ▴ 40

I am comparing gene expression between 2 bacteria under 1 condition. I now have count tables for 3 technical replicates of each bacterium.

Bacteria1_1.count 
Bacteria1_2.count 
Bacteria1_3.count

...same for the other bacteria.

These files look like this:

gene1 10000 
gene2 500 
gene3 0 
gene4 5000

I want to use DESeq2 for differential gene expression analysis. But I cannot figure out how to properly execute the DESeqDataSetFromHTSeqCount() command with this type of data.

Is there an intermediate step I need to add?

RNA-Seq HTSeq rna-seq deseq deseq2 • 7.4k views
3.0 years ago
ZZzzzzhong ▴ 240
directory <- "/path/to/your/files/"

directory is where your htseq-count output files are located.

sampleFiles <- grep("Bacteria",list.files(directory),value=TRUE)

sampleFiles is a character vector holding the names of your htseq-count output files.

condition <- c('Bacteria1','Bacteria1','Bacteria1','Bacteria2','Bacteria2','Bacteria2')

condition needs one entry per sample, in the same order as sampleFiles.

sampleTable <- data.frame(sampleName = sampleFiles,
                      fileName = sampleFiles,
                      condition = condition)
library("DESeq2")
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                   directory = directory,
                                   design= ~ condition)
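
For intuition, here is a minimal base-R sketch of what each count file has to provide: two tab-separated columns (gene ID, count) with no header. It is not DESeq2's actual implementation; the temporary directory, the second file and its numbers are made up for illustration.

```r
# Illustrative only: mimic two htseq-count outputs in a temporary directory.
directory <- tempfile("counts_")
dir.create(directory)
genes <- c("gene1", "gene2", "gene3", "gene4")
write.table(data.frame(genes, c(10000L, 500L, 0L, 5000L)),
            file.path(directory, "Bacteria1_1.count"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
write.table(data.frame(genes, c(9000L, 450L, 2L, 5200L)),  # made-up counts
            file.path(directory, "Bacteria2_1.count"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

# Pick up the count files by extension.
sampleFiles <- list.files(directory, pattern = "\\.count$")

# Read each two-column file and bind the counts column-wise:
# rows are genes, columns are samples.
counts <- sapply(sampleFiles, function(f) {
  tab <- read.table(file.path(directory, f), sep = "\t",
                    header = FALSE, row.names = 1)
  setNames(tab[[1]], rownames(tab))
})
print(counts)
```

If a file does not parse like this (extra header, spaces instead of tabs, different gene order), it is worth fixing that before calling DESeqDataSetFromHTSeqCount().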

Thank you so much for this.

Sorry for these stupid questions but I have one more issue in:

sampleFiles <- grep("Bacteria",list.files(directory),value=TRUE)

The file names of my .count files contain 2 different bacteria names. For example, "cowan" and "isolate" are the names of the bacteria. I tried grepping for both at once, but it doesn't work. How can I solve this?

Thanks a ton,

Sudip


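List the file names explicitly, just as you did for the variable condition:

sampleFiles <- c('cowan1','cowan2','cowan3','isolate1','isolate2','isolate3')

Use the actual file names as they appear in your directory, and remember that the entries of sampleFiles must be in the same order as condition.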
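
As an alternative to typing the names by hand, a regular expression with alternation can match both names in one grep() call. This is a minimal sketch; the file names below, including the .count extension, are assumptions based on the names mentioned in the question.

```r
# Hypothetical listing; in practice this would be list.files(directory).
allFiles <- c("cowan1.count", "cowan2.count", "cowan3.count",
              "isolate1.count", "isolate2.count", "isolate3.count",
              "README.txt")

# "cowan|isolate" matches either name; anchoring keeps unrelated files out.
sampleFiles <- grep("^(cowan|isolate).*\\.count$", allFiles, value = TRUE)

# Derive the condition from the file name so both vectors stay in sync.
condition <- factor(sub("[0-9]*\\.count$", "", sampleFiles))

print(data.frame(fileName = sampleFiles, condition = condition))
```

Deriving condition from sampleFiles avoids the two vectors drifting out of order.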


Hi ZZzzzzhong, I am trying to follow the method you suggested, but I get the error "Error in data.frame(sampleName = sampleFiles, fileName = sampleFiles, : arguments imply differing number of rows: 0, 4".

I have checked the number of rows in all the individual files, and they are the same.

Here is my script:

directory <- "C/RNA SEQ adv cgrp/IWAT"
sampleFiles <- grep("COUNT FILES",list.files(directory),value=TRUE)
condition <- c('237 COUNT FILES','264 COUNT FILES','267 COUNT FILES','265 COUNT FILES')
sampleTable <- data.frame(sampleName = sampleFiles,
                      fileName = sampleFiles,
                      condition = condition)
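
The "0, 4" in that message says that sampleFiles is empty while condition has four entries, so the row count inside each file is not the issue. Two possible causes, both guesses from the script as posted: the path may not resolve (it starts with "C/" rather than a drive such as "C:/"), or "COUNT FILES" may not appear in the file names themselves. A small check along these lines can narrow it down; check_sample_inputs is a hypothetical helper, not part of DESeq2.

```r
# Hypothetical helper: report what grep() found before building the table.
check_sample_inputs <- function(directory, pattern, condition) {
  if (!dir.exists(directory)) {
    message("Directory not found: ", directory)
  }
  # list.files() returns character(0) for a missing directory.
  sampleFiles <- grep(pattern, list.files(directory), value = TRUE)
  message(length(sampleFiles), " file(s) matched, ",
          length(condition), " condition label(s) given")
  length(sampleFiles) == length(condition)
}

condition <- c("237 COUNT FILES", "264 COUNT FILES",
               "267 COUNT FILES", "265 COUNT FILES")
ok <- check_sample_inputs("C/RNA SEQ adv cgrp/IWAT", "COUNT FILES", condition)
```

data.frame() only succeeds once the helper returns TRUE, that is, once as many files are matched as there are condition labels.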


ddsHTSeq <- DESeq(ddsHTSeq)
estimating size factors
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) :
  The design matrix has the same number of samples and coefficients to fit,
  so estimation of dispersion is not possible. Treating samples as replicates was deprecated in v1.20 and no longer supported since v1.


When I am executing :

3.0 years ago
poojasethiya ▴ 100

You can use the following function to run DESeq2 on htseq-count output.

deseq_from_htseqcount.R

~ Pooja

3 months ago
Nai • 0

I ran the DESeqDataSetFromHTSeqCount command and then passed the resulting object to DESeq. I have 50 normal and 50 cancer samples, so equal numbers in each group. How do I find the genes that are differentially expressed between these two conditions?

condition <- c('C1','C2', ... and so on, 'N1','N2', ... and so on)
file_list <- list.files(path = directory, pattern ="*.bam.count")
sampleFiles <- c(file_list)

sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = condition)

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory2, design =~ condition)
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors

ddsHTSeq
class: DESeqDataSet
dim: 47051 100
metadata(1): version
assays(1): counts
rownames(47051): A1BG A1BG-AS1 ... ZZZ3 bA395L14.12
rowData names(0):
colnames(100): C1.bam.count C10.bam.count ... N8.bam.count N9.bam.count
colData names(1): condition

ddsHTSeq <- DESeq(ddsHTSeq)
estimating size factors
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) :
  The design matrix has the same number of samples and coefficients to fit,
  so estimation of dispersion is not possible. Treating samples as replicates was deprecated in v1.20 and no longer supported since v1.

When I specify design =~ 1 instead:

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory2, design =~ 1)
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors

ddsHTSeq
class: DESeqDataSet
dim: 47051 100
metadata(1): version
assays(1): counts
rownames(47051): A1BG A1BG-AS1 ... ZZZ3 bA395L14.12
rowData names(0):
colnames(100): C1.bam.count C10.bam.count ... N8.bam.count N9.bam.count
colData names(1): condition

ddsHTSeq <- DESeq(ddsHTSeq)
estimating size factors
estimating dispersions
...

then DESeq runs to completion and gives a result.

Please advise me on how to compare the two sample groups from the same organism. I would be very grateful.
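
The error in the post is what one would expect when every sample carries its own condition label ('C1', 'C2', ..., 'N1', ...): the design then has as many coefficients as samples, so there are no replicates left to estimate dispersion from. With design =~ 1 the run completes, but the model contains no condition term to test. A minimal sketch of a two-level condition follows; the six file names are hypothetical stand-ins for the 100 files, and the group labels "cancer" and "normal" are assumptions based on the C/N prefixes. The DESeq2 calls are left as comments because they need the real count files.

```r
# Hypothetical subset of file names following the C*/N* pattern in the post.
sampleFiles <- c("C1.bam.count", "C2.bam.count", "C3.bam.count",
                 "N1.bam.count", "N2.bam.count", "N3.bam.count")

# One label per group, not per sample; "normal" is the reference level.
group <- ifelse(startsWith(sampleFiles, "C"), "cancer", "normal")
condition <- factor(group, levels = c("normal", "cancer"))

sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = condition)

# The design now has 2 coefficients (intercept + cancer vs normal),
# far fewer than the number of samples.
n_coef <- ncol(model.matrix(~ condition, data = sampleTable))
print(n_coef)

# With the real files in place, the analysis would continue as:
# library("DESeq2")
# dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
#                                   directory = directory2,
#                                   design = ~ condition)
# dds <- DESeq(dds)
# res <- results(dds, contrast = c("condition", "cancer", "normal"))
```

Setting the factor levels explicitly makes the direction of the reported fold change (cancer relative to normal) unambiguous.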


I strongly recommend that you stop what you are doing and find a tutorial and go through that tutorial data step by step. You will probably figure out all your questions by doing that.


I am new to this and could not get a clear picture from the tutorial, so I posted here. If you can tell me where I am making a mistake or what I am missing, I would be very grateful. As far as I understand, I am doing something wrong with condition.


If you can't go through a tutorial on your own, you need to find someone in person to guide you through it. A Q&A board like this isn't meant to spoon-feed you all the basics you need to understand in order to troubleshoot your problems.
