How to input data for DESeq2 from individual HTSeq count files?
sudu87, 3.8 years ago

I am comparing the gene expression of 2 bacteria under 1 condition. I now have count tables for 3 technical replicates of each bacterium:

Bacteria1_1.count
Bacteria1_2.count
Bacteria1_3.count


...and the same for the other bacterium.

These files look like this:

gene1 10000
gene2 500
gene3 0
gene4 5000


I want to use DESeq2 for differential gene expression analysis, but I cannot figure out how to call DESeqDataSetFromHTSeqCount() correctly with this type of data.

Is there an intermediate step I need to add?
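To see what the import step involves, here is a minimal base-R sketch that reads several htseq-count files (two whitespace-separated columns: gene ID, count) into one gene × sample matrix. The function name `read_htseq_matrix` is hypothetical; it assumes every file lists the same genes in the same order, as happens when all samples are counted against the same annotation, and it drops the summary rows htseq-count appends, whose names start with `__` (e.g. `__no_feature`).

```r
# Hypothetical helper: combine per-sample htseq-count files into one matrix.
read_htseq_matrix <- function(paths, sample_names = basename(paths)) {
  tables <- lapply(paths, function(p) {
    # Two whitespace-separated columns: gene ID and count
    read.table(p, header = FALSE, col.names = c("gene", "count"),
               stringsAsFactors = FALSE)
  })
  genes <- tables[[1]]$gene
  for (tbl in tables) {
    # Every file must describe the same genes in the same order
    if (!identical(tbl$gene, genes)) stop("gene IDs differ between files")
  }
  # One column per sample, one row per gene
  counts <- do.call(cbind, lapply(tables, function(tbl) tbl$count))
  dimnames(counts) <- list(genes, sample_names)
  # Drop htseq-count summary rows such as __no_feature and __ambiguous
  counts[!startsWith(genes, "__"), , drop = FALSE]
}
```

The resulting matrix could then go to `DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ condition)`, where `coldata` (a hypothetical name) has one row per column of `counts`, in the same order. `DESeqDataSetFromHTSeqCount()` performs a comparable import for you directly from the files.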

Tags: RNA-Seq, HTSeq, DESeq, DESeq2
ZZzzzzhong, 3.8 years ago
directory <- "/path/to/your/files/"


directory is where your htseq-count output files are located.

sampleFiles <- grep("Bacteria",list.files(directory),value=TRUE)


sampleFiles holds the names of your htseq-count output files (here, every file in directory whose name contains "Bacteria").

condition <- c('Bacteria1','Bacteria1','Bacteria1','Bacteria2','Bacteria2','Bacteria2')


One entry per sample, giving its sample type, in the same order as sampleFiles.

# One row per sample: name, file name (relative to directory), group label
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName   = sampleFiles,
                          condition  = factor(condition))
library("DESeq2")
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory   = directory,
                                       design      = ~ condition)
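Once `ddsHTSeq` is built, the analysis itself is `DESeq()` followed by `results()`. A minimal sketch, assuming the `condition` levels `Bacteria1` and `Bacteria2` from the code above; `run_de` is a hypothetical wrapper, not part of DESeq2.

```r
library("DESeq2")

# Hypothetical wrapper: fit the model and return results for one comparison,
# sorted by adjusted p-value. `numerator` and `denominator` must be levels of
# the `condition` column in colData(dds).
run_de <- function(dds, numerator, denominator) {
  dds <- DESeq(dds)  # size factors, dispersion estimates, Wald tests
  res <- results(dds, contrast = c("condition", numerator, denominator))
  res[order(res$padj), ]  # genes with NA padj are placed last
}
```

For example, `res <- run_de(ddsHTSeq, "Bacteria2", "Bacteria1")` would report log2 fold changes of Bacteria2 relative to Bacteria1, and `summary(res)` gives a quick overview of up- and down-regulated genes.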


Thank you so much for this.

Sorry for these basic questions, but I have one more issue with:

sampleFiles <- grep("Bacteria",list.files(directory),value=TRUE)


My .count files are named after the two bacteria, "cowan" and "isolate". I tried to grep both names at once, but it doesn't work. How can I solve this?

Thanks a ton,

Sudip


List the file names explicitly, just like the condition variable:

sampleFiles <- c('cowan1','cowan2','cowan3','isolate1','isolate2','isolate3')


Remember that sampleFiles must correspond to condition: one entry per sample, in the same order.
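Alternatively, a single regular expression with alternation (`cowan|isolate`) matches both names at once, and deriving the condition from each file name keeps the two vectors aligned automatically. A base-R sketch, assuming hypothetical file names of the form `cowan_1.count` / `isolate_1.count` and strain names without regex special characters; `select_samples` is a hypothetical helper.

```r
# Hypothetical helper: keep the .count files whose names start with one of the
# strain names, and label each file with the strain it belongs to.
select_samples <- function(files, strains) {
  # e.g. "^(cowan|isolate).*\\.count$"
  pattern <- paste0("^(", paste(strains, collapse = "|"), ").*\\.count$")
  sampleFiles <- sort(grep(pattern, files, value = TRUE))
  # Replace each matching name by its captured strain name
  condition <- factor(sub(pattern, "\\1", sampleFiles), levels = strains)
  data.frame(sampleName = sampleFiles, fileName = sampleFiles,
             condition = condition, stringsAsFactors = FALSE)
}
```

In practice this would be called as `sampleTable <- select_samples(list.files(directory), c("cowan", "isolate"))` and the result passed as `sampleTable` to `DESeqDataSetFromHTSeqCount()`. The first strain in `strains` becomes the reference level.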


Hi ZZzzzzhong, I am trying to follow the method you suggested, but I get the error "Error in data.frame(sampleName = sampleFiles, fileName = sampleFiles, : arguments imply differing number of rows: 0, 4".

I have checked the number of rows in all the individual files, and they are the same.

Here is my script:

directory <- "C/RNA SEQ adv cgrp/IWAT"
sampleFiles <- grep("COUNT FILES", list.files(directory), value = TRUE)
condition <- c('237 COUNT FILES', '264 COUNT FILES', '267 COUNT FILES', '265 COUNT FILES')
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = condition)
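The message `differing number of rows: 0, 4` means `sampleFiles` is empty (0 entries) while `condition` has 4: the `grep()` pattern is matched against the file names inside `directory`, and none of them contained "COUNT FILES". Note also that `"C/RNA SEQ adv cgrp/IWAT"` is a relative path. A base-R sketch of a check that makes such problems explicit before the table is built; `check_sample_files` is a hypothetical helper.

```r
# Hypothetical helper: list the count files to use and stop with a clear
# message if none match or if their number differs from length(condition).
check_sample_files <- function(directory, pattern, condition) {
  if (!dir.exists(directory)) stop("directory not found: ", directory)
  files <- grep(pattern, list.files(directory), value = TRUE)
  if (length(files) == 0)
    stop("no file names in '", directory, "' match '", pattern, "'")
  if (length(files) != length(condition))
    stop(length(files), " files matched but condition has ",
         length(condition), " entries")
  files
}
```

Running `list.files(directory)` on its own first shows exactly which names the pattern has to match.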


ddsHTSeq <- DESeq(ddsHTSeq)
estimating size factors
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) :
  The design matrix has the same number of samples and coefficients to fit,
  so estimation of dispersion is not possible. Treating samples as replicates was deprecated in v1.20 and no longer supported since v1.
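This error appears when every sample has its own level in `condition`, as with a vector like `c('237 COUNT FILES', '264 COUNT FILES', '267 COUNT FILES', '265 COUNT FILES')`: a `~ condition` design then has as many coefficients as samples, leaving no replicates from which to estimate dispersion. The labels should name the groups being compared, repeated once per sample in each group. A base-R sketch of a check to run on `condition` before calling `DESeq()`; `check_replicates` is a hypothetical helper.

```r
# Hypothetical helper: stop if each sample has its own condition level
# (no replicates at all) and warn about levels with only one sample.
check_replicates <- function(condition) {
  condition <- factor(condition)
  n_per_level <- table(condition)
  if (nlevels(condition) == length(condition))
    stop("every sample has its own condition level; there are no replicates")
  singletons <- names(n_per_level)[n_per_level < 2]
  if (length(singletons) > 0)
    warning("levels with a single sample: ", paste(singletons, collapse = ", "))
  invisible(n_per_level)
}
```

`table(condition)` alone is often enough to spot the problem: every count should be at least 2.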


When I am executing :

poojasethiya, 3.8 years ago

You can use the following function to run DESeq2 on htseq-count output:

deseq_from_htseqcount.R

~ Pooja

Nai, 13 months ago

I built the dataset with DESeqDataSetFromHTSeqCount() and then ran DESeq() on it. I have 50 normal and 50 cancer samples (the same number in each group). How do I find the genes that are differentially expressed between these two conditions?

condition <- c('C1', 'C2', ..... so on, 'N1', 'N2', and so on)
file_list <- list.files(path = directory, pattern = "*.bam.count")
sampleFiles <- c(file_list)

sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = condition)

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory2, design = ~ condition)
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors

ddsHTSeq
class: DESeqDataSet
dim: 47051 100
metadata(1): version
assays(1): counts
rownames(47051): A1BG A1BG-AS1 ... ZZZ3 bA395L14.12
rowData names(0):
colnames(100): C1.bam.count C10.bam.count ... N8.bam.count N9.bam.count
colData names(1): condition

ddsHTSeq <- DESeq(ddsHTSeq)

estimating size factors
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) :
  The design matrix has the same number of samples and coefficients to fit,
  so estimation of dispersion is not possible. Treating samples as replicates was deprecated in v1.20 and no longer supported since v1.

When I set design = ~ 1 instead:

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory2, design = ~ 1)
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors

ddsHTSeq
class: DESeqDataSet
dim: 47051 100
metadata(1): version
assays(1): counts
rownames(47051): A1BG A1BG-AS1 ... ZZZ3 bA395L14.12
rowData names(0):
colnames(100): C1.bam.count C10.bam.count ... N8.bam.count N9.bam.count
colData names(1): condition

ddsHTSeq <- DESeq(ddsHTSeq)

estimating size factors
estimating dispersions
....................

This time it runs to completion and gives a result.

Please guide me on how to compare the two groups of samples (cancer vs. normal) from the same organism. I would be very grateful.
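Both errors above come from the same source: giving each sample its own label ('C1', 'C2', ..., 'N1', ...) creates 100 condition levels for 100 samples. The condition should name the group instead, and `design = ~ 1` only runs because it fits an intercept alone, so it cannot compare the groups. A base-R sketch that derives a two-level factor from the file-name prefix, assuming (hypothetically) that names starting with "C" are cancer and "N" are normal; `condition_from_prefix` is a hypothetical helper. Because it works on the file names themselves, the labels follow the order returned by `list.files()` (C1, C10, ..., N9), not the typed order.

```r
# Hypothetical helper: assign each htseq-count file to a group from the first
# letter of its name (assumed: "C" = cancer, "N" = normal).
condition_from_prefix <- function(files) {
  prefix <- substr(basename(files), 1, 1)
  if (!all(prefix %in% c("C", "N"))) stop("unexpected file name prefix")
  # "normal" is listed first so it becomes the reference level
  factor(ifelse(prefix == "C", "cancer", "normal"),
         levels = c("normal", "cancer"))
}
```

Using `condition = condition_from_prefix(file_list)` in `sampleTable` (with `file_list <- list.files(path = directory, pattern = "\\.bam\\.count$")`, an anchored regular expression), keeping `design = ~ condition`, and calling `results(ddsHTSeq, contrast = c("condition", "cancer", "normal"))` after `DESeq()` would give log2 fold changes of cancer relative to normal.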


I strongly recommend that you stop what you are doing and find a tutorial and go through that tutorial data step by step. You will probably figure out all your questions by doing that.


I am new, and the tutorials are not clear to me, so I posted here. If you can tell me where I am making an error or what is missing, I would be very grateful. As far as I understand, I am doing something wrong in condition.


If you can't go through a tutorial on your own, you need to find someone in person to guide you through it. A Q&A board like this isn't meant to spoon-feed you all the basics you need to understand in order to troubleshoot your problems.