Question: How to use htseq-count with several samples?
0
4 months ago by
scheme419340
scheme419340 wrote:

Does anyone know how to use htseq-count with several samples?

We can use htseq-count like this: htseq-count sample1.sam reference.gtf > result.count.txt

The command above gives us the count data for sample1. But we usually have several samples, so we have to run htseq-count on each sample's SAM file. Do most people combine the results into one matrix after running htseq-count on each sample? Or can we build an expression matrix from several samples at the same time?

Also, I think samples differ in things like total expression level or number of reads. How many people do any normalization or correction between samples?

Thank you.

rna-seq next-gen gene • 276 views
ADD COMMENTlink modified 4 months ago by WouterDeCoster42k • written 4 months ago by scheme419340
2

You have to run it separately for each sample. Once you have the counts, you can use R to build a single matrix, starting as follows:

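library(parallel)  # provides mclapply

# pattern is a regular expression, not a shell glob
files <- dir(pattern="\\.counts$", full.names=TRUE)
res <- mclapply(files, function(fil){
                      read.delim(fil, header=FALSE, stringsAsFactors=FALSE)
                   }, mc.cores=16)

# name each list element after its sample (file name without the suffix)
names(res) <- gsub("\\.counts$", "" , basename(files))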

#Then we extract the additional info that HTSeq writes at the end of every file detailing 
addInfo <- c("__no_feature","__ambiguous",
             "__too_low_aQual","__not_aligned",
             "__alignment_not_unique")
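
For comparison, the merge itself can also be sketched in plain shell with awk. This is only a sketch under assumptions: each input is the two-column, tab-separated output of htseq-count, the files end in .counts, and a gene missing from one file is filled with 0. merge_counts is a hypothetical helper name, not part of HTSeq.

```shell
# Merge per-sample htseq-count outputs into one tab-separated matrix.
# usage: merge_counts sample1.counts sample2.counts ... > matrix.tsv
merge_counts() {
    awk -F '\t' -v OFS='\t' '
        FNR == 1 {                       # first line of each input file
            nfiles++
            name = FILENAME
            sub(/.*\//, "", name)        # strip the directory part
            sub(/\.counts$/, "", name)   # strip the assumed suffix
            header = header OFS name
        }
        /^__/ { next }                   # skip the HTSeq summary counters
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
            count[$1, nfiles] = $2
        }
        END {
            print "gene" header
            for (i = 1; i <= n; i++) {
                line = order[i]
                for (j = 1; j <= nfiles; j++) {
                    key = order[i] SUBSEP j
                    # assumption: a gene absent from a file counts as 0
                    line = line OFS ((key in count) ? count[key] : 0)
                }
                print line
            }
        }
    ' "$@"
}
```

The rows keep the order in which genes first appear, and the lines starting with "__" (the counters listed above) are left out of the matrix.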

Hope this helps!

ADD REPLYlink written 4 months ago by Lila M 800

Sorry, the last sentence of my question is wrong.

This is the correct version:

How do people usually do normalization or correction between samples?

ADD REPLYlink written 4 months ago by scheme419340

You can edit your post and correct that sentence.

ADD REPLYlink written 4 months ago by WouterDeCoster42k
5
4 months ago by
Belgium
WouterDeCoster42k wrote:

If you use htseq-count, you run it separately for each sample. featureCounts would probably be a better tool for this.
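
Running htseq-count once per sample is easy to script. The following is a minimal sketch, assuming all SAM files sit in one directory and were aligned against the same annotation. The function name count_all_samples, the HTSEQ_COUNT override variable and the .counts output suffix are made up for illustration; only the htseq-count call itself comes from the question.

```shell
# Run htseq-count once for every *.sam file in a directory.
# usage: count_all_samples <sam_dir> <annotation.gtf> <out_dir>
# The body runs in a subshell, so its variables do not leak to the caller.
count_all_samples() (
    sam_dir=$1
    gtf=$2
    out_dir=$3
    # hypothetical override hook; defaults to htseq-count found on PATH
    htseq=${HTSEQ_COUNT:-htseq-count}

    mkdir -p "$out_dir" || exit 1
    for sam in "$sam_dir"/*.sam; do
        [ -e "$sam" ] || continue        # the glob matched nothing
        sample=$(basename "$sam" .sam)
        # same invocation as in the question: <sam> <gtf> > <counts file>
        "$htseq" "$sam" "$gtf" > "$out_dir/$sample.counts" || exit 1
    done
)

# Example call (paths are placeholders):
# count_all_samples ./sam reference.gtf ./counts
```

Each sample ends up with its own counts file named after the SAM file, which keeps the later merge or import step simple.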

If you use htseq-count, you can import its output directly into DESeq2 (you did not tell us what your goal is, but I'll assume differential expression analysis). See here in the documentation.

Also, I think samples differ in things like total expression level or number of reads. How many people do any normalization or correction between samples?

Again, it would help to know what you want to achieve, but I would say everyone should normalize. If you go on to use DESeq2, though, you don't have to worry about it, as DESeq2 will take care of normalizing your samples.

ADD COMMENTlink written 4 months ago by WouterDeCoster42k