Question: How to use htseq-count with several samples?
10 days ago, scheme4193 wrote:

Does anyone know how to use htseq-count with several samples?

We can use htseq-count like this: htseq-count sample1.sam reference.gtf > result.count.txt

The command above gives the count data for sample1. But we usually have several samples, so we have to run htseq-count on each sample's SAM file. Do most people combine the results into one matrix after running htseq-count on each sample? Or can we build an expression matrix for several samples at the same time?

Also, I think there are differences between samples, such as total expression level or number of reads. How many people do any normalization or correction between samples?

Thank you.

Tags: rna-seq, next-gen, gene

You have to run it separately for each sample. Once you have the counts, you can use R to read them all in as a first step towards a single matrix, for example:

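library(parallel)  # provides mclapply

# List the per-sample count files; 'pattern' is a regular expression, not a shell glob
files <- dir(pattern="\\.counts$", full.names=TRUE)

# Read every file into a two-column data frame (feature ID, count)
res <- mclapply(files, function(fil){
                      read.delim(fil, header=FALSE, stringsAsFactors=FALSE)
                   }, mc.cores=16)

# Name each list element after its sample (file name without the .counts suffix)
names(res) <- gsub("\\.counts$", "", basename(files))

# Then we extract the additional info that HTSeq writes at the end of every file
addInfo <- c("__no_feature","__ambiguous",
             "__too_low_aQual","__not_aligned",
             "__alignment_not_unique")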

Hope this helps!
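
The same merge can also be sketched on the command line with awk. This is only an illustration under stated assumptions: the function name `merge_counts` is made up, every file is assumed to be named <sample>.counts with two tab-separated columns, and all files are assumed to list the same features. Rows whose ID starts with "__" (the summary rows listed above) are dropped.

```shell
#!/usr/bin/env bash
# Sketch: merge per-sample htseq-count outputs into one tab-separated matrix.
# Assumptions (hypothetical): files are named <sample>.counts, contain two
# tab-separated columns (feature ID, count), and list the same features.
merge_counts() {
    awk -F'\t' '
        FNR == 1 {                       # first line of each input file
            n = split(FILENAME, parts, "/")
            name = parts[n]
            sub(/\.counts$/, "", name)   # sample name = file name without suffix
            samples[++ns] = name
        }
        $1 ~ /^__/ { next }              # skip summary rows such as __no_feature
        {
            # remember features in the order they are first seen
            if (!($1 in seen)) { seen[$1] = 1; genes[++ng] = $1 }
            count[$1, name] = $2
        }
        END {
            printf "gene"
            for (s = 1; s <= ns; s++) printf "\t%s", samples[s]
            printf "\n"
            for (g = 1; g <= ng; g++) {
                printf "%s", genes[g]
                for (s = 1; s <= ns; s++) printf "\t%s", count[genes[g], samples[s]]
                printf "\n"
            }
        }
    ' "$@"
}

# Example call (file names are placeholders):
# merge_counts counts/*.counts > count_matrix.tsv
```

The output has one row per feature and one column per sample. A feature missing from a file would leave an empty cell, so it is worth checking that all files were produced with the same annotation.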

written 9 days ago by Lila M

Sorry, the last sentence of my question is wrong.

This is the correct version:

How do most people do normalization or correction between samples?

written 10 days ago by scheme4193

You can edit your post and correct that sentence.

written 10 days ago by WouterDeCoster
10 days ago, WouterDeCoster (Belgium) wrote:

If you use htseq-count, you run it separately for each sample. featureCounts is probably a better tool for this, since it can count several alignment files in a single run.
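
Running htseq-count once per sample is easy to script. A minimal sketch, assuming the alignments are SAM files in one directory and htseq-count is on the PATH; the function name `count_all_samples`, the directory layout, and the .counts suffix are illustrative choices, not something htseq-count requires:

```shell
#!/usr/bin/env bash
# Sketch: run htseq-count once per SAM file in a directory.
# Assumptions (hypothetical): inputs are <samdir>/<sample>.sam and outputs
# are written to <outdir>/<sample>.counts.
count_all_samples() {
    samdir="$1"
    gtf="$2"
    outdir="$3"
    mkdir -p "$outdir"
    for sam in "$samdir"/*.sam; do
        [ -e "$sam" ] || continue            # skip if the glob matched nothing
        sample=$(basename "$sam" .sam)       # sample name = file name without .sam
        # Same call as for a single sample, with per-sample file names
        htseq-count "$sam" "$gtf" > "$outdir/$sample.counts"
    done
}

# Example call (paths are placeholders):
# count_all_samples ./sam reference.gtf ./counts
```

Each sample ends up in its own count file, which can then be merged into a matrix or imported into a downstream tool.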

If you use htseq-count, you can import its output directly into DESeq2 with the DESeqDataSetFromHTSeqCount function (you did not tell us what your goal is, but I'll assume differential expression analysis). See the DESeq2 documentation.

Also, I think there are differences between samples, such as total expression level or number of reads. How many people do any normalization or correction between samples?

Again, we would need to know what you want to achieve, but I would say everyone should normalize. If you go on to use DESeq2, you don't have to worry about it: DESeq2 normalizes the samples itself, so give it the raw counts.
