I'm trying the Tuxedo protocol as well as an analysis using DESeq2, because I wanted to see whether the two give different sets of DE genes. DESeq2 takes a count matrix as input, and I generated counts using both htseq-count and featureCounts. Now I want to know how to transform those counts to make them usable for a DESeq2 analysis.
I have the count tables from htseq-count and featureCounts; what is the difference between the two outputs? For htseq-count I see a long list of Ensembl IDs with their counts, whereas the featureCounts output contains a lot more information.
I have 4 samples, Control vs. Test with two replicates of each, so altogether I have 4 count tables.
So how can I use those counts in DESeq2?
Any help and suggestions would be highly appreciated.
With featureCounts you get a matrix directly (cutting those five columns is not bad). Useful if you have a lot of samples.
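As a rough sketch of that column trimming in the terminal: featureCounts writes a leading comment line, then a header whose first six columns are Geneid, Chr, Start, End, Strand and Length, followed by one count column per BAM file. The file names and toy values below are made up for illustration; only the grep and cut step is the point.

```shell
# Toy stand-in for a multi-sample featureCounts table (values are made up).
{
  printf '# Program:featureCounts; Command:(omitted)\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tcontrol_1.bam\tcontrol_2.bam\ttest_1.bam\ttest_2.bam\n'
  printf 'ENSG01\t1\t100\t900\t+\t801\t10\t12\t30\t28\n'
  printf 'ENSG02\t2\t50\t450\t-\t401\t0\t1\t2\t0\n'
} > fc_counts.txt

# Drop the comment line, then keep the gene ID (column 1) and the
# per-sample counts (column 7 onward), skipping the five annotation columns.
grep -v '^#' fc_counts.txt | cut -f1,7- > fc_matrix.tsv

cat fc_matrix.tsv
```

The trimmed table can then be read into R with the gene IDs as row names before building the DESeq2 data set.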
Okay, so if I remove the first 5 columns then I can use that as input to DESeq2; I will do that. But I have 4 featureCounts data sets, so can I just remove the first 5 columns and simply join them?
I would like to do the same with the htseq-count data, which gives just the Ensembl ID and the count value, so how can I put all my samples in a single count matrix?
I would use the cut and paste commands in the terminal. In my case, I read the 16 count files in the counts directory (I have 16 conditions), retrieve the useful columns and save the final matrix, which I can then load into R for DESeq. Note that paste processes the files in the order they are given, which is alphanumerical when they are listed with a shell wildcard and may not be the desired order. Reordering the columns in R afterward might be needed.
Thanks for the fast response, but why 16? Shouldn't it be 4 altogether? I have 2 count tables for wild type and 2 for my test, so that makes 4! Or did I do something wrong?
I just pasted code I used in a project with 16 samples. You can adapt it for your 4 samples.
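A minimal sketch of the cut-and-paste approach for four samples, assuming each htseq-count file has two tab-separated columns (gene ID, count) and that all files list the genes in the same order, which should hold when they were counted against the same annotation. The file names, the counts directory and the toy values are hypothetical. Recent htseq-count versions also append summary lines starting with "__" (for example __no_feature), which are filtered out here.

```shell
# Toy stand-ins for real htseq-count output; replace with your own files.
mkdir -p counts
printf 'ENSG01\t10\nENSG02\t0\nENSG03\t7\n__no_feature\t100\n' > counts/control_1.txt
printf 'ENSG01\t12\nENSG02\t1\nENSG03\t5\n__no_feature\t90\n'  > counts/control_2.txt
printf 'ENSG01\t30\nENSG02\t2\nENSG03\t9\n__no_feature\t80\n'  > counts/test_1.txt
printf 'ENSG01\t28\nENSG02\t0\nENSG03\t8\n__no_feature\t70\n'  > counts/test_2.txt

# Write the header by hand, in the same order as the files are pasted below.
printf 'gene_id\tcontrol_1\tcontrol_2\ttest_1\ttest_2\n' > count_matrix.tsv

# paste puts the four files side by side (8 columns: ID, count, ID, count, ...).
# cut keeps the first gene ID column and the four count columns.
# grep drops the htseq-count summary lines that start with "__".
paste counts/control_1.txt counts/control_2.txt counts/test_1.txt counts/test_2.txt \
  | cut -f1,2,4,6,8 \
  | grep -v '^__' >> count_matrix.tsv

cat count_matrix.tsv
```

Listing the files explicitly, rather than with a wildcard, fixes the column order so nothing needs reordering in R afterward.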
Since one is in R for DESeq2, would it not be better to do it there (cutting the columns out)?
There are many different ways to do such basic data manipulation. I like to do it in the terminal when I can because it can be faster, requires less memory and can be syntactically shorter. An alternative using only R would require reading all the files into memory, merging them, then cutting the unwanted columns and clearing the original tables from memory. For me that is less efficient (especially with 16 samples), but it also works.
featureCounts accepts multiple files at the same time, producing a single file with the count matrix.
"Since one is in R for DESeq2 would it not be better to do it there (cutting the columns out)." Please do explain why, because I haven't used DESeq2 before.
You can provide multiple BAM files to featureCounts to get the full matrix (no need to run it on each file independently). I was saying that the annotation columns can be removed after bringing the matrix into R.
I have output from featureCounts as well, but unfortunately I didn't use multiple BAM files; I ran the samples individually.
featureCounts is fast enough that you can re-run it easily with all files. Provide the file names in the order you want the columns to be in so there is less stuff you need to fiddle with afterwards.
I will give it a try.
That can be good advice, except if your BAM files contain paired-end reads and are not name-sorted. featureCounts can sort the BAM files automatically, but that can be very time-consuming.
Yes, I have paired-end BAM files, which I obtained after running TopHat.
Yeah, it has options for multiple files; I'm doing it again.
But how can multiple BAM files give a single count table, since I have to define the output directory for each sample?
Counting happens on the finished, aligned BAM files, so you can simply provide relative paths to them.
featureCounts Sampl1/accepted_hits.bam Sampl2/accepted_hits.bam Sampl3/accepted_hits.bam etc
(not a real command).
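To make the placeholder above a bit more concrete, here is a hedged dry-run sketch that only assembles and prints a featureCounts call. The -a (annotation) and -o (output file) options are standard featureCounts options and -p marks the input as paired-end; annotation.gtf, all_counts.txt and the fourth sample directory are assumed names, not something from this thread. Newer Subread releases may also need --countReadPairs to count fragments rather than reads, so check the help text of your installed version.

```shell
# Sample directories, in the column order wanted in the final matrix.
# Sampl4 is assumed; adjust to your own directory names.
samples="Sampl1 Sampl2 Sampl3 Sampl4"

# Build the list of BAM paths (TopHat writes accepted_hits.bam per sample).
bams=""
for s in $samples; do
  bams="$bams $s/accepted_hits.bam"
done

# annotation.gtf and all_counts.txt are placeholder names.
# -p tells featureCounts the reads are paired-end.
cmd="featureCounts -p -a annotation.gtf -o all_counts.txt$bams"

# Dry run: print the command instead of executing it.
echo "$cmd"
```

A single output file is written no matter how many BAM files are listed, with one count column per BAM in the order given.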