DESeq2 input for DE analysis
7.2 years ago
1769mkc ★ 1.2k

I'm trying the Tuxedo protocol as well as an analysis with DESeq2, because I want to see whether the resulting sets of DE genes differ. DESeq2 takes a count matrix as input, and I generated counts using both HTSeq and featureCounts. Now I want to know how to transform those counts so that they are usable for a DESeq2 analysis.

I have looked at the count tables from HTSeq and featureCounts. What is the difference between the two outputs? For HTSeq I see a long list of Ensembl IDs with counts, whereas the featureCounts output contains a lot more information.

I have 4 samples, Control vs Test with two replicates of each, so altogether I have 4 count tables.

So how can I use those counts in DESeq2?

Any help and suggestions would be highly appreciated.

R RNA-Seq • 5.2k views
7.2 years ago

The output of both HTSeq and featureCounts can be fed almost directly into DESeq2.

  • for featureCounts:

The output looks like:

Geneid  Chr Start   End Strand  Length  cond1   cond2
SPBC460.05  I   16470   18062   +   1593    1   12  24
SPBC460.02c II;II   8856;9651   9365;9803   -;- 663 329
SPAC212.11  I   1   5662        -   5662    0   0   0

First you need to remove the five annotation columns (Chr, Start, End, Strand, Length) and use the gene IDs as row names:

    cond1   cond2
SPBC460.05  12  24
SPBC460.02c 663 329
SPAC212.11  0   0

Then just use the DESeqDataSetFromMatrix function of DESeq2 to convert it.

  • for HTSeq counts: see the comments below
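
For the featureCounts case, the column trimming described above can be sketched in the shell. This is only an illustration: it assumes the usual featureCounts layout (one leading `#` comment line, then a header, with Chr, Start, End, Strand and Length in columns 2 to 6), and the file names and toy values are made up.

```shell
#!/bin/sh
set -eu

# Toy featureCounts-style table (hypothetical values, tab-separated).
workdir=$(mktemp -d)
{
    printf '# Program:featureCounts (comment line)\n'
    printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tcond1\tcond2\n'
    printf 'geneA\tI\t100\t600\t+\t501\t12\t24\n'
    printf 'geneB\tII\t50\t250\t-\t201\t0\t3\n'
} > "$workdir/fc_output.txt"

# Drop the comment line, then keep Geneid (column 1) and the
# count columns (column 7 onward).
grep -v '^#' "$workdir/fc_output.txt" | cut -f 1,7- > "$workdir/count_matrix.tsv"

cat "$workdir/count_matrix.tsv"
```

The resulting file can then be read into R with the gene IDs as row names (for example with read.delim and row.names = 1) before calling DESeqDataSetFromMatrix.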

With featureCounts you get a matrix directly (cutting those five columns is not a big deal), which is useful if you have a lot of samples.


Okay, so if I remove those 5 columns I can use the table as input to DESeq2; I will do that. But I have 4 featureCounts data sets, so can I just remove the 5 columns from each and simply join them?

I would like to do the same with the HTSeq count data, which gives just the Ensembl ID and the count value. How can I put all my samples into a single count matrix?


I would use cut and paste in the terminal. In this case, I read the 16 count files in the counts directory (I have 16 conditions), keep the useful columns, and save the final matrix, which I can then load into R for DESeq2.

paste counts/* | cut -f 1,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32 > merged_counts.mat

Note that paste processes the files in alphanumerical order, which may not be the desired order, so reordering the columns in R afterwards might be needed. The merged file is written outside counts/ so that the counts/* glob cannot pick it up as an input on a later run.
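
The field list does not have to be typed out by hand. The sketch below builds it from the number of files, assuming each file is a two-column HTSeq-style table (ID, count). The file names and values are invented, and the filter on lines starting with `__` assumes the summary rows that recent HTSeq versions append (such as `__no_feature`).

```shell
#!/bin/sh
set -eu

# Toy two-column count files (hypothetical names and values).
workdir=$(mktemp -d)
mkdir "$workdir/counts"
printf 'geneA\t5\ngeneB\t0\n__no_feature\t100\n'  > "$workdir/counts/ctrl_1.txt"
printf 'geneA\t7\ngeneB\t1\n__no_feature\t90\n'   > "$workdir/counts/ctrl_2.txt"
printf 'geneA\t20\ngeneB\t2\n__no_feature\t80\n'  > "$workdir/counts/test_1.txt"
printf 'geneA\t18\ngeneB\t4\n__no_feature\t70\n'  > "$workdir/counts/test_2.txt"

# The glob expands in alphanumerical order; that is the column order.
set -- "$workdir"/counts/*.txt
n=$#

# After paste, columns are id,count,id,count,...
# Keep column 1 (gene ID) and every even column: 1,2,4,...,2n
fields=1
i=1
while [ "$i" -le "$n" ]; do
    fields="$fields,$((2 * i))"
    i=$((i + 1))
done

# Merge, keep the wanted columns, and drop the HTSeq summary rows.
paste "$@" | cut -f "$fields" | grep -v '^__' > "$workdir/merged_counts.mat"

cat "$workdir/merged_counts.mat"
```

The same script works unchanged for 4 or 16 files, since the field list follows the file count.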


Thanks for the fast response, but why 16? Shouldn't it be 4 altogether? I have count tables for 2 wild-type and 2 test samples, so that makes 4! Or did I do something wrong?


I just pasted the code I used in a project with 16 samples. You can adapt it for 4 conditions:

paste cond1 cond2 cond3 cond4 | cut -f 1,2,4,6,8 > merged.txt
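
Because paste joins files line by line, this only gives a valid matrix if every file lists the same IDs in the same order. A possible sanity check, plus a header row so the columns stay identifiable in R, might look like the sketch below. The file contents are invented, and `gene_id` is just a placeholder header name.

```shell
#!/bin/sh
set -eu

# Toy count files named as in the command above (hypothetical values).
workdir=$(mktemp -d)
printf 'geneA\t5\ngeneB\t0\n'  > "$workdir/cond1"
printf 'geneA\t7\ngeneB\t1\n'  > "$workdir/cond2"
printf 'geneA\t20\ngeneB\t2\n' > "$workdir/cond3"
printf 'geneA\t18\ngeneB\t4\n' > "$workdir/cond4"

# Listing the files explicitly fixes the column order.
set -- "$workdir/cond1" "$workdir/cond2" "$workdir/cond3" "$workdir/cond4"

# Stop if any file's ID column differs from the first file's.
ref_ids=$(cut -f 1 "$1")
for f in "$@"; do
    if [ "$(cut -f 1 "$f")" != "$ref_ids" ]; then
        echo "gene IDs in $f differ from $1" >&2
        exit 1
    fi
done

# Header row built from the file names, followed by the merged counts.
{
    printf 'gene_id'
    for f in "$@"; do printf '\t%s' "$(basename "$f")"; done
    printf '\n'
    paste "$@" | cut -f 1,2,4,6,8
} > "$workdir/merged.txt"

cat "$workdir/merged.txt"
```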

Since one is in R for DESeq2 anyway, would it not be better to do it there (cutting the columns out)?


There are many different ways to do such basic data manipulation. I like to do it in the terminal when I can, because it can be faster, requires less memory, and can be syntactically shorter. An alternative using only R would require reading all the files into memory, merging them, then cutting the unwanted columns and clearing the original tables from memory. For me that is less efficient (especially with 16 samples), but it also works.


featureCounts accepts multiple files at the same time, producing one single file with the count matrix.


"Since one is in R for DESeq2 would it not be better to do it there (cutting the columns out)." please do explain why because I haven't used "DESeq2" before .


You can provide multiple BAM files to featureCounts to get the full matrix (no need to run it on each file independently). I was saying that the annotation columns can be removed after bringing the matrix into R.


I have output from featureCounts as well, but unfortunately I didn't use multiple BAM files; I ran the samples individually.


featureCounts is fast enough that you can re-run it easily with all files. Provide the file names in the order you want the columns to be in so there is less stuff you need to fiddle with afterwards.


I will give it a try.


That can be good advice, except if your BAM file contains paired-end reads and is not name-sorted. featureCounts can sort the BAM files automatically, but that can be very time-consuming.


Yes, I have paired-end BAM files, which I obtained after running TopHat.


Yeah, it has options for multiple files; I'm running it again.


But how can multiple BAM files give a single count table, since I have to define the output directory for each sample?


Counting happens on the finished, aligned BAM files, so you can simply provide relative paths to them: featureCounts Sampl1/accepted_hits.bam Sampl2/accepted_hits.bam Sampl3/accepted_hits.bam etc. (not a real command).
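
A slightly fuller version of that call might look like the dry run below. The annotation file name, the output file name and the fourth sample directory are assumptions for illustration. `-a` (annotation) and `-o` (output file) are standard featureCounts options. For paired-end data you would typically add `-p`, and recent Subread releases may also need `--countReadPairs`, so check the help text of your installed version.

```shell
#!/bin/sh
set -eu

# Hypothetical paths: a GTF annotation and one TopHat output directory per sample.
annotation=genes.gtf
output=all_samples_counts.txt

# The order of the BAM files determines the order of the count columns.
set -- Sampl1/accepted_hits.bam Sampl2/accepted_hits.bam \
       Sampl3/accepted_hits.bam Sampl4/accepted_hits.bam

# Dry run: print the command instead of executing it.
fc_command="featureCounts -a $annotation -o $output $*"
echo "$fc_command"

# To actually run it (requires Subread to be installed):
# featureCounts -a "$annotation" -o "$output" "$@"
```

All samples end up in the single file given to `-o`, so no per-sample output directory is needed.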

7.2 years ago

You can also run featureCounts directly from R (see the Rsubread package), which gives you an R object that you can use directly as input to DESeq2 and similar tools. This avoids fiddling with input files.
