Question: DESeq2 input for DE analysis
krushnach80 wrote, 2.4 years ago:

I'm trying the Tuxedo protocol as well as an analysis with DESeq2, and I want to see whether the two give different sets of DE genes. DESeq2 takes a count matrix, which I generated with both HTSeq and featureCounts. Now I want to know how to transform those counts so that they are usable for a DESeq2 analysis.

I have looked at the count tables from HTSeq and featureCounts. What is the difference between the two outputs? For HTSeq I see a long list of Ensembl IDs with their counts, whereas the featureCounts output contains a lot more information.

I have 4 samples, Control vs Test with two replicates each, so altogether I have 4 count tables.

So how can I use those counts in DESeq2?

Any help or suggestions would be highly appreciated.

Tags: rna-seq, R • 1.8k views

Carlo Yague (Belgium) wrote, 2.4 years ago:

The output of both HTSeq and featureCounts can be fed into DESeq2 almost directly.

  • for featureCounts:

The output looks like:

Geneid  Chr Start   End Strand  Length  cond1   cond2
SPBC460.05  I   16470   18062   +   1593    1   12  24
SPBC460.02c II;II   8856;9651   9365;9803   -;- 663 329
SPAC212.11  I   1   5662        -   5662    0   0   0

First you need to remove the five annotation columns (Chr, Start, End, Strand, Length) and use the gene names as row names:

    cond1   cond2
SPBC460.05  12  24
SPBC460.02c 663 329
SPAC212.11  0   0

Then use the DESeqDataSetFromMatrix function of DESeq2 to convert this matrix into a DESeq2 dataset.

  • for HTSeq counts: see comments below
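As a rough sketch of the column-removal step in the terminal: featureCounts normally writes a comment line starting with # before the header, followed by Geneid, the five annotation columns and one count column per BAM file. The file name fc_demo.txt, the gene names and all numbers below are invented for illustration.

```shell
# Create a tiny featureCounts-style table (invented values, for illustration only).
demo_dir=$(mktemp -d)
{
  printf '# Program:featureCounts (demo comment line)\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tcond1\tcond2\n'
  printf 'geneA\tI\t100\t599\t+\t500\t12\t24\n'
  printf 'geneB\tII\t1\t300\t-\t300\t0\t7\n'
} > "$demo_dir/fc_demo.txt"

# Drop the leading comment line, then keep column 1 (Geneid) and columns 7 onward (counts).
grep -v '^#' "$demo_dir/fc_demo.txt" | cut -f 1,7- > "$demo_dir/counts_matrix.txt"

cat "$demo_dir/counts_matrix.txt"
```

The resulting tab-separated file has the gene IDs in its first column, so it can be read into R with that column as row names before calling DESeqDataSetFromMatrix.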

With featureCounts you get a matrix directly (cutting out those five columns is not hard), which is useful if you have a lot of samples.

written 2.4 years ago by genomax

OK, so if I remove the first 5 columns, I can use that as input to DESeq2. I will do that. But I have 4 featureCounts data sets, so can I just remove the first 5 columns from each and simply join them?

I would like to do the same with the HTSeq count data, which gives just an Ensembl ID and a count value per line. How can I put all my samples into a single count matrix?

written 2.4 years ago by krushnach80

I would use cut and paste in the terminal. In this case, I read the 16 count files in the counts directory (I have 16 conditions), keep the useful columns, and save the final matrix, which I can then load into R for DESeq2.

paste counts/* | cut -f 1,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32 > counts/merged_counts.mat

Note that paste processes the files in alphanumerical order, which may not be the order you want. Reordering the columns in R afterwards might be needed.

written 2.4 years ago by Carlo Yague
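A hedged variant of the paste/cut approach above, run on four made-up HTSeq-style files (the ctrl_*/test_* file names and all counts are hypothetical). It builds the cut field list instead of typing it out, writes the result outside the counts directory so that a later re-run of the counts/* glob does not pick up the merged file, and drops the summary rows that htseq-count appends (recent versions prefix them with __, for example __no_feature).

```shell
# Four tiny two-column count files (gene ID, count); all values are invented.
work=$(mktemp -d)
mkdir "$work/counts"
printf 'geneA\t12\ngeneB\t0\n__no_feature\t5\n' > "$work/counts/ctrl_1.txt"
printf 'geneA\t10\ngeneB\t1\n__no_feature\t4\n' > "$work/counts/ctrl_2.txt"
printf 'geneA\t30\ngeneB\t2\n__no_feature\t6\n' > "$work/counts/test_1.txt"
printf 'geneA\t28\ngeneB\t0\n__no_feature\t3\n' > "$work/counts/test_2.txt"

# Field list: column 1 (gene ID) plus every even column (the counts), here 1,2,4,6,8.
n_files=$(ls "$work/counts" | wc -l)
fields="1,$(seq 2 2 $((2 * n_files)) | paste -sd, -)"

# Paste the files side by side, keep the wanted columns, drop htseq-count summary rows.
paste "$work"/counts/* | cut -f "$fields" | grep -v '^__' > "$work/merged_counts.mat"

cat "$work/merged_counts.mat"
```

DESeq2 also provides DESeqDataSetFromHTSeqCount, which reads per-sample htseq-count files directly, so merging by hand is optional.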

Thanks for the fast response, but why 16? Shouldn't it be 4 altogether? I have 4 count tables, 2 for wild type and 2 for my test, so that makes 4. Or did I do something wrong?

written 2.4 years ago by krushnach80

I just pasted code I used in a project with 16 samples. You can adapt it for 4 conditions:

paste cond1 cond2 cond3 cond4 | cut -f 1,2,4,6,8 > merged.txt

written 2.4 years ago by Carlo Yague
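Because paste joins lines purely by position, this only gives a valid matrix if every file lists the same genes in the same order. A small sanity check, sketched here on hypothetical files cond1 and cond2 with invented contents, is to compare the first column of each file against that of the first file before merging.

```shell
# Two demo count files; cond2 deliberately lists its genes in a different order.
chk=$(mktemp -d)
printf 'geneA\t12\ngeneB\t0\n' > "$chk/cond1"
printf 'geneB\t1\ngeneA\t10\n' > "$chk/cond2"

# Compare the gene ID column of every file with that of the first file.
cut -f 1 "$chk/cond1" > "$chk/ref_ids"
status=ok
for f in "$chk/cond1" "$chk/cond2"; do
  if ! cut -f 1 "$f" | cmp -s - "$chk/ref_ids"; then
    echo "gene order differs in: $(basename "$f")"
    status=mismatch
  fi
done
echo "$status"
```

If a mismatch is reported, sorting each file by gene ID before pasting (or joining on the ID in R) avoids silently misaligned rows.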

Since one is already in R for DESeq2, would it not be better to do it there (cutting the columns out)?

written 2.4 years ago by genomax

There are many different ways to do such basic data manipulation. I like to do it in the terminal when I can because it can be faster, requires less memory, and can be syntactically shorter. An alternative using only R would require reading all the files into memory, merging them, cutting the unwanted columns, and then clearing the original tables from memory. For me that is less efficient (especially with 16 samples), but it also works.

written 2.4 years ago by Carlo Yague

featureCounts accepts multiple files at the same time and produces one single file containing the count matrix.

written 2.4 years ago by genomax

"Since one is in R for DESeq2 would it not be better to do it there (cutting the columns out)." Please explain why, because I haven't used DESeq2 before.

written 2.4 years ago by krushnach80

You can provide multiple BAM files to featureCounts to get the full matrix (no need to run it on each file independently). I was saying that the annotation columns can be removed after bringing the matrix into R.

written 2.4 years ago by genomax

I have output from featureCounts as well, but unfortunately I didn't use multiple BAM files; I ran the samples individually.

written 2.4 years ago by krushnach80

featureCounts is fast enough that you can easily re-run it with all files. Provide the file names in the order you want the columns to be in, so there is less to fiddle with afterwards.

written 2.4 years ago by genomax

I will give it a try.

written 2.4 years ago by krushnach80

That can be good advice, except if your BAM file contains paired-end reads and is not name-sorted. featureCounts can sort the BAM files automatically, but that can be very time-consuming.

written 2.4 years ago by Carlo Yague

Yes, I have paired-end BAM files, which I obtained after running TopHat.

written 2.4 years ago by krushnach80

Yeah, it has options for multiple files. I'm doing it again.

written 2.4 years ago by krushnach80

But how can multiple BAM files give a single count table, since I have to define the output directory for each sample?

written 2.4 years ago by krushnach80

Counting happens on the finished, aligned BAM files, and you can provide relative paths to them:

featureCounts Sampl1/accepted_hits.bam Sampl2/accepted_hits.bam Sampl3/accepted_hits.bam etc. (not a real command)

written 2.4 years ago by genomax
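To make the idea concrete, here is a sketch of a single featureCounts run over four BAM files. The Sampl1 to Sampl3 paths follow the comment above; Sampl4, annotation.gtf and all_samples_counts.txt are hypothetical names. -a (annotation), -o (output file) and -p (paired-end) are standard featureCounts options, but paired-end handling differs between versions, so check the help of your installed version. The command is only printed unless featureCounts and the annotation file are actually present.

```shell
# BAM files listed in the order the count columns should appear in the output.
bams="Sampl1/accepted_hits.bam Sampl2/accepted_hits.bam Sampl3/accepted_hits.bam Sampl4/accepted_hits.bam"

# One run, one output table: the single -o file receives one count column per BAM.
cmd="featureCounts -p -a annotation.gtf -o all_samples_counts.txt $bams"
echo "$cmd"

# Execute only when the tool and the (hypothetical) annotation file exist.
if command -v featureCounts >/dev/null 2>&1 && [ -f annotation.gtf ]; then
  $cmd
fi
```

The output location is set once with -o for the whole run, so there is no per-sample output directory to define.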

WouterDeCoster (Belgium) wrote, 2.4 years ago:

You can also run featureCounts directly from R, which gives you an R object that you can use directly as input to DESeq2 and similar tools; see the Rsubread package. This avoids tampering with input files.
