Question: converting RSEM bam output into sam files for DEseq?
0
gravatar for Sepd
13 months ago by
Sepd0
Sepd0 wrote:

Hi everyone,

I have used RSEM to calculate expression values and then used ebseq to measure differential expression. Eb seq was not helpful in throwing out transcripts that were not consistent through out replicates so now I am trying to use DEseq. I am planning on converting my transcript.bam files to sam using samtools and then inputting this into DEseq. Does this make sense and is it likely to work?

Thank you

rna-seq rsem deseq bam sam • 913 views
ADD COMMENTlink modified 13 months ago • written 13 months ago by Sepd0
3

DESeq2 doesn't use sam format but uses read counts.

ADD REPLYlink written 13 months ago by WouterDeCoster38k
3

The input to DEseq is a (gene) counts table and not a sam file as far as I know.

In one of the output files of RSEM you should be able to find something read count like which could in theory serve as input for DEseq, however it is likely not the most appropriate one to be used as input

ADD REPLYlink modified 13 months ago • written 13 months ago by lieven.sterck4.5k

I have a gene.results but they should be normalized read counts. Is there anyway I can get this from converting the BAM file into something? I just don't have time to start over.

ADD REPLYlink written 13 months ago by Sepd0
3

I think the fifth column in the gene.results file is the 'expected_count' . This should be a NON normalized count (so diff from TPM or FKPM) . They are however not raw counts! (= what DEseq expects as input).

So if you really do not want to start over, that's probably your best option. However I must stress that starting over is likely the best approach!! (one you should consider doing ;-) )

You should have a bam file of your aligned reads, no. So it's just a matter of using a different tool (eg HTSeq) to get to the counts table, it's not that you will have to remap all your reads. The starting over should only take several minutes to perhaps an hour or so. Thus, do consider in stead of working with non-optimal input

ADD REPLYlink modified 13 months ago • written 13 months ago by lieven.sterck4.5k

Thank you for your help. Can you explain how to do this? Do I use HTSeq and my bam files from RSEM? Also is DEseq only ran on Rstudio on my local computer? Is there anyway to do it on HPC? Sorry I am a beginner.

You should have a bam file of your aligned reads, no. So it's just a matter of using a different tool (eg HTSeq) to get to the counts table, it's not that you will have to remap all your reads. The starting over should only take several minutes to perhaps an hour or so. Thus, do consider in stead of working with non-optimal input

ADD REPLYlink written 13 months ago by Sepd0
1

Yes indeed, you should be able to use the same input BAM file as you used for RSEM (perhaps with diff sorting, that you might need to check). What software did you use to create the BAM file?

No, you can get htseq also a linux distribution, so then you are able to run it in on HPC as well

EDIT: ok, you meant DEseq, my bad :-) , but anyway yes, you can run R and thus also DESeq in a linux environment

ADD REPLYlink modified 13 months ago • written 13 months ago by lieven.sterck4.5k
1

Do I use HTSeq and my bam files from RSEM?

In fact, no, the same bam would not work, as RSEM uses as input bam with reads aligned to the transcriptome, and HTSeq uses as input bam with reads aligned to the genome.

One could fool HTSeq and use the transcriptome-aligned bam as input, but reads mapping to several isoforms of the same gene would not counted, because HTSeq discards multi-mappers.

Also is DEseq only ran on Rstudio on my local computer? Is there anyway to do it on HPC?

No need to run DESeq2 on HPC, unless you have hundreds or thousands of samples. A regular laptop / desktop will handle the analysis easily.

ADD REPLYlink modified 13 months ago • written 13 months ago by h.mon24k

RSEM gave me a transcript.bam output file. I was just reviewing my work and I remembered I am trying to convert these bam files into sam for input into htseq because of the sentence below. Will this bam output from RSEM be OK to convert into sam and input into htseq? "Here I used BWA to align the reads to my reference and generate SAM files that are then processed by HTseq count."

ADD REPLYlink written 13 months ago by Sepd0

I wanted to thank everyone who responded. I decided to use tximport and then DEseq.

ADD REPLYlink written 13 months ago by Sepd0

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted. Upvote|Bookmark|Accept

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

ADD REPLYlink modified 13 months ago • written 13 months ago by WouterDeCoster38k
2
gravatar for h.mon
13 months ago by
h.mon24k
Brazil
h.mon24k wrote:

I am planning on converting my transcript.bam files to sam using samtools and then inputting this into DEseq. Does this make sense and is it likely to work?

As others explained, no, it will not work.

RSEM uses a BAM of reads aligned to a reference transcriptome, not to a reference genome. HTSeq needs a bam aligned to the genome, and the genome annotation in gtf format. If you have a transcript to gene mapping, you can use the tximport R package to summarize RSEM transcript counts into gene counts, then use these counts with DESeq2.

edit:

I have a gene.results

If this gene.results is part of RSEM output, I think it is also appropriate (maybe rounding it to integers), and should be really close to the tximport counts.

ADD COMMENTlink modified 13 months ago • written 13 months ago by h.mon24k

by transcript to gene mapping do you mean the transcript.bam output of RSEM?

ADD REPLYlink written 13 months ago by Sepd0
1

No, it is just a table of which transcripts correspond to which genes, something like:

transcript_01    gene_01
transcript_02    gene_01
transcript_03    gene_02
transcript_04    gene_03
transcript_05    gene_03

Check the correct format at the tximport manual.

ADD REPLYlink written 13 months ago by h.mon24k
1
gravatar for WouterDeCoster
13 months ago by
Belgium
WouterDeCoster38k wrote:

See the tximport vignette on how to use RSEM with DESeq2.

ADD COMMENTlink written 13 months ago by WouterDeCoster38k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1158 users visited in the last hour