7.6 years ago
I'm kind of new to RNA-seq, and I'd like to get TPM values for the transcripts in my study. I'm trying to use RSEM, but I can't get it to work with my .bam files.
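For reference, TPM (transcripts per million) is usually defined from per-transcript read counts and effective lengths: divide each count by its transcript's effective length, then rescale so the values sum to one million. RSEM reports TPM alongside expected_count and effective_length in its results files; the sketch below only illustrates that standard definition under those assumptions and is not RSEM's own code (the function name tpm is hypothetical).

```python
def tpm(counts, effective_lengths):
    """Convert per-transcript read counts to TPM.

    counts: transcript ID -> (expected) read count
    effective_lengths: transcript ID -> effective length in bases
    """
    # Reads per effective base; transcripts with no usable length get 0.
    rates = {}
    for tid, count in counts.items():
        eff_len = effective_lengths[tid]
        rates[tid] = count / eff_len if eff_len > 0 else 0.0

    total = sum(rates.values())
    if total == 0:
        return {tid: 0.0 for tid in rates}
    # Rescale so that all TPM values sum to one million.
    return {tid: rate / total * 1e6 for tid, rate in rates.items()}


if __name__ == "__main__":
    # Toy numbers: tx2 has the same count as tx1 but half the length,
    # so its TPM comes out twice as high.
    counts = {"tx1": 100, "tx2": 100, "tx3": 0}
    effective_lengths = {"tx1": 1000.0, "tx2": 500.0, "tx3": 800.0}
    for tid, value in tpm(counts, effective_lengths).items():
        print(tid, round(value, 1))
```

Dividing by length before normalizing is what makes TPM comparable across transcripts of different lengths within a sample.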
I tried rsem-sam-validator on .bam files sorted by name (using samtools), but it reports the following error:
The two mates of paired-end read HWI-ST909:318:C5CEEACXX:8:1101:1075:79829 are not adjacent!
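As I understand this message, RSEM expects the two mates of each paired-end alignment to sit on consecutive lines. Sorting by name puts all records of a read together, but that does not by itself guarantee adjacency, e.g. when a multi-mapping read's alignments come out as (first, first, second, second). A minimal pure-Python sketch of that check, assuming SAM text input (for instance the output of samtools view -h) and looking only at QNAME and the first/second-mate flag bits; find_non_adjacent_mates is a hypothetical helper, not part of RSEM:

```python
FLAG_PAIRED = 0x1
FLAG_FIRST = 0x40
FLAG_SECOND = 0x80
MATE_BITS = FLAG_FIRST | FLAG_SECOND


def find_non_adjacent_mates(sam_lines):
    """Return read names whose paired records are not laid out as
    consecutive (one first mate, one second mate) pairs.

    Only QNAME and the mate flag bits are checked; header lines are skipped.
    """
    bad = []
    pending = None  # (qname, flag) of a record still waiting for its mate
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & FLAG_PAIRED:
            continue  # single-end records are not part of this check
        if pending is None:
            pending = (qname, flag)
            continue
        prev_name, prev_flag = pending
        is_mate_pair = prev_name == qname and {
            prev_flag & MATE_BITS, flag & MATE_BITS
        } == {FLAG_FIRST, FLAG_SECOND}
        if is_mate_pair:
            pending = None
        else:
            bad.append(prev_name)
            pending = (qname, flag)
    if pending is not None:
        bad.append(pending[0])  # a record left without a partner at the end
    return list(dict.fromkeys(bad))  # de-duplicate, keep first-seen order


if __name__ == "__main__":
    # r1 is fine; r2's primary (99/147) and secondary (355/403) alignments
    # are interleaved as first, first, second, second.
    example = [
        "@HD\tVN:1.0\tSO:queryname",
        "r1\t99\ttx1\t100\t255\t50M\t=\t200\t150\t*\t*",
        "r1\t147\ttx1\t200\t255\t50M\t=\t100\t-150\t*\t*",
        "r2\t99\ttx1\t300\t255\t50M\t=\t400\t150\t*\t*",
        "r2\t355\ttx2\t10\t255\t50M\t=\t90\t130\t*\t*",
        "r2\t147\ttx1\t400\t255\t50M\t=\t300\t-150\t*\t*",
        "r2\t403\ttx2\t90\t255\t50M\t=\t10\t-130\t*\t*",
    ]
    print(find_non_adjacent_mates(example))
```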
I also tried convert-sam-for-rsem, but it only generates similar errors:
Number of first and second mates in read HWI-ST909:318:C5CEEACXX:8:1101:10002:21523's partial alignments (at most one mate is aligned) are not matched!
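My reading of this second message (inferred from the error text, not from RSEM's source) is that, for each read, convert-sam-for-rsem counts the records where at most one mate is aligned and expects as many first-mate records as second-mate records. That would fail if, for instance, the aligner writes an aligned mate but leaves its unaligned partner out of the BAM. A sketch of that count under this assumption; unbalanced_partial_alignments is a hypothetical helper:

```python
from collections import defaultdict

FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8
FLAG_FIRST = 0x40
FLAG_SECOND = 0x80


def unbalanced_partial_alignments(sam_lines):
    """Return {read name: (first-mate records, second-mate records)} for reads
    whose partial alignments (this record or its mate unmapped) are unbalanced."""
    counts = defaultdict(lambda: [0, 0])
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & FLAG_PAIRED:
            continue
        # Assumed definition of "partial": at most one mate of the pair aligned.
        if not flag & (FLAG_UNMAPPED | FLAG_MATE_UNMAPPED):
            continue
        if flag & FLAG_FIRST:
            counts[qname][0] += 1
        elif flag & FLAG_SECOND:
            counts[qname][1] += 1
    return {q: tuple(c) for q, c in counts.items() if c[0] != c[1]}


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0\tSO:queryname",
        # r3: first mate aligned, mate unmapped, but no record for the mate.
        "r3\t73\ttx1\t100\t255\t50M\t=\t100\t0\t*\t*",
        # r4: same situation, but the unmapped mate record is present.
        "r4\t73\ttx1\t500\t255\t50M\t=\t500\t0\t*\t*",
        "r4\t133\ttx1\t500\t0\t*\t=\t500\t0\t*\t*",
    ]
    print(unbalanced_partial_alignments(example))
```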
Does anyone have an idea how I can solve this?
Which aligner did you use? It sounds like it has a bug.
I used TopHat, version 2.0.11.
I ran bamcheck to see whether the BAM file is malformed; here's what I got:
I was wondering if you used TopHat. You'll want to use bowtie2 or STAR instead. Also, you need to align against the transcriptome, not the genome.
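One quick way to tell a genome alignment from a transcriptome alignment is to compare the reference names in the BAM header (the SN: fields of the @SQ lines, e.g. from samtools view -H) with the transcript IDs in the FASTA given to RSEM: chromosome names such as chr1 mean the reads were aligned to the genome. A minimal sketch; the helper names are hypothetical and the inputs are toy examples:

```python
def sq_names(header_lines):
    """Reference names (SN: values) from the @SQ lines of a SAM header."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_ids(fasta_lines):
    """Sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def references_missing_from_transcriptome(header_lines, fasta_lines):
    """BAM reference names that are not transcripts in the FASTA."""
    ids = fasta_ids(fasta_lines)
    return [name for name in sq_names(header_lines) if name not in ids]


if __name__ == "__main__":
    header = ["@HD\tVN:1.0", "@SQ\tSN:chr1\tLN:248956422", "@SQ\tSN:tx1\tLN:1500"]
    fasta = [">tx1 example transcript", "ACGT", ">tx2", "GGCC"]
    print(references_missing_from_transcriptome(header, fasta))
```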
Actually, I have some other files that were aligned with STAR, and the problem is the same! So maybe aligning to the transcriptome would solve my problem, but this raises a question: what if I want to analyze transcripts that are not in the transcriptome?
RSEM only works with transcriptome alignments. If you're interested in something not in a transcriptome, then add it.
As I said, I'm fairly new to the field, so tell me if I'm wrong. Are you suggesting that I add the transcripts that are not in the transcriptome to the transcriptome .gtf file and realign against this new .gtf?
Do you know why RSEM only takes transcriptome-aligned .bam files? What's the rationale behind that? I can't find it in the documentation.
By the way, thanks a lot for your time!
You need a FASTA file containing the sequence of each transcript; you won't use a GTF file at all. RSEM and similar tools only take transcriptome alignments because it's vastly easier to implement things that way.
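A sketch of what "adding" a transcript could look like in practice: merge the extra transcript sequences into the transcriptome FASTA, refuse duplicate IDs, and write the result back out. You would then rebuild the RSEM reference from the merged FASTA with rsem-prepare-reference and realign against it. The helper names are hypothetical and the inline sequences are toy placeholders.

```python
def read_fasta(lines):
    """Parse FASTA lines into {ID: sequence}, keeping file order."""
    records = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            if current in records:
                raise ValueError(f"duplicate transcript ID: {current!r}")
            records[current] = []
        elif line and current is not None:
            records[current].append(line)
    return {tid: "".join(chunks) for tid, chunks in records.items()}


def merge_transcriptomes(reference, custom):
    """Append custom transcripts to the reference; IDs must stay unique."""
    merged = dict(reference)
    for tid, seq in custom.items():
        if tid in merged:
            raise ValueError(f"transcript ID already in reference: {tid!r}")
        merged[tid] = seq
    return merged


def format_fasta(records, width=60):
    """Render {ID: sequence} as FASTA text wrapped at `width` bases."""
    out = []
    for tid, seq in records.items():
        out.append(">" + tid)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    reference = read_fasta([">tx1 known", "ACGTACGT", ">tx2", "GGCC"])
    custom = read_fasta([">novel1", "TTTTAAAA"])
    print(format_fasta(merge_transcriptomes(reference, custom)), end="")
```

To produce a real file, the string from format_fasta can be written with open(path, "w").write(...).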