Problem using rsem function for validate sam file
0
0
Entering edit mode
5.2 years ago

I'm kind of new to rna-seq and I'd like to get tpm values for the transcripts in my study. I'm trying to use RSEM, but I can't get it to work with my .bam files.

I tried rsem-sam-validator on sorted by name .bam files (using samtools), but the following error is generated.

The two mates of paired-end read HWI-ST909:318:C5CEEACXX:8:1101:1075:79829 are not adjacent!


I also tried convert-sam-for-rsem, but only generating similar errors:

Number of first and second mates in read HWI-ST909:318:C5CEEACXX:8:1101:10002:21523's partial alignments (at most one mate is aligned) are not matched!


Is anyone have an idea how I can solve that?

rsem rna-seq • 2.4k views
0
Entering edit mode

Which aligner did you use? It sounds like it has a bug.

0
Entering edit mode

TopHat was used, version 2.0.11

0
Entering edit mode

I ran bamcheck trying to see if the bam file is wrong, here's what I got

SN      raw total sequences:    65160173
SN      filtered sequences:     0
SN      sequences:      65160173
SN      is paired:      1
SN      is sorted:      0
SN      1st fragments:  32608761
SN      last fragments: 32551412
SN      non-primary alignments: 3887170
SN      total length:   6461121648
SN      bases mapped:   6461121648
SN      bases mapped (cigar):   6461121648

0
Entering edit mode

I was wondering if you used tophat. You'll want to instead use bowtie2 or STAR. Also, you need to align against the transcriptome and not the genome.

0
Entering edit mode

Actually I have some other files that has been aligned using STAR and the problem is the same! So maybe aligning to the transcriptome would solve my problem, but this raise one question, what if I want to analyze transcripts that are not in the transcriptome?

0
Entering edit mode

RSEM only works with transcriptome alignments. If you're interested in something not in a transcriptome, then add it.

0
Entering edit mode

As said, I'm not so old in the field, so tell me if I'm wrong. What you're suggesting is to add my transcripts that are not in the transcriptome to the transcriptome .gtf file and realing against this new .gtf?

Do you know why RSEM only take transcriptome aligned .bam files? What's the rationale behind that? I can't find it in the documentation.

By the way thanks a lot for your time!

1
Entering edit mode

You need a fasta file containing the sequence of each transcript, you won't use a GTF file at all. RSEM and similar tools only take transcriptome alignments because it's vastly easier to implement things that way.