Question

Problem using rsem function for validate sam file

0

Entering edit mode

8.8 years ago

pier-luc.dudemaine • 0

I'm kind of new to rna-seq and I'd like to get tpm values for the transcripts in my study. I'm trying to use RSEM, but I can't get it to work with my .bam files.

I tried rsem-sam-validator on sorted by name .bam files (using samtools), but the following error is generated.

The two mates of paired-end read HWI-ST909:318:C5CEEACXX:8:1101:1075:79829 are not adjacent!

I also tried convert-sam-for-rsem, but only generating similar errors:

Number of first and second mates in read HWI-ST909:318:C5CEEACXX:8:1101:10002:21523's partial alignments (at most one mate is aligned) are not matched!

Is anyone have an idea how I can solve that?

rsem rna-seq • 4.0k views

ADD COMMENT • link updated 3.9 years ago by Ram 45k • written 8.8 years ago by pier-luc.dudemaine • 0

0

Entering edit mode

Which aligner did you use? It sounds like it has a bug.

ADD REPLY • link 8.8 years ago by Devon Ryan 105k

0

Entering edit mode

TopHat was used, version 2.0.11

ADD REPLY • link 8.8 years ago by pier-luc.dudemaine • 0

0

Entering edit mode

I ran bamcheck trying to see if the bam file is wrong, here's what I got

SN      raw total sequences:    65160173
SN      filtered sequences:     0
SN      sequences:      65160173
SN      is paired:      1
SN      is sorted:      0
SN      1st fragments:  32608761
SN      last fragments: 32551412
SN      reads mapped:   65160173
SN      reads unmapped: 0
SN      reads unpaired: 1191483
SN      reads paired:   63968690
SN      reads duplicated:       0
SN      reads MQ0:      1582163
SN      reads QC failed:        0
SN      non-primary alignments: 3887170
SN      total length:   6461121648
SN      bases mapped:   6461121648
SN      bases mapped (cigar):   6461121648

ADD REPLY • link updated 3.9 years ago by Ram 45k • written 8.8 years ago by pier-luc.dudemaine • 0

0

Entering edit mode

I was wondering if you used tophat. You'll want to instead use bowtie2 or STAR. Also, you need to align against the transcriptome and not the genome.

ADD REPLY • link 8.8 years ago by Devon Ryan 105k

0

Entering edit mode

Actually I have some other files that has been aligned using STAR and the problem is the same! So maybe aligning to the transcriptome would solve my problem, but this raise one question, what if I want to analyze transcripts that are not in the transcriptome?

ADD REPLY • link 8.8 years ago by pier-luc.dudemaine • 0

0

Entering edit mode

RSEM only works with transcriptome alignments. If you're interested in something not in a transcriptome, then add it.

ADD REPLY • link 8.8 years ago by Devon Ryan 105k

0

Entering edit mode

As said, I'm not so old in the field, so tell me if I'm wrong. What you're suggesting is to add my transcripts that are not in the transcriptome to the transcriptome .gtf file and realing against this new .gtf?

Do you know why RSEM only take transcriptome aligned .bam files? What's the rationale behind that? I can't find it in the documentation.

By the way thanks a lot for your time!

ADD REPLY • link 8.8 years ago by pier-luc.dudemaine • 0

1

Entering edit mode

You need a fasta file containing the sequence of each transcript, you won't use a GTF file at all. RSEM and similar tools only take transcriptome alignments because it's vastly easier to implement things that way.

ADD REPLY • link 8.8 years ago by Devon Ryan 105k