Question: Problem using RSEM to validate a SAM file
pier-luc.dudemaine wrote, 2.6 years ago:

I'm fairly new to RNA-seq and I'd like to get TPM values for the transcripts in my study. I'm trying to use RSEM, but I can't get it to work with my .bam files.

I ran rsem-sam-validator on .bam files sorted by read name (using samtools), but it produces the following error:

The two mates of paired-end read HWI-ST909:318:C5CEEACXX:8:1101:1075:79829 are not adjacent!

I also tried convert-sam-for-rsem, but it only produces a similar error: Number of first and second mates in read HWI-ST909:318:C5CEEACXX:8:1101:10002:21523's partial alignments (at most one mate is aligned) are not matched!
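To illustrate what the first error is complaining about, here is a minimal Python sketch that scans SAM text records and reports the first paired read whose two mates are not on consecutive lines. It is a simplified approximation, not RSEM's actual validation logic: the function name `first_non_adjacent_read` is hypothetical, and it assumes that each first mate (flag 0x40) must be followed immediately by its second mate (flag 0x80) with the same read name.

```python
from typing import Iterable, Optional

# SAM FLAG bits used here (from the SAM specification).
PAIRED, FIRST_MATE, SECOND_MATE = 0x1, 0x40, 0x80


def first_non_adjacent_read(sam_lines: Iterable[str]) -> Optional[str]:
    """Return the name of the first paired read whose mates are not adjacent.

    Returns None if every paired record is followed by its mate.
    Assumption (not taken from RSEM's source): mate 1 comes right before mate 2.
    """
    records = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        records.append((fields[0], int(fields[1])))  # (QNAME, FLAG)

    i = 0
    while i < len(records):
        name, flag = records[i]
        if not flag & PAIRED:
            i += 1  # single-end record, nothing to pair up
            continue
        if i + 1 >= len(records):
            return name  # paired record with no following line
        mate_name, mate_flag = records[i + 1]
        is_pair = (
            mate_name == name
            and bool(flag & FIRST_MATE)
            and bool(mate_flag & SECOND_MATE)
        )
        if not is_pair:
            return name
        i += 2  # consume both mates
    return None


# Two reads whose mates are interleaved instead of adjacent.
sample = [
    "r1\t99\tt1\t1\t255\t4M\t=\t10\t13\tACGT\tIIII",
    "r2\t99\tt1\t1\t255\t4M\t=\t10\t13\tACGT\tIIII",
    "r1\t147\tt1\t10\t255\t4M\t=\t1\t-13\tACGT\tIIII",
    "r2\t147\tt1\t10\t255\t4M\t=\t1\t-13\tACGT\tIIII",
]
print(first_non_adjacent_read(sample))  # prints r1
```

On a real file the records could be streamed from `samtools view`, but a file in which some reads have only one mate aligned would still fail a check like this no matter how it is sorted.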

Does anyone have an idea how I can solve this?

rsem rna-seq
modified 2.5 years ago by Biostar • written 2.6 years ago by pier-luc.dudemaine

Which aligner did you use? It sounds like it has a bug.

written 2.6 years ago by Devon Ryan

TopHat was used, version 2.0.11

written 2.6 years ago by pier-luc.dudemaine

I ran bamcheck to see whether the BAM file is malformed. Here's what I got:

SN raw total sequences: 65160173
SN filtered sequences: 0
SN sequences: 65160173
SN is paired: 1
SN is sorted: 0
SN 1st fragments: 32608761
SN last fragments: 32551412
SN reads mapped: 65160173
SN reads unmapped: 0
SN reads unpaired: 1191483
SN reads paired: 63968690
SN reads duplicated: 0
SN reads MQ0: 1582163
SN reads QC failed: 0
SN non-primary alignments: 3887170
SN total length: 6461121648
SN bases mapped: 6461121648
SN bases mapped (cigar): 6461121648
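As a quick sanity check on these numbers, the following Python sketch parses a few of the SN lines and compares the mate counts. The helper name `parse_sn` is hypothetical, and only the counts quoted above are used. The first and last fragment counts are not equal, which is at least consistent with some reads having only one mate aligned (the situation the convert-sam-for-rsem message describes). Because the file also contains non-primary alignments, this is a hint rather than proof.

```python
def parse_sn(text):
    """Parse bamcheck 'SN' summary lines into a {name: int} dictionary.

    Lines whose value is not an integer are skipped.
    """
    stats = {}
    for line in text.splitlines():
        if not line.startswith("SN"):
            continue
        key, _, value = line[2:].partition(":")
        parts = value.split()
        if not parts:
            continue
        try:
            stats[key.strip()] = int(parts[0])
        except ValueError:
            continue  # e.g. a floating-point summary value
    return stats


# Counts copied from the bamcheck output above.
report = """\
SN sequences: 65160173
SN 1st fragments: 32608761
SN last fragments: 32551412
SN reads unpaired: 1191483
SN reads paired: 63968690
SN non-primary alignments: 3887170
"""

stats = parse_sn(report)
mate_imbalance = stats["1st fragments"] - stats["last fragments"]
print(mate_imbalance)  # prints 57349
# The paired and unpaired counts add up to the total number of sequences.
print(stats["reads paired"] + stats["reads unpaired"] == stats["sequences"])  # prints True
```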

written 2.6 years ago by pier-luc.dudemaine

I suspected you had used TopHat. You'll want to use bowtie2 or STAR instead. Also, you need to align against the transcriptome, not the genome.

written 2.6 years ago by Devon Ryan

Actually, I have some other files that were aligned using STAR, and the problem is the same! So maybe aligning to the transcriptome would solve my problem, but this raises one question: what if I want to analyze transcripts that are not in the transcriptome?

written 2.6 years ago by pier-luc.dudemaine

RSEM only works with transcriptome alignments. If you're interested in something not in a transcriptome, then add it.

written 2.6 years ago by Devon Ryan

As I said, I'm fairly new to the field, so tell me if I'm wrong. Are you suggesting that I add the transcripts that are missing from the transcriptome to the transcriptome .gtf file and realign against this new .gtf?

Do you know why RSEM only takes transcriptome-aligned .bam files? What's the rationale behind that? I can't find it in the documentation.

By the way, thanks a lot for your time!

written 2.6 years ago by pier-luc.dudemaine

You need a FASTA file containing the sequence of each transcript; you won't use a GTF file at all. RSEM and similar tools only take transcriptome alignments because it's vastly easier to implement things that way.
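As a rough illustration of "adding" transcripts, here is a minimal Python sketch that appends extra transcript sequences to an existing set of transcript FASTA records and refuses duplicate names. All function names are hypothetical and the sequences are made up. The merged FASTA would then be used to build the RSEM reference (for example with rsem-prepare-reference), and the reads would be realigned against it.

```python
def parse_fasta(lines):
    """Parse FASTA lines into a list of (name, sequence) tuples.

    The name is the first whitespace-delimited token after '>'.
    """
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def merge_transcripts(known, extra):
    """Append extra transcripts to the known ones, rejecting duplicate names."""
    seen = {name for name, _ in known}
    merged = list(known)
    for name, seq in extra:
        if name in seen:
            raise ValueError(f"duplicate transcript name: {name}")
        seen.add(name)
        merged.append((name, seq))
    return merged


def to_fasta(records, width=60):
    """Format (name, sequence) tuples as FASTA text with wrapped sequence lines."""
    out = []
    for name, seq in records:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Made-up records standing in for an annotated transcriptome and a novel transcript.
known = parse_fasta([">tx_known_1 example", "ACGTACGT", "ACGT"])
extra = parse_fasta([">tx_novel_1", "GGGGCCCC"])
merged_text = to_fasta(merge_transcripts(known, extra))
print(merged_text, end="")
```

Rejecting duplicate names is a deliberate choice in this sketch: two sequences sharing one transcript name would make the resulting per-transcript estimates ambiguous.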

written 2.6 years ago by Devon Ryan