Problem using rsem function for validate sam file
0
0
Entering edit mode
5.2 years ago

I'm kind of new to rna-seq and I'd like to get tpm values for the transcripts in my study. I'm trying to use RSEM, but I can't get it to work with my .bam files.

I tried rsem-sam-validator on sorted by name .bam files (using samtools), but the following error is generated.

The two mates of paired-end read HWI-ST909:318:C5CEEACXX:8:1101:1075:79829 are not adjacent!

I also tried convert-sam-for-rsem, but only generating similar errors:

Number of first and second mates in read HWI-ST909:318:C5CEEACXX:8:1101:10002:21523's partial alignments (at most one mate is aligned) are not matched!

Is anyone have an idea how I can solve that?

rsem rna-seq • 2.4k views
ADD COMMENT
0
Entering edit mode

Which aligner did you use? It sounds like it has a bug.

ADD REPLY
0
Entering edit mode

TopHat was used, version 2.0.11

ADD REPLY
0
Entering edit mode

I ran bamcheck trying to see if the bam file is wrong, here's what I got

SN      raw total sequences:    65160173
SN      filtered sequences:     0
SN      sequences:      65160173
SN      is paired:      1
SN      is sorted:      0
SN      1st fragments:  32608761
SN      last fragments: 32551412
SN      reads mapped:   65160173
SN      reads unmapped: 0
SN      reads unpaired: 1191483
SN      reads paired:   63968690
SN      reads duplicated:       0
SN      reads MQ0:      1582163
SN      reads QC failed:        0
SN      non-primary alignments: 3887170
SN      total length:   6461121648
SN      bases mapped:   6461121648
SN      bases mapped (cigar):   6461121648
ADD REPLY
0
Entering edit mode

I was wondering if you used tophat. You'll want to instead use bowtie2 or STAR. Also, you need to align against the transcriptome and not the genome.

ADD REPLY
0
Entering edit mode

Actually I have some other files that has been aligned using STAR and the problem is the same! So maybe aligning to the transcriptome would solve my problem, but this raise one question, what if I want to analyze transcripts that are not in the transcriptome?

ADD REPLY
0
Entering edit mode

RSEM only works with transcriptome alignments. If you're interested in something not in a transcriptome, then add it.

ADD REPLY
0
Entering edit mode

As said, I'm not so old in the field, so tell me if I'm wrong. What you're suggesting is to add my transcripts that are not in the transcriptome to the transcriptome .gtf file and realing against this new .gtf?

Do you know why RSEM only take transcriptome aligned .bam files? What's the rationale behind that? I can't find it in the documentation.

By the way thanks a lot for your time!

ADD REPLY
1
Entering edit mode

You need a fasta file containing the sequence of each transcript, you won't use a GTF file at all. RSEM and similar tools only take transcriptome alignments because it's vastly easier to implement things that way.

ADD REPLY

Login before adding your answer.

Traffic: 2072 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6