Running salmon after subsetting the bam to transcripts annotated as protein coding in the GTF
6.0 years ago
rubic ▴ 270

Hi,

I have mouse RNA-seq data (single-end, stranded, reverse strand) which I mapped with STAR against mm10 using the gencode.vM12.primary_assembly.annotation GTF. I ran STAR in a mode that also generates a BAM file of the reads mapping to the transcriptome.

For my purpose I'd like to retain only reads that map to transcripts annotated as protein_coding in the GTF. Those reads would be my total, meaning TPMs will be calculated from that slice of the pie rather than from all reads.
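
To make the "slice of the pie" idea concrete, here is a minimal Python sketch of the arithmetic only: it rescales TPMs so that a chosen subset of transcripts sums to one million. The function name `renormalize_tpm` and the toy transcript names are hypothetical, and rescaling after quantification is not necessarily identical to quantifying only the subsetted reads, so treat it as an illustration rather than a replacement for the approach described here.

```python
def renormalize_tpm(tpm, keep):
    """Rescale the TPMs of the transcripts in `keep` so that they sum to 1e6.

    tpm  -- dict mapping transcript ID to TPM (hypothetical input)
    keep -- set of transcript IDs to retain, e.g. the protein_coding ones
    """
    total = sum(value for tid, value in tpm.items() if tid in keep)
    if total == 0:
        raise ValueError("no expression in the selected transcripts")
    # Multiply before dividing so that simple inputs stay exact.
    return {tid: value * 1e6 / total for tid, value in tpm.items() if tid in keep}


# Toy example: transcripts A and B are kept, C is dropped.
tpm = {"A": 200000.0, "B": 300000.0, "C": 500000.0}
rescaled = renormalize_tpm(tpm, keep={"A", "B"})
```

In the toy example A and B keep their 2:3 ratio, but their values now add up to one million because C no longer contributes to the total.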

What I did was sort and index the transcriptomic BAM with samtools, and then subset it with a BED file that includes only the transcripts annotated as protein_coding. This reduces the number of mapped reads from 11,653,865 to 3,483,962.
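
For anyone wanting to reproduce the BED step, this is a rough Python sketch of one way to build such a file, not necessarily how it was done here. It assumes that the reference names in the transcriptomic BAM are the GTF `transcript_id` values, that the biotype is stored in the GENCODE-style `transcript_type` attribute (Ensembl GTFs use `transcript_biotype` instead), and that each BED interval should span the whole transcript, whose length is taken as the sum of its exon lengths. The function name `protein_coding_bed` is made up for the example.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def protein_coding_bed(gtf_lines, biotype="protein_coding"):
    """Return BED rows (transcript_id, 0, length) for transcripts of one biotype.

    Assumes GENCODE-style attributes (transcript_id, transcript_type).
    """
    lengths = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records contribute to transcript length
        attrs = dict(ATTR_RE.findall(fields[8]))
        if attrs.get("transcript_type") != biotype:
            continue
        tid = attrs["transcript_id"]
        # GTF coordinates are 1-based and inclusive.
        exon_length = int(fields[4]) - int(fields[3]) + 1
        lengths[tid] = lengths.get(tid, 0) + exon_length
    return [(tid, 0, length) for tid, length in lengths.items()]


def write_bed(rows, path):
    """Write the rows as a tab-separated BED file."""
    with open(path, "w") as handle:
        for tid, start, end in rows:
            handle.write(f"{tid}\t{start}\t{end}\n")
```

The resulting file can then be passed to `samtools view -b -L` together with the sorted BAM. Note that this kind of filtering keeps the original header, so the subsetted BAM still lists every transcript as a reference sequence.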

When I use Salmon to quantify expression from the subsetted BAM, Salmon crashes (as does MMSEQ), but it doesn't crash if I give it the un-subsetted BAM.

Does anyone have any idea why it's crashing?

salmon samtools GTF gencode • 1.6k views

What are the error messages when Salmon and MMSEQ crash?


I think it was a BAM header issue. It seems to work now.


It would be helpful for everybody if you described how the problem arose and how you solved it.


I had made an error in how I edited the BAM's header, which produced this problem, so I don't think it's worth posting.
