Question: Running Salmon after subsetting the BAM to transcripts annotated as protein_coding in the GTF
Asked 12 months ago by rubic (United States):


I have mouse RNA-seq data (single-end, stranded, reverse strand) that I mapped with STAR against mm10 using the gencode.vM12.primary_assembly.annotation GTF. I ran STAR in the mode that also generates a BAM file of the reads mapping to the transcriptome.

For my purposes, I'd like to retain only reads that map to transcripts annotated as protein_coding in the GTF. Those reads would be my total, meaning TPMs would be calculated from that slice of the pie rather than from all reads.
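To make the "slice of the pie" idea concrete, here is a minimal Python sketch of rescaling TPMs so that a chosen subset of transcripts sums to one million. The function name, transcript names, and numbers are made up for illustration. Note that this is post-hoc rescaling of existing estimates, which is not necessarily identical to re-quantifying from a subsetted BAM, since reads shared between kept and dropped transcripts may be assigned differently.

```python
def renormalize_tpm(tpm, keep):
    """Rescale the TPMs of the transcripts in `keep` so they sum to 1e6.

    `tpm` maps transcript name -> TPM; `keep` is a set of transcript names.
    Both are hypothetical inputs used only for this illustration.
    """
    subset = {tx: value for tx, value in tpm.items() if tx in keep}
    total = sum(subset.values())
    if total == 0:
        raise ValueError("no expression in the selected transcripts")
    return {tx: value / total * 1e6 for tx, value in subset.items()}


# Toy values, not real data.
tpm = {
    "tx_coding_1": 300000.0,
    "tx_coding_2": 100000.0,
    "tx_lncRNA_1": 600000.0,
}
rescaled = renormalize_tpm(tpm, {"tx_coding_1", "tx_coding_2"})
print(rescaled)
```

With these toy values, the two coding transcripts keep their 3:1 ratio but are rescaled to fill the whole million.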

I sorted and indexed the transcriptomic BAM with samtools, and then subset it with a BED file that includes only the transcripts annotated as protein_coding. This reduces the number of mapped reads from 11,653,865 to 3,483,962.
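For reference, the subsetting step can also be sketched in plain Python on SAM text, filtering the header and the records together so that they stay consistent with each other. This is not the exact procedure used above. It assumes the GTF uses GENCODE-style attributes (`transcript_type "protein_coding"`; Ensembl GTFs use `transcript_biotype` instead), that the reference names in the transcriptome BAM are the GTF `transcript_id` values, and that the reads are single-end. The function names and toy records are hypothetical.

```python
import re


def protein_coding_ids(gtf_lines):
    """Collect transcript_id values from GTF 'transcript' rows whose
    transcript_type is protein_coding (GENCODE attribute naming assumed)."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = fields[8]
        if 'transcript_type "protein_coding"' not in attrs:
            continue
        match = re.search(r'transcript_id "([^"]+)"', attrs)
        if match:
            ids.add(match.group(1))
    return ids


def filter_sam(sam_lines, keep):
    """Yield SAM lines restricted to the reference names in `keep`.

    @SQ header lines for dropped references are removed as well, so the
    header lists exactly the references that records can point to.
    """
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                name = next((f[3:] for f in fields[1:] if f.startswith("SN:")), None)
                if name not in keep:
                    continue
            yield line
        elif len(fields) > 2 and fields[2] in keep:
            yield line  # column 3 (RNAME) is a kept transcript


# Toy input, not real data.
gtf = [
    "##description: toy example\n",
    'chr1\tHAVANA\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_type "protein_coding"; transcript_type "protein_coding";\n',
    'chr1\tHAVANA\ttranscript\t200\t300\t.\t+\t.\tgene_id "G2"; transcript_id "T2"; gene_type "lincRNA"; transcript_type "lincRNA";\n',
]
sam = [
    "@HD\tVN:1.4\n",
    "@SQ\tSN:T1\tLN:100\n",
    "@SQ\tSN:T2\tLN:101\n",
    "read1\t0\tT1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tT2\t5\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]
keep = protein_coding_ids(gtf)
kept = list(filter_sam(sam, keep))
print("".join(kept), end="")
```

On real data the same filter could read from `samtools view -h in.bam` and write into `samtools view -b -o out.bam -`. Streaming this way keeps the original read order, which may matter for quantifiers that expect all alignments of a read to be adjacent rather than sorted by position.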

When I use Salmon to quantify expression from the subsetted BAM, it crashes (as does MMSEQ), but it runs without problems on the un-subsetted BAM.

Does anyone have any idea why it's crashing?


What are the error messages when Salmon and MMSEQ crash?

written 12 months ago by h.mon

I think it was a BAM header issue. It seems to work now.

written 12 months ago by rubic

It would be helpful for everybody if you described how the problem arose and how you solved it.

written 12 months ago by h.mon

I made a mistake when editing the BAM's header, which caused this problem, so I don't think it's worth posting.

written 12 months ago by rubic