Question: Running salmon after subsetting the bam to transcripts annotated as protein coding in the GTF
rubic (United States) wrote, 2.2 years ago:


I have mouse RNA-seq data (single-end, reverse-stranded) that I mapped with STAR against mm10 using the gencode.vM12.primary_assembly.annotation GTF. I ran STAR in the mode that also generates a BAM file of the reads mapping to the transcriptome.

For my purposes, I'd like to retain only reads that map to transcripts annotated as protein_coding in the GTF and treat those as my total, so that TPMs are calculated from that slice of the pie rather than from all reads.
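To make the "slice of the pie" idea concrete, here is a minimal sketch of how TPM changes when the normalisation total is restricted to a subset of transcripts. The transcript names, counts, and lengths are hypothetical toy values, and plain transcript length is used for simplicity, whereas tools such as Salmon work with an effective length, so this shows only the arithmetic and not how any quantifier computes its output.

```python
def tpm(counts, lengths, keep=None):
    """Return TPM per transcript, normalised over `keep` only.

    counts  -- dict: transcript id -> read count
    lengths -- dict: transcript id -> transcript length in bases
    keep    -- optional set of transcript ids that define the total;
               if None, every transcript in `counts` is used
    """
    ids = [t for t in counts if keep is None or t in keep]
    # Reads per base for each retained transcript.
    rate = {t: counts[t] / lengths[t] for t in ids}
    total = sum(rate.values())
    if total == 0:
        return {t: 0.0 for t in ids}
    # Scale so that the retained transcripts sum to one million.
    return {t: r / total * 1e6 for t, r in rate.items()}


# Hypothetical toy data (not taken from the post).
counts = {"tx_coding_1": 100, "tx_coding_2": 300, "tx_lncRNA": 600}
lengths = {"tx_coding_1": 1000, "tx_coding_2": 1000, "tx_lncRNA": 1000}
protein_coding = {"tx_coding_1", "tx_coding_2"}

all_tpm = tpm(counts, lengths)                     # total = all transcripts
coding_tpm = tpm(counts, lengths, protein_coding)  # total = protein_coding only
```

In this toy case `tx_coding_1` accounts for about 100,000 TPM when all three transcripts define the total, and about 250,000 TPM when only the two protein_coding transcripts do.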

I sorted and indexed the transcriptomic BAM with samtools, then subset it using a BED file that includes only the transcripts annotated as protein_coding. This reduces the number of mapped reads from 11,653,865 to 3,483,962.
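The exact commands used here are not given, so the following is only a sketch of one way the subsetting could be done on SAM text while keeping the header and the alignments consistent with each other. It assumes the GTF carries GENCODE-style `transcript_id` and `transcript_type` attributes and that the reference names in the transcriptomic alignments match those transcript IDs. The function names `protein_coding_ids` and `subset_sam` and the toy records are hypothetical.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
TRANSCRIPT_TYPE = re.compile(r'transcript_type "([^"]+)"')


def protein_coding_ids(gtf_lines):
    """Collect transcript IDs whose transcript_type is protein_coding."""
    keep = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only look at 'transcript' features; column 9 holds the attributes.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        tid = TRANSCRIPT_ID.search(fields[8])
        ttype = TRANSCRIPT_TYPE.search(fields[8])
        if tid and ttype and ttype.group(1) == "protein_coding":
            keep.add(tid.group(1))
    return keep


def subset_sam(sam_lines, keep):
    """Keep alignments to `keep`, and only the @SQ header lines for `keep`."""
    out = []
    for line in sam_lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                fields = line.rstrip("\n").split("\t")
                names = [f[3:] for f in fields[1:] if f.startswith("SN:")]
                if not names or names[0] not in keep:
                    continue  # drop references that were filtered out
            out.append(line)  # other header lines pass through unchanged
        elif line.split("\t")[2] in keep:  # column 3 is RNAME
            out.append(line)
    return out


# Hypothetical toy records (not taken from the post).
gtf = [
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; transcript_type "protein_coding";\n',
    'chr1\tHAVANA\texon\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; transcript_type "protein_coding";\n',
    'chr1\tHAVANA\ttranscript\t2000\t2500\t.\t-\t.\tgene_id "G2"; transcript_id "T2"; transcript_type "lincRNA";\n',
]
sam = [
    "@HD\tVN:1.4\n",
    "@SQ\tSN:T1\tLN:800\n",
    "@SQ\tSN:T2\tLN:500\n",
    "r1\t0\tT1\t1\t255\t50M\t*\t0\t0\t*\t*\n",
    "r2\t16\tT2\t10\t255\t50M\t*\t0\t0\t*\t*\n",
]

keep = protein_coding_ids(gtf)
subset = subset_sam(sam, keep)
```

Whether the `@SQ` lines for removed transcripts should be dropped or kept depends on what the downstream quantifier expects the header to match, so that choice is an assumption of this sketch rather than a recommendation.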

When I use Salmon to quantify expression from that subsetted BAM, Salmon crashes (as does MMSEQ), but it does not crash if I give it the un-subsetted BAM.

Does anyone have any idea why it's crashing?

Tags: gencode, samtools, salmon, gtf

What are the error messages when Salmon and MMSEQ crash?

written 2.2 years ago by h.mon

I think it was a BAM header issue. It seems to work now.

written 2.2 years ago by rubic

It would be helpful for everybody if you described how the problem arose and how you solved it.

written 2.2 years ago by h.mon

I had made an error in how I edited the BAM's header, which produced this problem, so I don't think it's worth posting.

written 2.2 years ago by rubic