Running Salmon after subsetting the BAM to transcripts annotated as protein coding in the GTF

7.3 years ago

rubic ▴ 270

Hi,

I have mouse RNA-seq data (single-end, stranded, reverse strand) which I aligned with STAR against mm10 using the gencode.vM12.primary_assembly.annotation GTF. I ran STAR in the mode that also generates a BAM file of the reads mapped to the transcriptome.

For my purposes I'd like to retain only the reads that map to transcripts annotated as protein_coding in the GTF and treat those as my total, meaning TPMs will be calculated relative to that slice of the pie rather than relative to all reads.
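To make "TPMs calculated on that slice" concrete, here is a minimal sketch, assuming per-transcript read counts and lengths are already available. The function name `tpm`, the `keep` parameter and the toy numbers are hypothetical, and the sketch uses plain transcript lengths rather than the effective lengths a quantifier such as Salmon estimates.

```python
def tpm(counts, lengths, keep=None):
    """Compute TPM over all transcripts, or only over those in `keep`.

    counts  -- dict: transcript ID -> number of assigned reads
    lengths -- dict: transcript ID -> transcript length in bases
    keep    -- optional set of transcript IDs that define the total
    """
    ids = [t for t in counts if keep is None or t in keep]
    # Length-normalised read rate per transcript.
    rates = {t: counts[t] / lengths[t] for t in ids}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in ids}
    # Scale so that the retained transcripts sum to one million.
    return {t: rates[t] / total * 1e6 for t in ids}


# Toy data (made up): three transcripts of equal length.
counts = {"A": 100, "B": 300, "C": 600}
lengths = {"A": 1000, "B": 1000, "C": 1000}

tpm_all = tpm(counts, lengths)
tpm_subset = tpm(counts, lengths, keep={"A", "B"})
```

With the subset as the denominator, the values for the retained transcripts grow because the total they are scaled against is smaller, while their ratios to each other stay the same.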

What I did was sort and index the transcriptomic BAM with samtools, and then subset it with a BED file that includes only the transcripts annotated as protein_coding. This reduces the number of mapped reads from 11,653,865 to 3,483,962.
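The list of protein-coding transcripts behind such a subset could be collected roughly as follows. This is only a sketch under stated assumptions: the function name `protein_coding_transcripts` is hypothetical, the GTF is assumed to carry a `transcript_type` attribute on its `transcript` records (as GENCODE GTFs do), and the reference names in the transcriptomic BAM are assumed to be the same transcript IDs.

```python
import re

# Matches quoted GTF attributes such as: transcript_id "ENSMUST...";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def protein_coding_transcripts(gtf_lines, type_key="transcript_type"):
    """Return IDs of transcripts annotated as protein_coding, in file order."""
    ids = []
    seen = set()
    for line in gtf_lines:
        # Skip blank lines and '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only 'transcript' records carry one entry per transcript.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid and attrs.get(type_key) == "protein_coding" and tid not in seen:
            seen.add(tid)
            ids.append(tid)
    return ids


# Tiny made-up GTF excerpt for illustration.
example_gtf = [
    "##description: toy annotation\n",
    'chr1\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";\n',
    'chr1\tHAVANA\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; '
    'transcript_type "protein_coding"; level 2;\n',
    'chr1\tHAVANA\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; '
    'transcript_type "retained_intron"; level 2;\n',
]
coding_ids = protein_coding_transcripts(example_gtf)
```

For a real file, pass an open file handle instead of the example list. Annotations from other sources may name the attribute differently, which is why `type_key` is a parameter.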

When I use Salmon to quantify expression from that subsetted BAM, Salmon crashes (as does MMSEQ), but it does not crash if I give it the unsubsetted BAM.

Does anyone have any idea why it's crashing?

salmon samtools GTF gencode • 2.0k views


What are the error messages when Salmon and MMSEQ crash?

7.3 years ago by h.mon 35k

I think it was a BAM header issue. It seems to work now.

7.3 years ago by rubic ▴ 270

It would be helpful for everybody if you described how the problem arose and how you solved it.

7.3 years ago by h.mon 35k

I made a mistake when editing the BAM's header, which caused this problem, so I don't think it's worth posting.

7.3 years ago by rubic ▴ 270
