Question: Running salmon after subsetting the bam to transcripts annotated as protein coding in the GTF
16 months ago by
rubic180
United States
rubic180 wrote:

Hi,

I have mouse RNA-seq data (single-end, stranded, reverse strand) which I mapped with STAR against mm10 using the gencode.vM12.primary_assembly.annotation GTF. I ran STAR in the mode that also writes a BAM file of the reads mapping to the transcriptome.

For my purposes I'd like to retain only reads that map to transcripts annotated as protein_coding in the GTF. Those reads would be my total, meaning TPMs would be calculated from that slice of the pie rather than from all reads.

What I did was sort and index the transcriptomic BAM with samtools, and then subset that BAM with a BED file that includes only the transcripts annotated as protein_coding. This reduces the number of mapped reads from 11,653,865 to 3,483,962.
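
For readers following along, here is a minimal sketch of how the list of protein_coding transcript IDs could be pulled from a GENCODE-style GTF before building the BED file. It assumes the biotype is stored in the `transcript_type` attribute on `transcript` feature lines, as in GENCODE annotations; the function name and the toy records are hypothetical and are not taken from the original post.

```python
import re


def protein_coding_transcripts(gtf_lines):
    """Collect transcript_id values whose transcript_type is protein_coding.

    Assumes GENCODE-style attributes (key "value"; pairs in column 9).
    """
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript-level records carry what we need
        # Parse quoted key/value pairs; unquoted ones (e.g. level 2) are ignored.
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        tid = attrs.get("transcript_id")
        if tid and attrs.get("transcript_type") == "protein_coding":
            ids.add(tid)
    return ids


# Toy records (hypothetical IDs), not real annotation data.
example_gtf = [
    "##description: toy example\n",
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; gene_type "protein_coding"; '
    'transcript_type "protein_coding"; level 2;\n',
    'chr1\tHAVANA\texon\t100\t300\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; transcript_type "protein_coding";\n',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T2"; gene_type "protein_coding"; '
    'transcript_type "retained_intron"; level 2;\n',
]

print(sorted(protein_coding_transcripts(example_gtf)))
```

In a transcriptome BAM the reference names should correspond to these transcript IDs, so the resulting set can serve as the list of references to keep.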

When I use Salmon to quantify expression from that subsetted BAM, Salmon crashes (as does MMSEQ), but it doesn't crash if I give it the un-subsetted BAM.

Does anyone have any idea why it's crashing?

gencode samtools salmon gtf • 587 views

What are the error messages when Salmon and MMSEQ crash?

written 16 months ago by h.mon

I think it was a BAM header issue. It seems to work now.

written 16 months ago by rubic

It would be helpful for everybody if you described how the problem arose and how you solved it.

written 16 months ago by h.mon

I made an error in how I edited the BAM's header, which produced this problem, so I don't think it's worth posting.

written 16 months ago by rubic
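
The thread never says what exactly was wrong with the header, so the following is only a sketch of one way to keep a header and its records consistent when subsetting: filter the `@SQ` lines and the alignment records against the same set of transcript names. It works on SAM text (for example, what `samtools view -h` prints), and the function name and toy records are hypothetical. Checking only the RNAME column is enough here because the data are single-end; paired-end records would also need their mate reference checked.

```python
def subset_sam(sam_lines, keep):
    """Keep @SQ lines and alignment records whose reference name is in `keep`.

    Other header lines (@HD, @PG, @RG, @CO) are passed through unchanged.
    """
    out = []
    for line in sam_lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # Header fields look like TAG:VALUE separated by tabs.
                tags = dict(
                    field.split(":", 1)
                    for field in line.rstrip("\n").split("\t")[1:]
                )
                if tags.get("SN") not in keep:
                    continue  # drop references we are not keeping
            out.append(line)
        else:
            rname = line.split("\t")[2]  # column 3 is the reference name
            if rname in keep:
                out.append(line)
    return out


# Toy SAM text (hypothetical reads and transcripts).
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted\n",
    "@SQ\tSN:T1\tLN:1000\n",
    "@SQ\tSN:T2\tLN:500\n",
    "@PG\tID:aligner\n",
    "read1\t0\tT1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t16\tT2\t20\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]

for kept_line in subset_sam(example_sam, {"T1"}):
    print(kept_line, end="")
```

Whether a mismatch of this kind was the actual cause of the crash is an assumption; the sketch only shows that filtering header and records against one list avoids the two drifting apart.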