Question

Abundance/frequency of RNASeq raw reads mapped to a transcript exon

0

5.6 years ago

mlopez ▴ 10

We have paired-end Illumina RNASeq reads and we are working with a non-model organism with no reference genome. We have a working composite for a protein sequence that includes every exon we have found via cDNA. We have 6 muscle types with some triplicates and want to see how many times 4 specific exons that look to be alternatively spliced are present in each muscle type.

For example, muscle type a has this exon expressed 46% while muscle type b only expresses this exon 12% of the time.

I'm not looking for differential expression, only a number of how many times this exon is found within the muscle type's transcript file.

I've tried feeding HISAT2 BAM files into StringTie, and also taking the GTF files from StringTie and putting them into htseq-count, but neither worked.

I was already able to align the raw reads to the composite and visualize the alignment in IGV. However, there are thousands of raw reads aligning to the 4 exons of interest, so I was hoping there would be a better way of quantifying the frequency than manually counting.

Do I have to annotate the composite so that it is easier to select what I am looking for, and if so, how do I do that?

RNA-Seq • 1.5k views

ADD COMMENT • link updated 5.6 years ago by Devon Ryan 104k • written 5.6 years ago by mlopez ▴ 10

1

I apologize for focusing on something other than the question, but did you post the same question (with slightly different wording) under different accounts?

Get an abundance/frequency of how many times within an RNASeq file a transcript maps to an exon

Both mention "For example, muscle type a has this exon expressed 46% while muscle type b only expresses this exon 12% of the time."

I very much want to encourage use of Biostars, but I think it is kind of important to have a transparent account, ideally linked to other information about yourself (such as your actual name, photo, etc.). Otherwise, it is harder to keep track of the answers in the different posts, and I think seeing the overall learning process for a project is important for the broader community.

ADD REPLY • link 5.6 years ago by Charles Warden 8.2k

1

There is another person in the lab working with the same samples and that was her account. She is focusing more on the bioinformatic aspect and so I asked her to post the question originally. When I found out the sign up was free, then I posted the question. I apologize for any confusion I might have caused.

ADD REPLY • link 5.6 years ago by mlopez ▴ 10

0

That's OK - there are frequently similarly worded questions coming from different users. However, they usually aren't this close to being identical, and usually are posted on different days :)

ADD REPLY • link 5.6 years ago by Charles Warden 8.2k

0

Do you just have the exon sequences or do you have approximate transcript isoform sequences? The latter will be easier to use going forward.

ADD REPLY • link 5.6 years ago by Devon Ryan 104k

0

I have exon sequences, yes.

ADD REPLY • link 5.6 years ago by mlopez ▴ 10

score 1 · Answer 1 · 2018-09-25

While the comments in Get an abundance/frequency of how many times within an RNASeq file a transcript maps to an exon are quite good (for those that can't see them, basically, "Align to the exons and count reads"), I have a feeling the following will give you better estimates:

1. Assemble the transcriptome as best you can (e.g., using Trinity).
2. Determine which transcripts contain your exons of interest. Most likely there will just be one each.
3. Use salmon or kallisto with your reads and the results from 1.
4. Extract the values associated with the transcripts in 2.
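
For the second step, a rough way to find which assembled transcripts contain a given exon is a plain substring search on both strands. The sketch below is only an illustration under simplifying assumptions: it requires an exact match (real assemblies may differ by a few bases, in which case a BLAST-style search would be more forgiving), and the sequences and names (`tx1`, `exonA`, and so on) are made-up toy values, not real data.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the transcript name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcripts_containing(exons, transcripts):
    """Map each exon name to the transcripts that contain it exactly,
    on either strand (assembled transcripts can be in either orientation)."""
    hits = {exon_name: [] for exon_name in exons}
    for tx_name, tx_seq in transcripts:
        for exon_name, exon_seq in exons.items():
            if exon_seq in tx_seq or reverse_complement(exon_seq) in tx_seq:
                hits[exon_name].append(tx_name)
    return hits


# Toy stand-in for an assembled transcriptome FASTA (hypothetical sequences).
toy_fasta = StringIO(
    ">tx1 len=20\nAAACCCGGGTTTACGTACGT\n"
    ">tx2 len=12\nAAACCCACGTAC\n"
    ">tx3 len=12\nTTTAAACCCGGG\n"
)
exons = {"exonA": "CCCGGGTTT"}  # hypothetical exon of interest

hits = transcripts_containing(exons, read_fasta(toy_fasta))
print(hits)
```

With a real assembly, `toy_fasta` would be replaced by `open("your_assembly.fasta")` (the path is a placeholder), and `exons` by the four exon sequences of interest.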

You might have to add a couple of transcripts together. The nice thing about this, as opposed to the more straightforward "align to the exons and count", is that it will better handle cases where the exons have different GC content or are differentially affected by 3' bias.
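
As a minimal sketch of the last step, the snippet below reads a salmon `quant.sf` table (tab-separated, with `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads` columns) and adds up the TPM of the transcripts that contain an exon. Expressing that sum as a fraction of all isoforms of the gene is an assumption about how one might get at the "46% vs. 12%" style of number in the question, not something salmon reports directly. The transcript names and values are invented toy numbers.

```python
import csv
from io import StringIO


def read_quant_tpm(handle):
    """Return {transcript name: TPM} from an open salmon quant.sf handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}


def summed_tpm(tpm_by_transcript, transcript_ids):
    """Sum TPM over the given transcripts; missing IDs count as 0."""
    return sum(tpm_by_transcript.get(tx, 0.0) for tx in transcript_ids)


# Toy stand-in for one sample's quant.sf (hypothetical values).
toy_quant = StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t2000\t1800.0\t30.0\t540\n"
    "tx2\t1700\t1500.0\t50.0\t750\n"
    "tx3\t2100\t1900.0\t20.0\t380\n"
)

tpm = read_quant_tpm(toy_quant)

# Hypothetical groupings: isoforms carrying the exon, and all isoforms of the gene.
with_exon = ["tx1", "tx3"]
all_isoforms = ["tx1", "tx2", "tx3"]

exon_tpm = summed_tpm(tpm, with_exon)
gene_tpm = summed_tpm(tpm, all_isoforms)
fraction = exon_tpm / gene_tpm if gene_tpm > 0 else float("nan")
print(f"exon-containing isoforms: {exon_tpm} TPM, fraction of gene: {fraction:.2f}")
```

Running this once per sample and averaging within each muscle type would give one number per tissue. kallisto's `abundance.tsv` uses different column names, so the two column lookups in `read_quant_tpm` would need adjusting for it.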