Question: Can you choose minimum transcript coverage in salmon
0
gravatar for yryan
5 weeks ago by
yryan0
yryan0 wrote:

Hi folks,

I'm trying to use salmon to count viral transcripts in some clinical samples I have. However when I use salmon to quantify these viruses it's mapping single poly A or poly T regions of transcripts to similar size poly A and T regions in the viral genomes but only these, and registering these as a count. Is there any way to increase the size of the mapping required before it is considered a true count?

I'd like to be able to use a cut-off that say 50% of a viral genome should be present and mapped from the transcripts before it is a count, rather than a very small poly nucleotide region that is likely an artifiact rather than a true count of that virus.

Thanks!

rna-seq salmon • 195 views
ADD COMMENTlink modified 4 weeks ago by kristoffer.vittingseerup2.3k • written 5 weeks ago by yryan0

Can you give a more "solid" example on such a "bad" mapping? It is at least for me difficult to understand what you mean. Also please share the command line.

ADD REPLYlink written 5 weeks ago by ATpoint23k

So if you have a look at this image here

https://ibb.co/vqNJFHQ

Salmon wrongly is counting this transcript (viral genome) due to the presence of just these poly A reads. There is no other reads which align or map to any other section of the transcript but since this is just a poly A section I think I'd be justified in saying it is counted in error. I'd ideally want a way to set a minimum coverage so that I'd need say 20-50% minimum of a transcript to have some coverage before its accepted as a read.

My script is:

for fn in SRR{1..50};
do
echo "Processing sample ${fn}"
salmon quant -i ../indexes/index -l A \
-1 split/${fn}/${fn}_1.fastq.gz \
-2 split/${fn}/${fn}_2.fastq.gz \
--validateMappings \
--writeMappings=./sams/${fn}_viral  \
 -o quants/${fn}_quant
done

edit (couldnt get hyperlink working)

ADD REPLYlink modified 5 weeks ago by ATpoint23k • written 5 weeks ago by yryan0

Edited the link. You have to paste the full link including the suffix into field popping up when clicking the image button.

Ok I see what you mean. Did you check how the mate reads align in this case? It is paired-end sequencing so the mate would need to align somewhere near that problematic region, and it would need to be a valid alignment to be even considered by salmon from what I understand. I will tag the developer Rob.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by ATpoint23k

Ahh okay, thanks!

So changing the view type to read type in tablet and all are classed as "Mate unmapped". However the vast majority of these are all the same direction (arrow going to the right, I'm assuming this is read 1?), I'm not sure if that makes any difference to this.

![https://ibb.co/ZdP1tcK][1]

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by yryan0

I don't know of a way to do what you ask in salmon (doesn't mean it doesn't exist of course). But a different approach might be to mask homopolymers of a certain length from the viral genome because aligning to it.

ADD REPLYlink written 5 weeks ago by i.sudbery5.3k

Wouldn't it be "safer" to trim trailing polyA sequences of a certain length from the reads directly? That way one would probably still get alignments where the polyA is flanked by non polyA reads and the true origin of the read is not a polyA-tail (given read length is sufficient which is seems to be here).

ADD REPLYlink written 5 weeks ago by ATpoint23k

Looking at the reads above, it doesn't look to me like these reads definately come from polyA tails. Specially, there are reads in that pile up above where there are non-A bases flanking the homopolymer run on both size, where non of the non-A reads match, but the read is still aligned.

ADD REPLYlink written 5 weeks ago by i.sudbery5.3k
0
gravatar for kristoffer.vittingseerup
4 weeks ago by
European Union
kristoffer.vittingseerup2.3k wrote:

I don't think any transcript quantification tool (RSEM/Kallisto/Salmon/Cufflinks/StringTie) repports the fraction of transcripts with coverage. The closes you get is StringTie which repports the average coverage - but that can still be driven by such events as you see. To do this you would need to do a standard genome mapping using e.g. STAR and then calculate the gene coverage afterwards with tools such as RNA-SeQC or Rseqc

ADD COMMENTlink written 4 weeks ago by kristoffer.vittingseerup2.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2063 users visited in the last hour