Can you choose a minimum transcript coverage in salmon?
4.7 years ago
yryan ▴ 10

Hi folks,

I'm trying to use salmon to count viral transcripts in some clinical samples I have. However, when I quantify these viruses, salmon maps single poly-A or poly-T regions of transcripts to similarly sized poly-A and poly-T regions in the viral genomes, and only to these, and registers them as counts. Is there any way to increase the size of the mapping required before it is considered a true count?

I'd like to be able to use a cut-off, say that 50% of a viral genome should be present and mapped from the transcripts before it is a count, rather than a very small polynucleotide region that is likely an artifact rather than a true count of that virus.

Thanks!

salmon RNA-Seq rna-seq • 1.4k views

Can you give a more "solid" example of such a "bad" mapping? It is, at least for me, difficult to understand what you mean. Also, please share the command line.


So if you have a look at this image here

https://ibb.co/vqNJFHQ

Salmon is wrongly counting this transcript (viral genome) due to the presence of just these poly-A reads. There are no other reads that align or map to any other section of the transcript, and since this is just a poly-A section I think I'd be justified in saying it is counted in error. Ideally I'd want a way to set a minimum coverage, so that say 20-50% of a transcript needs to have some coverage before it's accepted as a count.

My script is:

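# Quantify each paired-end sample against the viral index
for fn in SRR{1..50}
do
    echo "Processing sample ${fn}"
    # --writeMappings saves the selected mappings in SAM format for inspection
    salmon quant -i ../indexes/index -l A \
        -1 "split/${fn}/${fn}_1.fastq.gz" \
        -2 "split/${fn}/${fn}_2.fastq.gz" \
        --validateMappings \
        --writeMappings="./sams/${fn}_viral" \
        -o "quants/${fn}_quant"
done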

Edit: couldn't get the hyperlink working.
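
One possible way to approximate the coverage cut-off described above, as a post-processing step outside salmon: scan the SAM file written by `--writeMappings` and estimate what fraction of each reference has at least one aligned base. This is only a sketch. `covered_fraction` is a hypothetical helper, not part of salmon, and it assumes the SAM file carries `@SQ` header lines with reference lengths and that the viral references are small enough to keep the covered positions in memory.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def covered_fraction(sam_lines):
    """Return {reference name: fraction of its bases with >= 1 aligned base}."""
    lengths = {}
    covered = defaultdict(set)
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Reference lengths come from the @SQ header lines (assumed present).
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        flag, ref, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or ref == "*" or cigar == "*":
            continue  # skip unmapped records
        cursor = pos  # 1-based leftmost reference position of the alignment
        for n, op in CIGAR_RE.findall(cigar):
            n = int(n)
            if op in "M=X":  # only aligned bases count as coverage
                covered[ref].update(range(cursor, cursor + n))
            if op in REF_CONSUMING:
                cursor += n
    return {ref: len(covered[ref]) / length
            for ref, length in lengths.items() if length}
```

Called on an open handle to a file such as `sams/SRR1_viral`, this gives a per-virus breadth of coverage; references below a chosen threshold (say 0.5) could then be dropped from `quant.sf` before downstream analysis. Note that this filters after quantification and does not change how salmon assigns reads.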


Edited the link. You have to paste the full link, including the suffix, into the field that pops up when clicking the image button.

Ok I see what you mean. Did you check how the mate reads align in this case? It is paired-end sequencing so the mate would need to align somewhere near that problematic region, and it would need to be a valid alignment to be even considered by salmon from what I understand. I will tag the developer Rob.


Ahh okay, thanks!

So I changed the view type to read type in Tablet, and all of them are classed as "Mate unmapped". However, the vast majority of these are in the same direction (arrow going to the right, I'm assuming this is read 1?); I'm not sure if that makes any difference here.

https://ibb.co/ZdP1tcK
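
To check the mate status and read direction without relying on the viewer, the SAM flags in the `--writeMappings` output can be tallied directly. A minimal sketch, assuming standard SAM flag bits (0x1 paired, 0x4 unmapped, 0x8 mate unmapped, 0x10 reverse strand, 0x40 first in pair, 0x80 second in pair); `mate_status_counts` is a hypothetical helper name.

```python
from collections import Counter


def mate_status_counts(sam_lines):
    """Tally mate status, strand and read number for mapped SAM records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & 0x4:
            continue  # the read itself is unmapped
        if not flag & 0x1:
            counts["unpaired"] += 1
        elif flag & 0x8:
            counts["mate_unmapped"] += 1
        else:
            counts["mate_mapped"] += 1
        counts["reverse_strand" if flag & 0x10 else "forward_strand"] += 1
        if flag & 0x40:
            counts["read1"] += 1
        elif flag & 0x80:
            counts["read2"] += 1
    return counts
```

If nearly every record on the poly-A region lands in `mate_unmapped`, that supports the observation above that these mappings are orphaned reads rather than properly paired fragments.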


I don't know of a way to do what you ask in salmon (that doesn't mean it doesn't exist, of course). But a different approach might be to mask homopolymers of a certain length in the viral genome before aligning to it.
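
As a rough illustration of the masking idea, homopolymer runs above a chosen length can be replaced with `N` in each reference sequence before building the index. This is a sketch only: `mask_homopolymers` and the default `min_run=15` are assumptions for illustration, and it is worth checking how the indexer treats `N` bases before relying on it.

```python
import re


def mask_homopolymers(seq, min_run=15):
    """Replace every run of >= min_run identical A/C/G/T bases with N's."""
    # ([ACGTacgt]) captures one base; \1{k,} requires at least k more copies.
    pattern = re.compile(r"([ACGTacgt])\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq)
```

The sequence length is preserved, so coordinates in the masked genome still match the original.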


Wouldn't it be "safer" to trim trailing polyA sequences of a certain length from the reads directly? That way one would probably still get alignments where the polyA is flanked by non-polyA sequence and the true origin of the read is not a polyA tail (given that the read length is sufficient, which it seems to be here).
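
The trimming idea could look roughly like this for a single read. `trim_poly_tail` is a hypothetical helper and `min_run=10` is an arbitrary choice; dedicated read trimmers offer similar options and also handle keeping paired-end mates in sync, which this sketch does not.

```python
import re


def trim_poly_tail(seq, qual, base="A", min_run=10):
    """Remove a trailing run of >= min_run copies of `base` from a read.

    Returns the trimmed (sequence, quality) pair; reads without such a
    tail, including reads with an internal run only, are returned unchanged.
    """
    match = re.search(r"%s{%d,}$" % (re.escape(base), min_run), seq)
    if match is None:
        return seq, qual
    return seq[:match.start()], qual[:match.start()]
```

For poly-T stretches at the end of a read the same function can be called with `base="T"`. A read consisting only of the homopolymer is trimmed to an empty string and would need to be discarded together with its mate.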


Looking at the reads above, it doesn't look to me like these reads definitely come from polyA tails. Specifically, there are reads in the pile-up above where non-A bases flank the homopolymer run on both sides, and where none of the non-A bases match, but the read is still aligned.

4.7 years ago

I don't think any transcript quantification tool (RSEM/Kallisto/Salmon/Cufflinks/StringTie) reports the fraction of a transcript that is covered. The closest you get is StringTie, which reports the average coverage, but that can still be driven by events like the one you see. To do this you would need to do a standard genome mapping using e.g. STAR and then calculate the gene coverage afterwards with tools such as RNA-SeQC or RSeQC.
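
Once a per-reference coverage table exists, applying the cut-off is simple. The sketch below assumes a tab-separated table in the layout printed by `samtools coverage` (a tool not mentioned in this thread, used here only as an example), with a `#rname` column for the reference name and a `coverage` column giving the percentage of covered bases. `passing_references` is a hypothetical helper.

```python
import csv


def passing_references(coverage_tsv_lines, min_percent=50.0):
    """Return reference names whose covered-base percentage is >= min_percent."""
    reader = csv.DictReader(coverage_tsv_lines, delimiter="\t")
    return [row["#rname"] for row in reader
            if float(row["coverage"]) >= min_percent]
```

Viruses that do not appear in the returned list could then be treated as absent, regardless of the read count reported by the quantifier.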
