Number of non-ATCG nucleotides replaced by Salmon
19 days ago
Tonya S. ▴ 10

Should I be concerned about the magnitude of the number of non-ATCG nucleotides recorded to STDOUT while Salmon indexes my transcriptome? The line (sans timestamp) is here:

[puff::index::jointLog] [info] Replaced 8,836,877 non-ATCG nucleotides

I will be doing differential expression and differential alternative splicing analyses after exposure of various Brassica species to heat or cold temperature. I have nonredundant Stringtie2-reconstructed transcripts from which I have extracted spliced exons using gffread. I've used the generateDecoyTranscriptome script from SalmonTools and used the gentrome and decoy file for indexing. So far, so good.

I was puzzled by the line quoted above and searched for reports of Salmon issues using the non-numeric keywords from that line, but the number in my output seems considerably higher than in the indexing outputs posted for other issues. The genomic fasta that I used to extract the spliced exonic sequence was soft-masked. Is that the reason for the high number? I did not see any non-[acgtACGT] nucleotides. Should I be concerned? Indexing ran to completion and I do not see any other obviously disturbing values or messages.

Thanks for any feedback!

rna-seq stringtie salmon
19 days ago
Rob 6.7k

If there are no other signs that anything is awry, I probably wouldn't worry about this. Is it possible that these non-canonical nucleotides are coming from the decoy sequence? The softmasked bases are converted to uppercase prior to indexing, and prior to checking if they are ATCG (see the relevant code here), so it isn't the soft-masked bases.
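As an illustration of the point about soft-masking, here is a minimal Python sketch of an "uppercase first, then check" count. It is not Salmon's actual code; the function name `count_non_acgt` and the toy sequences are made up for this example.

```python
def count_non_acgt(seq: str) -> int:
    """Count characters that are not A, C, G or T after upper-casing.

    Hypothetical helper for illustration only; it mimics the order of
    operations described above (uppercase, then test for ACGT).
    """
    return sum(1 for base in seq.upper() if base not in "ACGT")


soft_masked = "acgtACGTacgt"  # lower case marks soft-masked bases
hard_masked = "ACGTNNNNACGT"  # N marks hard-masked or unknown bases

print(count_non_acgt(soft_masked))  # 0
print(count_non_acgt(hard_masked))  # 4
```

Lower-case bases become ordinary A, C, G or T once upper-cased, so only characters such as N contribute to the count.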


Oops, yes, that must be where they are coming from. For some reason, I was thinking the genome was just soft-masked. How embarrassing! Thanks for the quick reply.
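For anyone who wants to confirm where the replaced bases come from, the following is a rough Python sketch that splits the non-ACGT count between transcript and decoy records. It assumes the decoy file lists one decoy sequence name per line and that FASTA record names are the first whitespace-delimited token of each header; the function name `tally_non_acgt` and the toy records are hypothetical.

```python
from collections import Counter


def tally_non_acgt(fasta_lines, decoy_names):
    """Return non-ACGT counts split into 'transcripts' and 'decoys'.

    fasta_lines: any iterable of FASTA lines (e.g. an open file handle).
    decoy_names: set of sequence names to treat as decoys (assumed to
    come from the decoy list used at indexing time).
    """
    totals = Counter(transcripts=0, decoys=0)
    group = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            group = "decoys" if name in decoy_names else "transcripts"
        elif group is not None:
            # Upper-case first so soft-masked bases are not counted.
            totals[group] += sum(1 for b in line.upper() if b not in "ACGT")
    return totals


# Toy input: one soft-masked transcript and one decoy with N runs.
toy_fasta = [">tx1", "ACGTacgt", ">chr1 decoy", "ACGTNNNN", "nnACGT"]
totals = tally_non_acgt(toy_fasta, {"chr1"})
print(totals["transcripts"], totals["decoys"])  # 0 6
```

On real data, the same function could be given an open handle to the gentrome FASTA and a set built from the lines of the decoy list; a large decoy total with a near-zero transcript total would match the explanation above.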

