Library type warning in Salmon
3.5 years ago

Hi Biostars,

I have run Salmon for RNA-seq data analysis and got the following warning message:

Greater than 5% of the fragments disagreed with the provided library type; check the file: Pool9_21565_GTTTCG_quant/lib_format_counts.json for details


The lib_format_counts.json file looks like this:

"read_files": "/home/ghovhannisyan/users/tg/hhovhannisyan/col_cancer/raw_data/Pool3_21559_ACTGAT.fastq.gz",
"expected_format": "SR",
"compatible_fragment_ratio": 0.8350394020362054,
"num_compatible_fragments": 7447997,
"num_assigned_fragments": 8919336,
"num_consistent_mappings": 12428090,
"num_inconsistent_mappings": 2455593,
"MSF": 0,
"OSF": 0,
"ISF": 0,
"MSR": 0,
"OSR": 0,
"ISR": 0,
"SF": 2455593,
"SR": 12428090,
"MU": 0,
"OU": 0,
"IU": 0,
"U": 0


I know for sure that the library preparation protocol was TruSeq Stranded (the SR option in Salmon). How should I interpret this result? Does it point to some level of antisense transcription?

Thanks


What organism? My impression is that badly annotated genomes (which may mean any genome except human and mouse?) contain some, or even many, unannotated overlapping features, and reads from these end up being assigned to the wrong feature. If you are working with human or mouse, then I am even more clueless.

3.5 years ago
Rob

Hi Grant,

Salmon's warnings are very conservative: it prefers to issue a warning letting you know that something _might_ be awry rather than let potential issues go unnoticed. Specifically, with respect to library strandedness, Salmon will issue a warning if >5% of the observed mappings are not in the prescribed orientation. In this case, your data is most consistent with the SR protocol (accounting for ~84% of the mappings), but ~16% of mappings are in another orientation (here, all of them SF). It's normal, actually, that not all reads follow the expected strandedness. However, 16% is a bit high (but not necessarily disastrous). As h.mon suggests, one possibility is unannotated features on the reverse strand of annotated ones.

It's also worth mentioning that, by default (in the current releases), Salmon gives incompatible fragments a very low _a priori_ probability, but will still assign them if there is no other explanation for the read. You could also consider quantifying the samples with --incompatPrior 0, which disallows incompatible mappings, and see what effect that has on downstream analysis.
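As a quick check of the percentages, the ratios can be recomputed from the counts in the lib_format_counts.json posted in the question. This is a minimal Python sketch, not Salmon's own code: it assumes the warning corresponds to more than 5% of fragments being incompatible (mirroring the wording of the warning message), and the JSON string is a trimmed copy of the posted file (the real file has more keys). Both the fragment-level ratio (compatible / assigned fragments) and the mapping-level ratio (consistent / all mappings) come out at about 83.5%.

```python
import json

# Trimmed copy of the lib_format_counts.json shown in the question
# (only the fields used below; the real file contains more keys).
LIB_FORMAT_COUNTS = """
{
  "expected_format": "SR",
  "num_compatible_fragments": 7447997,
  "num_assigned_fragments": 8919336,
  "num_consistent_mappings": 12428090,
  "num_inconsistent_mappings": 2455593
}
"""

# Assumed threshold, mirroring the ">5% ... disagreed" wording of the warning.
WARN_THRESHOLD = 0.05


def strand_summary(counts: dict) -> dict:
    """Recompute fragment- and mapping-level agreement with the library type."""
    frag_ratio = counts["num_compatible_fragments"] / counts["num_assigned_fragments"]
    consistent = counts["num_consistent_mappings"]
    inconsistent = counts["num_inconsistent_mappings"]
    mapping_ratio = consistent / (consistent + inconsistent)
    return {
        "compatible_fragment_ratio": frag_ratio,
        "consistent_mapping_ratio": mapping_ratio,
        "disagreeing_fraction": 1.0 - frag_ratio,
        "warn": (1.0 - frag_ratio) > WARN_THRESHOLD,
    }


summary = strand_summary(json.loads(LIB_FORMAT_COUNTS))
for key, value in summary.items():
    # Round floats for display; the boolean flag is printed as-is.
    print(key, round(value, 3) if isinstance(value, float) else value)
```

With these counts roughly 16.5% of fragments disagree with SR, well above the 5% mark, which is why the warning appears.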


Hi Rob, thanks for the swift reply and explanation. The data I am working with is a bit tricky. We used a specific enrichment technology to enrich lncRNAs from human samples, then sequenced essentially only those enriched lncRNAs to obtain our FASTQ files. I then ran Salmon on the transcriptome produced by rsem-prepare-reference (do you happen to know whether RSEM considers strand information when preparing the reference?) from the top-level assembly of the human genome (with haplotypes and patches) and a GTF file of lncRNAs, so it is basically a "lncRNA transcriptome". It is known that some lncRNAs are located on the opposite strand of protein-coding genes, and if for some reason the enrichment technology also picked up the corresponding protein-coding genes, then your and h.mon's theory fits well with what I see :) One question: is it possible to extract the reads that conflict with the library type? If so, maybe I can assemble them and see where they come from. Thank you
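One possible approach, sketched here as an assumption rather than a documented Salmon workflow: write the mappings in SAM format with Salmon's --writeMappings option, then pick out reads whose mappings land on the forward strand of a transcript, which is the SF orientation for single-end data. The sketch assumes single-end reads, an SR library, and that incompatible mappings are actually present in the SAM output (this may depend on the Salmon version and the --incompatPrior setting). The function name split_by_strand and the toy records are hypothetical.

```python
from collections import defaultdict

# SAM FLAG bits (from the SAM format specification).
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10


def split_by_strand(sam_lines):
    """Group read names by the strand of their mappings (hypothetical helper).

    Assumes single-end reads from an SR library: a read consistent with the
    library type aligns to the reverse strand of the transcript (FLAG_REVERSE
    set). A forward-strand alignment is counted as SF, i.e. disagreeing.
    """
    strands = defaultdict(set)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & FLAG_UNMAPPED:
            continue
        strands[name].add("SR" if flag & FLAG_REVERSE else "SF")
    # Reads whose every mapping is SF, and reads with both SF and SR mappings.
    only_sf = {n for n, s in strands.items() if s == {"SF"}}
    mixed = {n for n, s in strands.items() if s == {"SF", "SR"}}
    return only_sf, mixed


# Toy SAM records (hypothetical) to show the behaviour.
example = """@HD\tVN:1.0
read1\t16\ttx1\t100\t255\t50M\t*\t0\t0\t*\t*
read2\t0\ttx1\t200\t255\t50M\t*\t0\t0\t*\t*
read3\t0\ttx2\t10\t255\t50M\t*\t0\t0\t*\t*
read3\t16\ttx3\t10\t255\t50M\t*\t0\t0\t*\t*
""".splitlines()

only_sf, mixed = split_by_strand(example)
print(sorted(only_sf), sorted(mixed))
```

Reads in only_sf are the clearest candidates for pulling out of the FASTQ and assembling; reads in mixed also have an SR-consistent mapping, so their SF hits may just reflect multi-mapping ambiguity.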


Sorry, one more question. I got this warning: [warning] Only 4289407 fragments were mapped, but the number of burn-in fragments was set to 5000000. The mapping rate is 64%. Should I just ignore this warning?
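The numbers in that warning are consistent with each other: at a 64% mapping rate the input held roughly 6.7 million fragments, so fewer than the 5,000,000 burn-in fragments could ever be mapped. The short Python calculation below only reproduces that arithmetic from the figures in the message (the mapping rate is rounded, so the total is approximate); it does not by itself say whether the warning can be ignored.

```python
# Figures quoted in the warning message above.
mapped_fragments = 4_289_407
mapping_rate = 0.64          # reported as "64 %", so this value is rounded
burn_in_fragments = 5_000_000

# Approximate number of input fragments implied by the mapping rate.
approx_total = mapped_fragments / mapping_rate
# How many mapped fragments short of the burn-in count the sample fell.
shortfall = burn_in_fragments - mapped_fragments

print(f"approx. input fragments: {approx_total:,.0f}")
print(f"mapped fragments short of burn-in: {shortfall:,}")
```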