Question: (Closed) Questions about Salmon
kyusikkim wrote (9 days ago):

Hello all! I have some questions about alignment-based vs. alignment-free mapping with Salmon, as well as some general questions specific to our experiments. Please excuse any perceived ignorance, as I'm more of a molecular than a computational biologist and statistics is not my strong suit!

  1. If one will probably map the reads anyway to visualize them downstream in tools like IGV, is it better to use alignment-based mapping so that Salmon agrees with the alignment output?
  2. Our experiments involve ribosome profiling data from conditions in which translation is disrupted, which results in low read counts. We also have the corresponding RNA-seq data. Should one use --gcBias and --seqBias for one or both data sets? My idea was to use neither option, to avoid introducing bias into our already limited ribosome profiling data set.
  3. Could someone provide a better explanation of --fldMean and --fldSD? We are using Illumina HiSeq 50 bp single-end reads. Would changing the default settings be useful? I had a hard time understanding the documentation and the comments in the source code.
  4. While I believe I generally understand the k-mer parameter in alignment-free mapping, I don't quite understand how it works for reads shorter than the k-mer length. I ask because our ribosomal fragments range from 12 to 36 nt, but I get the maximum mapping percentage at k=17. Are the shorter fragments simply excluded? (I forgot to check whether the number of unmapped reads equals the number of reads shorter than 17 nt; I will check and edit.)

Thank you for your understanding and help!


Hello kyusikkim!

We believe that this post does not fit the main topic of this site.

Reason: Solved

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

Reply written 7 days ago by kyusikkim.
Rob (United States) wrote (9 days ago):

Hi,

Some thoughts related to your questions:

If one will probably map the reads anyway to visualize them downstream in tools like IGV, is it better to use alignment-based mapping so that Salmon agrees with the alignment output?

Alignment-based Salmon works with mappings of the reads against the transcriptome, not the genome, and there aren't many tools for visualizing alignments directly against the transcriptome. Moreover, a visualization of the multi-mapping alignments would show how they pile up before Salmon resolves the multi-mapping. Generally, there aren't really good tools for this type of visualization (though we are working on some tools / features along these lines). That said, if you map with STAR, you can have STAR write out the alignments in both genomic and transcriptomic coordinates (with --quantMode TranscriptomeSAM): the former file can be visualized with e.g. IGV, while the latter can be provided to Salmon as input. If you're planning to align with STAR anyway, it's reasonable to provide the transcriptomic alignments to Salmon as input.
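A minimal sketch of that two-output workflow, written as Python that assembles (and optionally runs) the two commands. It assumes STAR's genome index was built with a GTF annotation (needed for --quantMode TranscriptomeSAM), that transcripts.fa was generated from that same annotation, and that both tools are on your PATH. All file and directory names are hypothetical placeholders, and the -l A (automatic library type) option may not be available in older Salmon releases.

```python
import shlex
import subprocess
from typing import List


def star_cmd(genome_dir: str, fastq: str, prefix: str, threads: int = 8) -> List[str]:
    """STAR run that writes a coordinate-sorted genomic BAM (for IGV) and a
    BAM in transcriptomic coordinates (for Salmon) from the same alignment."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outSAMtype", "BAM", "SortedByCoordinate",  # -> <prefix>Aligned.sortedByCoord.out.bam
        "--quantMode", "TranscriptomeSAM",            # -> <prefix>Aligned.toTranscriptome.out.bam
        "--outFileNamePrefix", prefix,
    ]


def salmon_cmd(transcripts_fa: str, prefix: str, out_dir: str) -> List[str]:
    """Salmon in alignment-based mode, reading STAR's transcriptome BAM."""
    return [
        "salmon", "quant",
        "-t", transcripts_fa,  # transcripts from the same annotation STAR used
        "-l", "A",             # automatic library type detection
        "-a", prefix + "Aligned.toTranscriptome.out.bam",
        "-o", out_dir,
    ]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths; replace with your own index, reads and transcriptome.
    run(star_cmd("star_index", "sample.fastq", "sample_"))
    run(salmon_cmd("transcripts.fa", "sample_", "salmon_sample"))
```

Set dry_run=False once the printed commands look right. Before loading the genomic BAM into IGV, index it (e.g. with samtools index).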

Our experiments involve ribosome profiling data from conditions in which translation is disrupted, which results in low read counts. We also have the corresponding RNA-seq data. Should one use --gcBias and --seqBias for one or both data sets? My idea was to use neither option, to avoid introducing bias into our already limited ribosome profiling data set.

I'm not quite sure I understand the use case here. Salmon is designed for processing RNA-seq data, not ribosome profiling data. While the multi-mapping problem is present in both types of data, ribosome profiling involves fundamentally different modeling assumptions, because read generation is affected both by the underlying transcript abundance and by the dynamics / speed of the translation process. As far as I am aware, there isn't a method or tool currently available that fully models ribosome profiling at the transcript level.

Could someone provide a better explanation of --fldMean and --fldSD? We are using Illumina HiSeq 50 bp single-end reads. Would changing the default settings be useful? I had a hard time understanding the documentation and the comments in the source code.

With paired-end data, the empirical distribution of fragment lengths can be inferred directly from the data. This is not possible with single-end data, so you need to provide an estimate of what the fragment length distribution in your experiment looked like; this can be inferred from, e.g., the Bioanalyzer traces that accompanied the sequencing experiment. For single-end data, the --fldMean and --fldSD parameters tell Salmon the mean and standard deviation of the assumed fragment length distribution. This distribution, in turn, affects how Salmon computes the effective lengths of the transcripts, which can in turn affect the estimated abundances.
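To make the link between --fldMean / --fldSD and effective length more concrete, here is a simplified illustration, not Salmon's code. It assumes a discretized normal fragment length distribution truncated at the transcript length, and defines effective length as the expected number of positions at which a fragment can start. Salmon's own computation may differ in details, and the 250 / 25 values are example inputs, not a recommendation.

```python
import math
from typing import List


def discretized_normal_fld(fld_mean: float, fld_sd: float, max_len: int) -> List[float]:
    """P(l) for fragment lengths l = 1..max_len under a normal distribution,
    truncated at max_len and renormalized to sum to 1."""
    if fld_sd <= 0:
        raise ValueError("fld_sd must be positive")
    weights = [math.exp(-0.5 * ((l - fld_mean) / fld_sd) ** 2)
               for l in range(1, max_len + 1)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no fragment length <= max_len has non-negligible probability")
    return [w / total for w in weights]


def effective_length(tx_len: int, fld_mean: float, fld_sd: float) -> float:
    """Sum over l of P(l) * (tx_len - l + 1), i.e. tx_len + 1 minus the
    mean of the truncated fragment length distribution."""
    probs = discretized_normal_fld(fld_mean, fld_sd, tx_len)
    return sum(p * (tx_len - l + 1) for l, p in enumerate(probs, start=1))


if __name__ == "__main__":
    for tx_len in (300, 1000, 5000):
        print(tx_len, round(effective_length(tx_len, 250, 25), 1))
```

For a long transcript the result is close to tx_len - fld_mean + 1 (roughly 4751 for a 5000 nt transcript here), so the assumed distribution changes the effective length, and hence the abundance estimate, proportionally more for short transcripts.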

While I believe I generally understand the k-mer parameter in alignment-free mapping, I don't quite understand how it works for reads shorter than the k-mer length. I ask because our ribosomal fragments range from 12 to 36 nt, but I get the maximum mapping percentage at k=17. Are the shorter fragments simply excluded? (I forgot to check whether the number of unmapped reads equals the number of reads shorter than 17 nt; I will check and edit.)

Reads of length < k will not be mapped. Salmon actually prints to its log (by default both to stderr and to the log file) the fraction of observed reads that were shorter than k; mapping for these reads is not even attempted. However, please see the caveats above regarding the processing of ribosome profiling data.
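To cross-check that fraction independently of Salmon (for example, against the number of unmapped reads), a minimal Python sketch that reads sequence lengths from a FASTQ file and computes the fraction shorter than k. It assumes standard four-line FASTQ records; make_record, read_lengths and fraction_shorter_than_k are hypothetical helpers written for this example, not part of Salmon.

```python
import io
from typing import Iterable, Iterator, TextIO


def read_lengths(fastq: TextIO) -> Iterator[int]:
    """Yield read lengths from a FASTQ stream with 4 lines per record
    (header, sequence, '+', qualities)."""
    for i, line in enumerate(fastq):
        if i % 4 == 1:  # the sequence line of each record
            yield len(line.rstrip("\r\n"))


def fraction_shorter_than_k(lengths: Iterable[int], k: int) -> float:
    """Fraction of reads too short to contain even one k-mer."""
    total = short = 0
    for n in lengths:
        total += 1
        if n < k:
            short += 1
    return short / total if total else 0.0


def make_record(name: str, seq: str) -> str:
    """Build one FASTQ record with dummy qualities (for the demo only)."""
    return f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n"


if __name__ == "__main__":
    demo = io.StringIO(make_record("r1", "ACGT" * 3)
                       + make_record("r2", "ACGT" * 4 + "A")
                       + make_record("r3", "ACGT" * 9))
    lengths = list(read_lengths(demo))
    print(lengths)  # [12, 17, 36]
    print(fraction_shorter_than_k(lengths, 17))
```

For a gzipped file, open it in text mode with gzip.open(path, "rt") and pass the handle to read_lengths.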


Thank you for the quick response! Reading it definitely helped me reframe some of the information floating around in my head. After some additional searching and thinking based on the information you provided, I believe I have answered all of my questions. Again, thank you so much.

I'm following the Ribomap protocol for mapping; I will use Salmon's quasi-mapping for our accompanying RNA-seq data.

Wang, H., McManus, J., & Kingsford, C. (2016). Isoform-level ribosome occupancy estimation guided by transcript abundance with Ribomap. Bioinformatics, 32(12), 1880–1882. http://doi.org/10.1093/bioinformatics/btw085

Reply written 9 days ago by kyusikkim.
The thread is closed. No new answers may be added.

Powered by Biostar version 2.3.0