Question: Mapping RNA-Seq data on genome from another species
1
gravatar for tlorin
2.8 years ago by
tlorin260
Switzerland
tlorin260 wrote:

Hello all,

I have RNA-Seq data from a species A for which I don't have a reference genome. I would like to map this data on the closest available genome from species B.

I have read this post and I am planning to use hisat2 or STAR. Does any of you have recommandations regarding the parameters to set for mapping, given that I don't have a reference genome? I would expect that I should use this program with less "stringency" than if I could map on the species A genome. For instance, the default parameter in STAR is --outFilterMismatchNmax 10: should I set it to 20? 30? In hisat2, it is --score-min L,0,-0.2.

My reads are 100nt.

In addition, could you recommend any lecture related to this question?

Many thanks for your answers!

mapping rna-seq star hisat2 genome • 2.2k views
ADD COMMENTlink modified 2.8 years ago by Brian Bushnell17k • written 2.8 years ago by tlorin260
2

Here you have a very good review about how to do NGS analysis with non-model organisms. It has a part on RNA-seq and how to deal with situations were you do not have a reference genomeNext-generation biology: Sequencing and data analysis approaches for non-model organisms

ADD REPLYlink written 2.8 years ago by IP690
1
gravatar for jnoble333
2.8 years ago by
jnoble33320
jnoble33320 wrote:

Hi tlorin,

I use STAR to do this frequently with the parameter --outFilterMismatchNmax 8. You can check the alignment percentage in the Log.final.out file and adjust your mismatch threshold accordingly. You may face the issue of reads aligning to a lot of different locations and can filer your bam file by the mapq score to address this. Be aware that unaligned reads may be placed in the "% of reads unmapped: too short" category even if they are unaligned because of mismatches. Dobin said something about this in one of the threads for the STAR google group.

ADD COMMENTlink written 2.8 years ago by jnoble33320

Thanks for your answer! But why --outFilterMismatchNmax 8and not --outFilterMismatchNmax 15 or over threshold? Is this some arbitrary threshold that you found was best? I guess this depends on the evolutionary distance between both species too. Is this the post you refer to?

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by tlorin260

That is the post I went by. That threshold works for my dataset for the most part because I have a lot of reads. I'd say try different values and see how your results differ.

ADD REPLYlink written 2.8 years ago by jnoble33320
1
gravatar for Brian Bushnell
2.8 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

BBMap will align RNA-seq data to genomes with a substantially lower identity than most other aligners, particularly other splice-aware aligners. You might add the flags "maxindel=200k minid=0.7" to increase alignment rate in this case.

ADD COMMENTlink written 2.8 years ago by Brian Bushnell17k

Thanks Brian! If I add maxindel=200k minid=0.7 do I risk mapping to spurious locations or will the mapping report the best unique hit (so, the homologous sequence in my case)?

ADD REPLYlink written 2.8 years ago by tlorin260
1

With looser requirements, you can have a greater risk of mapping to spurious locations, but in that case they will have a lower mapq score. BBMap will always map reads to the best-matching location. So, "minid=0.9" and "minid=0.8" and "minid=0.7" will all report the same alignment for a given read, if the best alignment has, say, 95% identity. But if the best alignment only has 85% identity, then "minid=0.9" won't report anything.

ADD REPLYlink written 2.8 years ago by Brian Bushnell17k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1454 users visited in the last hour