Question: Optimization of RNA-Seq data mapping with tophat2
3.4 years ago, Denis wrote:

Hi there,

I'm trying to map Illumina paired-end RNA-Seq data from one fish genus to the genome of a related genus in the same family using TopHat2. I started with near-default TopHat2 settings (apart from the -r flag). I then added --mate-std-dev 4000 --read-edit-dist 20 to my command line, but the mapping statistics are still poor.

Left reads:
Input : 14811423
Mapped : 7078897 (47.8% of input)
of these: 4273616 (60.4%) have multiple alignments (1521255 have >20)
Right reads:
Input : 14811423
Mapped : 6753768 (45.6% of input)
of these: 4051398 (60.0%) have multiple alignments (1521195 have >20)
46.7% overall read mapping rate.

Aligned pairs: 5026382
of these: 3371900 (67.1%) have multiple alignments
1614729 (32.1%) are discordant alignments
23.0% concordant pair alignment rate.

My questions are: which TopHat2 settings should I try in my case? And which other RNA-Seq read mappers would be worth testing besides TopHat2?
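The summary percentages in the report can be reproduced from the raw counts, which makes it easier to compare runs with different settings. A small sketch in Python, assuming (as the numbers suggest) that the concordant rate is the non-discordant aligned pairs divided by the input pairs; the variable names are illustrative.

```python
# Counts copied from the alignment summary in the question.
input_pairs = 14811423
left_mapped, right_mapped = 7078897, 6753768
left_multi, right_multi = 4273616, 4051398
aligned_pairs, discordant_pairs = 5026382, 1614729

# Overall read mapping rate: mapped reads over all input reads (both mates).
overall_rate = (left_mapped + right_mapped) / (2 * input_pairs)

# Concordant pair rate: aligned pairs that are not discordant, over input pairs.
concordant_rate = (aligned_pairs - discordant_pairs) / input_pairs

# Uniquely mapped reads (mapped minus multi-mapped) over all input reads.
unique_rate = (
    (left_mapped - left_multi) + (right_mapped - right_multi)
) / (2 * input_pairs)

print(f"overall read mapping rate: {overall_rate:.1%}")
print(f"concordant pair rate:      {concordant_rate:.1%}")
print(f"uniquely mapped reads:     {unique_rate:.1%}")
```

Fewer than one read in five maps uniquely here, so the multi-mapping fraction deserves as much attention as the overall rate.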


Use the STAR aligner.
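A minimal sketch of what a paired-end STAR run could look like, written as Python that only assembles the two command lines. The file names, thread count and the raised mismatch cap are placeholders and assumptions, not settings tested on this data; the reads are assumed to be uncompressed FASTQ.

```python
import shlex

def star_commands(genome_fasta, genome_dir, reads_1, reads_2, prefix, threads=8):
    """Return (index build, alignment) STAR command lines as argv lists."""
    build_index = [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", genome_fasta,
    ]
    align = [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", reads_1, reads_2,
        "--outFileNamePrefix", prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # Keep unmapped reads so they can be inspected later (e.g. with BLAST).
        "--outReadsUnmapped", "Fastx",
        # Assumption: a higher mismatch cap for a cross-genus reference.
        # The value 20 is illustrative, not a tested recommendation.
        "--outFilterMismatchNmax", "20",
    ]
    return build_index, align

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in star_commands("genome.fa", "star_index", "R1.fastq", "R2.fastq", "sample_"):
        print(shlex.join(cmd))
```

Building the commands as lists makes it easy to pass them to subprocess.run once the paths are real.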

written 3.4 years ago by geek_y

You may have rRNA in this data if you did not use a ribo-depletion method or it did not work well. You can take a sample of reads that don't align and blast them to see what you find. If you do have rRNA contamination then using a different aligner will not help.
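The sampling step could look like the sketch below: it draws a fixed-size random sample from a FASTQ of unaligned reads and formats it as FASTA for a BLAST search. It assumes the unaligned reads are already in four-line FASTQ form (TopHat2 writes them to unmapped.bam, which would first need converting, for example with samtools fastq); the function names and sample size are illustrative.

```python
import random

def read_fastq(lines):
    """Yield (name, sequence) from an iterable of FASTQ lines (4 per record)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = next(it, "").strip()
        next(it, "")  # '+' separator line
        next(it, "")  # quality line
        yield header.strip()[1:], seq

def sample_reads(records, k, seed=0):
    """Reservoir-sample k records in one pass, without loading the whole file."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir

def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)

# Usage with a hypothetical file name:
# with open("unmapped.fastq") as fh:
#     print(to_fasta(sample_reads(read_fastq(fh), k=100)))
```

A hundred or so reads is usually enough to see whether the hits are dominated by rRNA, another organism, or nothing recognisable.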

modified 3.4 years ago • written 3.4 years ago by GenoMax

Thanks for your reply. I've consulted our wet-lab specialists: we extracted and sequenced only the poly-A RNA fraction. It's still a good idea, though! I'll take a look at the unaligned reads. It would also be interesting to check the reads that have multiple alignments to the reference genome.

written 3.4 years ago by Denis

The last time I had scores like this, I trimmed the reads according to the FastQC results, and it improved the numbers a lot.

written 3.4 years ago by firatuyulur

Do you mean quality and adapter trimming of the data before the mapping step?

written 3.4 years ago by Denis

If not quality trimming, then at least adapter trimming should definitely be done before mapping.
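Before trimming, it can help to estimate how many reads actually carry adapter sequence. A rough sketch, assuming the library used Illumina TruSeq-style adapters, whose read-through starts with AGATCGGAAGAGC; with a different kit the sequence would need changing. The helper name is made up for illustration.

```python
ADAPTER_PREFIX = "AGATCGGAAGAGC"  # assumption: TruSeq-style adapter start

def adapter_fraction(lines, adapter=ADAPTER_PREFIX):
    """Return the fraction of FASTQ reads that contain the adapter exactly."""
    total = hits = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line of each 4-line record
            total += 1
            if adapter in line:
                hits += 1
    return hits / total if total else 0.0

# Usage with a hypothetical file name:
# with open("R1.fastq") as fh:
#     print(f"{adapter_fraction(fh):.1%} of reads contain the adapter")
```

Exact matching misses reads with sequencing errors in the adapter or with only a few adapter bases at the 3' end, so the result is best read as a lower bound. A noticeable fraction would support running a trimmer such as cutadapt or Trimmomatic before mapping again.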

written 3.4 years ago by GenoMax
3.4 years ago, Satyajeet Khare (Pune, India) wrote:

You can use HISAT2 or STAR, as recommended by @geek_y, but since you are aligning RNA-Seq reads from one fish genus to the genome of another, the low alignment rate might be real. Other issues that can lower the alignment rate are incorrect demultiplexing and contamination during sample preparation.
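For the HISAT2 route, a comparable sketch that only assembles the command lines in Python. File names and the thread count are placeholders, and the looser --score-min function is an assumption meant to tolerate cross-genus divergence, not a tested recommendation.

```python
def hisat2_commands(genome_fasta, index_base, reads_1, reads_2, out_sam,
                    unaligned_path, threads=8):
    """Return (index build, alignment) HISAT2 command lines as argv lists."""
    build_index = ["hisat2-build", "-p", str(threads), genome_fasta, index_base]
    align = [
        "hisat2", "-p", str(threads),
        "-x", index_base,
        "-1", reads_1, "-2", reads_2,
        "-S", out_sam,
        # Write pairs that fail to align concordantly, for later inspection.
        "--un-conc", unaligned_path,
        # Assumption: a more permissive minimum-score function (illustrative value).
        "--score-min", "L,0,-0.6",
    ]
    return build_index, align

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in hisat2_commands("genome.fa", "genome_index", "R1.fastq",
                               "R2.fastq", "sample.sam", "unaligned.fastq"):
        print(" ".join(cmd))
```

If relaxing the scoring raises the mapping rate mainly by adding multi-mapped reads, that would suggest the low rate reflects real divergence rather than overly strict settings.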


Thanks, that definitely makes sense. How can I check whether my data were demultiplexed correctly? Contamination I could probably check with a BLAST search of the unmapped reads.

written 3.4 years ago by Denis

In my experience, rRNA contamination does not affect the alignment percentage, though it does affect differential expression analysis. I generally check for rRNA contamination by looking at the FastQC report or at the reads aligned to rRNA genes. The duplication graph in the FastQC report will climb, with duplication levels above 90%. Demultiplexing issues will lead to cross-contamination. You can BLAST reads manually; that might help.
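One way to look for demultiplexing problems is to tally the index sequences recorded in the read headers. The sketch below assumes Casava 1.8+ style headers, where the index is the last colon-separated field after the space (for example "1:N:0:ATCACG"); older or stripped headers will not carry it, and whether the field holds the observed or the expected index depends on the demultiplexing software. The function name is illustrative.

```python
from collections import Counter

def index_counts(lines):
    """Count index (barcode) sequences from the header of each FASTQ record."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each 4-line record
            parts = line.strip().split(" ")
            if len(parts) > 1:  # header has a comment field with the index
                counts[parts[-1].split(":")[-1]] += 1
    return counts

# Usage with a hypothetical file name:
# with open("R1.fastq") as fh:
#     for index, n in index_counts(fh).most_common(5):
#         print(index, n)
```

In a cleanly demultiplexed sample a single index, plus perhaps a few near-identical variants, should dominate; a second abundant index would point to cross-contamination between samples.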

modified 3.4 years ago • written 3.4 years ago by Satyajeet Khare

"samples with rRNA contamination does not affect alignment percentage"

Perhaps you include the rDNA repeat in your reference? Not many people do.

written 3.4 years ago by GenoMax

Okay, that is a good piece of information. I was not aware that most others typically remove rDNA regions.

modified 3.4 years ago • written 3.4 years ago by Satyajeet Khare

Would you recommend repeat masking (including rDNA repeats) before read mapping in RNA-Seq experiments? I thought that was relevant only for mapping DNA-Seq reads.

written 3.4 years ago by Denis