Question

Improving alignment rates for RNA-seq

0

8.0 years ago

pm2012 ▴ 140

Hi All, I am analyzing an RNA-seq data set, and the alignment results I have been getting are baffling me. I have tried an exhaustive list of conditions/parameters, but none of them improves my alignment rate significantly. Here are some details for my sample data:

- Data were generated from total RNA from tumor samples using the Nugen Ovation Single Cell RNA-seq kit. We received ~80 million x 2 100 bp paired-end reads.
- I get roughly a 40-50% alignment rate with Tophat2 across different parameter settings.

FastQC suggests high duplication rates. The quality seems ok (no red flags except dropping of quality to ~20 at 3' end of reads). I have used Tophat2 for all my alignments using default settings. I have tried the following conditions.

- Trimming 8 bp from the forward reads (as suggested for the Nugen library prep kit), trimming low-quality bases (quality < 20) at the read ends, and trying different trimming/clipping tools (fastx, fastq-mcf from ea-utils, trim-galore), discarding reads shorter than 20 bases after adapter/quality trimming.
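To make the trimming step concrete, here is a minimal Python sketch of the logic described above: clip a fixed number of bases from the 5' end, trim low-quality bases from the 3' end, and drop reads that end up too short. It is only an illustration and not what fastx, fastq-mcf, or trim-galore do internally; the function name `trim_read` and its parameters are made up for this example, and it trims only the 3' end, whereas real tools can also trim the 5' end and remove adapters.

```python
def trim_read(seq, quals, min_qual=20, min_len=20, head_clip=0):
    """Clip `head_clip` bases from the 5' end, then trim low-quality 3' bases.

    `quals` is a list of Phred scores, one per base. Returns the trimmed
    (seq, quals) pair, or None if the read ends up shorter than `min_len`.
    All names and defaults here are hypothetical, chosen to mirror the post.
    """
    seq, quals = seq[head_clip:], quals[head_clip:]
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the cutoff.
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    if end < min_len:
        return None  # read is discarded
    return seq[:end], quals[:end]


# A 30-base toy read whose last 5 bases have low quality.
toy_seq = "A" * 30
toy_quals = [30] * 25 + [10] * 5

kept = trim_read(toy_seq, toy_quals)                  # 25 bases survive
dropped = trim_read(toy_seq, toy_quals, head_clip=8)  # 17 bases left, discarded
```

With an 8-base clip on the forward read, reads with a poor 3' tail fall below the 20-base cutoff sooner, which is worth keeping in mind when comparing read counts before and after trimming.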

I have also tried different --library-type settings for Tophat and changed the -r option to reflect my fragment size. I suspect that my RNA prep has a significant rRNA fraction, and that removing the reads mapping to rRNA might improve the alignment rate.

I would appreciate any suggestions for improving my alignment. Tophat is my preferred aligner, as I have been using it for years on other datasets and it performs fairly robustly. However, I would be open to switching to other aligners if needed.

Thanks a lot for your help.

RNA-Seq Tophat alignment • 5.2k views

ADD COMMENT • link updated 8.0 years ago by michael.ante ★ 3.8k • written 8.0 years ago by pm2012 ▴ 140

1

Make a short set of rRNA sequences in FASTA format, and filter your data with a program that can discard reads matching them from your files. One possibility is BBSplit. There is more information in this thread

ADD REPLY • link 8.0 years ago by Antonio R. Franco ★ 5.1k

0

By the way, this is a direct alternative to the approach described by Michael Ante. BBSplit uses BBMap to map the reads to the reference.

ADD REPLY • link 8.0 years ago by Antonio R. Franco ★ 5.1k

score 3 · Answer 1 · 2016-05-12

3

8.0 years ago

michael.ante ★ 3.8k

Hi,

If you have significant rRNA content, use Bowtie2 to align directly to the rRNA sequences and use the unmapped reads (--un I think) for further Tophat mapping. rRNA-derived reads tend to map to all rRNA clusters and their repeats, and Tophat usually filters out multi-mapping reads.
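As a rough sketch of that pre-filtering step, the Python snippet below only assembles a Bowtie2 command line and prints it; it does not run anything. For paired-end input, the Bowtie2 manual lists `--un-conc` (pairs that fail to align concordantly) alongside `--un` (unpaired reads that fail to align), so `--un-conc` is used here. The index name and FASTQ file names are placeholders, and `rrna_filter_command` is a hypothetical helper, not part of any tool.

```python
import shlex


def rrna_filter_command(index, r1, r2, unaligned_path, threads=4):
    """Build (but do not run) a bowtie2 call against an rRNA index.

    Pairs that do not align concordantly to the rRNA index are written to
    `unaligned_path`; a '%' in the path is replaced by 1 or 2 per mate.
    """
    return [
        "bowtie2",
        "-p", str(threads),            # number of alignment threads
        "-x", index,                   # prefix of the rRNA bowtie2 index
        "-1", r1,                      # forward reads
        "-2", r2,                      # reverse reads
        "--un-conc", unaligned_path,   # pairs to pass on to Tophat
        "-S", "/dev/null",             # the rRNA alignments are not needed
    ]


# Placeholder names; substitute your own index and FASTQ files.
cmd = rrna_filter_command(
    "rRNA_index", "sample_R1.fastq", "sample_R2.fastq", "sample_norRNA_%.fastq"
)
print(shlex.join(cmd))
```

Note that `--un-conc` also keeps pairs that align discordantly or with only one mate, so it is a fairly permissive filter. The fraction of pairs that do align to the rRNA index gives a direct estimate of the rRNA content.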

Additionally, you can check the mean insert size. With overlapping reads, Tophat may sometimes run into problems.

What is the alignment rate of just using R1-reads?

Cheers,

Michael

ADD COMMENT • link 8.0 years ago by michael.ante ★ 3.8k

0

Thank you for your suggestions. I will proceed with alignment after removing rRNA.

I had checked the mean insert size by running bowtie2. I did have overlapping reads. How would one normally deal with these reads? Haven't done the alignment using just R1 yet. Will do it now.

ADD REPLY • link 8.0 years ago by pm2012 ▴ 140

1

You can also give Tophat2 negative values for the mean inner distance parameter. It may also help to provide the standard deviation of the distance. The reads are then treated normally.
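To show where a negative value comes from: the inner distance is the fragment length minus the two read lengths, so it goes negative as soon as the mates overlap. The sketch below computes it and formats the corresponding Tophat2 options (`-r` / `--mate-inner-dist` and `--mate-std-dev`). The 150 bp fragment length and the standard deviation of 30 are made-up numbers for illustration, not values from this dataset, and `mate_inner_distance` is a hypothetical helper.

```python
def mate_inner_distance(fragment_len, read_len_1, read_len_2=None):
    """Inner distance between mates: fragment length minus both read lengths.

    `fragment_len` is the insert without adapters. The result is negative
    when the two mates overlap.
    """
    if read_len_2 is None:
        read_len_2 = read_len_1
    return fragment_len - read_len_1 - read_len_2


# Hypothetical 150 bp fragments sequenced as 2 x 100 bp: the mates overlap.
inner = mate_inner_distance(150, 100)

# Same fragments after clipping 8 bp from the forward read (92 + 100 bp).
inner_trimmed = mate_inner_distance(150, 92, 100)

# Options to add to the tophat2 call; 30 is an assumed standard deviation.
tophat_opts = ["-r", str(inner), "--mate-std-dev", "30"]
print(" ".join(tophat_opts))
```

The mean and standard deviation are best estimated from data, for example from the insert sizes of a quick Bowtie2 alignment as mentioned above.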

ADD REPLY • link 8.0 years ago by michael.ante ★ 3.8k

0

Hi Michael, I have 70% of reads mapping to rDNA. Is that unusual?

ADD REPLY • link 8.0 years ago by GR ▴ 400

0

Does your library prep use ribodepletion or polyA enrichment?

ADD REPLY • link 8.0 years ago by WouterDeCoster 47k

0

@RT: Not unusual if your samples are not ribo-depleted or the depletion did not work well.