Question: Improving alignment rates for RNA-seq
pm2012 (United States) wrote, 3.4 years ago:

Hi All, I am analyzing an RNA-seq data set and the alignment results I have been getting are really baffling me. I have tried an exhaustive list of conditions and parameters, but none improves my alignment rates significantly. Here are some details for my sample data:

- Data came from total RNA extracted from tumor samples, prepared with the Nugen Ovation Single Cell RNA-seq kit. We received ~80 million x 2 100bp paired-end reads.
- I get roughly a 40-50% alignment rate with tophat2 across different parameters.

FastQC suggests high duplication rates. The quality seems ok (no red flags except dropping of quality to ~20 at 3' end of reads). I have used Tophat2 for all my alignments using default settings. I have tried the following conditions.

- Trimming 8bp from the forward reads (as suggested by the Nugen library prep kit), trimming low-quality bases (quality < 20) at the read ends, and using different trimming/clipping tools (fastx, fastq-mcf from ea-utils, trim-galore), discarding reads shorter than 20 bases after adapter/quality trimming.
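To make the trimming rules above concrete, here is a minimal Python sketch of the same three steps: clip a fixed number of bases from the 5' end, trim trailing bases below a quality cutoff, and drop reads that end up too short. The function name `trim_read` and its parameters are made up for illustration. The logic is deliberately simpler than what fastx, fastq-mcf or trim-galore actually do, so treat it as a way to reason about the settings rather than a replacement for those tools.

```python
def phred33_to_scores(qual_string):
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(c) - 33 for c in qual_string]


def trim_read(seq, quals, head_clip=0, min_qual=20, min_len=20):
    """Clip, quality-trim and length-filter one read.

    seq       -- base string
    quals     -- list of integer Phred scores, same length as seq
    head_clip -- bases removed from the 5' end (8 for the forward reads here)
    min_qual  -- trailing bases with a score below this are trimmed
    min_len   -- reads shorter than this after trimming are discarded

    Returns (seq, quals) or None if the read is discarded.
    """
    seq, quals = seq[head_clip:], quals[head_clip:]

    # Walk back from the 3' end while the last base is below the cutoff.
    end = len(quals)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1

    if end < min_len:
        return None
    return seq[:end], quals[:end]


if __name__ == "__main__":
    # 40 bp toy read: 30 good bases followed by 10 low-quality bases.
    read = trim_read("A" * 40, [30] * 30 + [10] * 10, head_clip=8)
    print(len(read[0]) if read else "discarded")
```

For paired-end data, a pair where one mate is discarded has to be handled consistently in both files, otherwise R1 and R2 go out of sync and the aligner will complain or mis-pair reads.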

I have also tried different --library-type settings for tophat and changed the -r option to reflect my fragment size. I suspect that my RNA prep has a significant rRNA fraction, and that removing the reads mapping to rRNA might improve the alignment rate.

I would appreciate any suggestions for improving my alignment. Tophat is my preferred aligner, as I have been using it for years on other datasets and it performs fairly robustly. However, I would be open to switching to other aligners if needed.

Thanks a lot for your help.

rna-seq alignment tophat • 3.1k views

Make a short set of rRNA sequences in FASTA format, and filter your data with a program that can discard the reads matching them from your files. One possibility is BBSplit. There is more information in this thread

written 3.4 years ago by Antonio R. Franco

By the way, this is a straightforward alternative to the approach discussed by Michael Ante. BBSplit will use BBMap to map the reads to the reference.

written 3.4 years ago by Antonio R. Franco

michael.ante (Austria/Vienna) wrote, 3.4 years ago:

Hi,

If you have a significant rRNA content, use Bowtie2 to align directly to the rRNA sequences and use the unmapped reads (--un I think) for further Tophat mapping. rRNA-derived reads tend to map to all rRNA clusters and their repeats, and Tophat usually filters out multi-mapping reads.
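As a rough sketch of that filtering step, the Python helper below assembles a Bowtie2 command line for paired-end reads. For pairs, Bowtie2 provides `--un-conc`, which writes the pairs that fail to align concordantly, while `--un` covers unpaired reads. The index name `rRNA_index` (built beforehand with `bowtie2-build` from an rRNA FASTA), the file names and the helper itself are hypothetical placeholders, not something taken from this thread.

```python
import shlex


def rrna_filter_command(rrna_index, r1, r2, out_pattern, threads=4):
    """Build a bowtie2 call that keeps read pairs NOT aligning to rRNA.

    rrna_index  -- basename of a bowtie2 index of rRNA sequences (assumed to exist)
    r1, r2      -- paired FASTQ files
    out_pattern -- output path; a '%' in it is replaced by 1 or 2 by bowtie2
    """
    return [
        "bowtie2",
        "-p", str(threads),        # number of alignment threads
        "-x", rrna_index,          # index to align against
        "-1", r1,
        "-2", r2,
        "--un-conc", out_pattern,  # pairs that did not align concordantly
        "-S", "/dev/null",         # the rRNA alignments themselves are discarded
    ]


if __name__ == "__main__":
    cmd = rrna_filter_command(
        "rRNA_index", "sample_R1.fastq", "sample_R2.fastq", "non_rrna.%.fastq"
    )
    # Print the command for inspection; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

One caveat: "not concordantly aligned" still includes pairs in which only one mate hits rRNA, so this filter is on the lenient side. The resulting `non_rrna.1.fastq` and `non_rrna.2.fastq` would then be the input for Tophat.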

Additionally, you can check the mean insert size. With overlapping reads, Tophat may sometimes run into problems.

What is the alignment rate of just using R1-reads?

Cheers,

Michael


Thank you for your suggestions. I will proceed with alignment after removing rRNA.

I checked the mean insert size by running bowtie2, and I do have overlapping reads. How would one normally deal with these reads? I haven't done the alignment using just R1 yet. Will do it now.

written 3.4 years ago by pm2012

You can also give Tophat2 negative values for the mean inner distance parameter. It might be good to provide the standard deviation of the distance as well. The reads are then treated normally.
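To see where a negative value comes from: the inner distance is the fragment length minus the two read lengths, so it drops below zero as soon as the mates overlap. Below is a small Python sketch of that arithmetic. The fragment numbers in the example (mean 160 bp, SD 30 bp) are invented for illustration and are not the poster's values; the option names are TopHat's `--mate-inner-dist` (the long form of `-r`) and `--mate-std-dev`.

```python
def mate_inner_distance(mean_fragment_len, read_len):
    """Inner distance between mates: fragment length minus both reads.

    Negative when the two reads overlap.
    """
    return int(round(mean_fragment_len - 2 * read_len))


def tophat_distance_args(mean_fragment_len, fragment_sd, read_len):
    """Return the TopHat arguments describing the mate distance distribution."""
    inner = mate_inner_distance(mean_fragment_len, read_len)
    return [
        "--mate-inner-dist", str(inner),
        "--mate-std-dev", str(int(round(fragment_sd))),
    ]


if __name__ == "__main__":
    # Hypothetical library: 2 x 100 bp reads from fragments averaging 160 bp.
    print(mate_inner_distance(160, 100))
    print(" ".join(tophat_distance_args(160, 30, 100)))
```

If the reads were trimmed before alignment, the effective read length is shorter than 100 bp and varies per read, so the value computed this way is only an approximation.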

written 3.4 years ago by michael.ante

Hi Michael, I have 70% of reads mapping to rDNA. Is that unusual?

written 3.4 years ago by RT

Does your library prep use ribodepletion or polyA enrichment?

written 3.4 years ago by WouterDeCoster

@RT: Not unusual if your samples are not ribo-depleted or the depletion did not work well.

written 3.4 years ago by genomax

This can be answered after mapping against the rRNA sequences.

written 3.4 years ago by Antonio R. Franco