Question

RNA-seq read alignment best practices for low-input Smart-seq2 samples

0

Entering edit mode

6.3 years ago

goldberg.jm ▴ 90

RNA-seq read alignment best practices for low-input Smart-seq2 samples

Hi All,

I think my data may fall in a nether region between single-cell and bulk...

There are many ways to align RNA-seq reads to transcripts and genomes, and I am confused about which is best for my experiment.

My goal is to do differential expression analysis on Plasmodium vivax data. The input is fastq files of paired 25nt reads from Smart-seq2.

The P.vivax genome is 29 Mb, 40% G/C, 6,642 genes. Introns are much smaller and fewer in number than human introns. P.vivax RNA is hard to get, so we used Smart-seq2 to generate sequence. Using bowtie and RSEM we find for our richer samples we get 4.2 million reads aligned to transcripts, and 86% transcripts with at least one read aligned. For our less rich samples we get 1.2 million reads aligned to transcripts, and 78% transcripts with at least one read aligned. rRNA accounts for < 10% of the aligned reads.

Questions:

Is my understanding correct that if I am primarily interested in differential expression of known genes, as opposed to finding new genes and fixing gene models, that I may align reads to transcripts as opposed to the whole genome?
There are so many different aligners: bwa, bowtie, bowtie2, STAR, HISAT2. How much does it really matter which one I use? Does using a different alignment program have major effects on the outcome, or is it more just playing around the edges?
How do I tell which is best for my data? What metrics ought I use to assess my data?

I have read articles reviewing best practices for evaluation RNA-seq data, for example PMID26813401. But I’m not sure how this translates to my data, which I think falls in a nether region between single-cell and bulk.

It is wonderfully convenient to run RSEM and have RSEM do the alignment itself. Is it better practice to do the alignment step outside of RSEM?

Thank you! Jon

RNA-Seq alignment • 2.5k views

ADD COMMENT • link 6.3 years ago by goldberg.jm ▴ 90

1

Entering edit mode

The input is fastq files of paired 25nt reads from Smart-seq2.

With such short reads you are going to have a difficult time aligning these reads uniquely.

I may align reads to transcripts as opposed to the whole genome?

You could, but then you may be forcing some reads to align in places where they may not have originated from.

How much does it really matter which one I use? Does using a different alignment program have major effects on the outcome, or is it more just playing around the edges?

As long as you use an aligner that is splice aware you should be fine. Choice of aligner should not make a big difference (but if you really have 25 bp reads then you will have to see).

ADD REPLY • link 6.3 years ago by GenoMax 141k

0

Entering edit mode

Thank you genomax! With respect to the problem of reads aligning to places from which they have not originated, would the shortness of the reads (25nt) not be mitigated by the fact that the reads are paired?

Also, could you please suggest a metric I may use to assess whether reads might be aligning to the wrong places?

ADD REPLY • link 6.3 years ago by goldberg.jm ▴ 90