Question

"Spikey" RNA-Seq coverage

0

Entering edit mode

7.0 years ago

Walter Baumann ▴ 10

Hi,

I tried to process a RNA-Seq experiment myself, but when I upload the file in IGV and look at the coverage track, I see not an "equal distribution" of reads over the whole transcript (so roughly a plateau), but rather mountains .

My questions more precise: Maybe I cut the wrong adapters off? However, I firstly remove rRNA sequences with bowtie, this works. Or is there a parameter I have to change in my STAR command?

I downloaded the raw data from Geo: https://www.ncbi.nlm.nih.gov/sra/?term=SRA160745 More precisely here: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR1630838

(If I am correct, the FASTQC report shows that the read length is between 25 and 40 nts, with apr. 50 million sequences.)

Briefly to how I processed the data:

1) I trimmed the adapters using cutadapt. I obtained the adapter sequences from the paper Gao et al. (2015) "Quantitative profiling of initiating ribosomes in vivo": "Deep sequencing. Single-stranded template was amplified by PCR by using the Phusion high-fidelity (HF) enzyme (NEB) according to the manufacturer’s instructions. The oligonucleotide primers qNTI200 (5 -CAAGCAGAAGACGGCATA- 3 ) and qNTI201 (5 -AATGATACGGCGACCACCG ACAGGTTCAGAGTTCTAC AGTCCGACG- 3 ) were used to create DNA suitable for sequencing, i.e., DNA with Illumina cluster generation sequences on each end and a sequencing primer binding site."

2) I used STAR for the alignment: STAR --genomeDir file/to/STAR_Index_Genome_100/ --alignEndsType EndToEnd --readFilesIn file.fasta --runThreadN 2 --outFilterMismatchNmax 3 --outFilterMultimapNmax 20 --chimScoreSeparation 10 --chimScoreMin 20 --chimSegmentMin 15 --outSAMattributes All --outFilterIntronMotifs RemoveNoncanonicalUnannotated --alignSJoverhangMin 500 --outFileNamePrefix /path/to/star_output/ --outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outWigType bedGraph read1_5p --outWigNorm RPM --outSAMstrandField intronMotif --outSAMmultNmax 1 --outMultimapperOrder Random

RNA-Seq alignment STAR • 2.0k views

ADD COMMENT • link updated 7.0 years ago by Matteo Schiavinato ★ 3.6k • written 7.0 years ago by Walter Baumann ▴ 10

score 1 · Answer 1 · 2017-05-02

1

Entering edit mode

7.0 years ago

Matteo Schiavinato ★ 3.6k

I see not an "equal distribution" of reads over the whole transcript (so roughly a plateau), but rather mountains

Well this is kind of expected in RNASeq coverage because you can have multiple isoforms and therefore having an exon, shared by both isoforms, which has twice the expression of the other exons that are unique of one isoform only.

Maybe I cut the wrong adapters off?

This shouldn't affect that much the result, you would see fewer reads mapping but at the same time it wouldn't be something so visible.

In principle, the "mountains" you mention are all but unexpected: in RNASeq the expression is differential among exons of the same gene if it undergoes alternative splicing, which is quite the case in eukaryotes.

Also, depending on your sequencing depth, you might have differences in coverage among regions of the same exon, because when you sequence you don't have the perfect ideal scenario that represents your data, but a (good) approximation of it.

ADD COMMENT • link 7.0 years ago by Matteo Schiavinato ★ 3.6k

0

Entering edit mode

Thanks for your comment. I double checked with another RNA-Seq experiment (already processed, as bam file), same cell line (HEK293) and the bam file looks differently. However, looking at the Actin B gene I can totally reproduce your answer and the coverage tracks look identical in both experiments. However, there are genes where you can clearly see differences (e.g. ZNF598 and more). I just rechecked and saw that the RNA-Seq experiment from Geo is single-end, while my "comparison dataset" is paired-end. I expect both experiments to be comparable, because I also have ribosome profiling data and here the read counts correlate quite well with each other (spearman cor of 0.9). However, the RNA-Seq experiments correlate not so well (0.25).

ADD REPLY • link 7.0 years ago by Walter Baumann ▴ 10

1

Entering edit mode

Differences in protocols can indeed change the patterns you see. But it can also be the reflect of true biological variations between your samples (some genes are expected to be static while other should display more variations). Moreover, even if same cell lines should be similar they tend to vary from one lab to another (medium of culture, freezing/thawing etc). To reliably compare two RNA-seq you should have samples from the same experiment otherwise too many parameters can interfere.

ADD REPLY • link 7.0 years ago by VHahaut ★ 1.2k

0

Entering edit mode

How did you calculate the spearman correlation? You shouldn't compare the raw counts, because you have two different sequencing technologies and possibly two different sequencing depths.

ADD REPLY • link 7.0 years ago by Matteo Schiavinato ★ 3.6k

0

Entering edit mode

To obtain a quick overview, I counted the reads per region with SummarizeOverlaps (R Bioconductor). Yes, I then correlated the raw counts against each other (for both Ribosome profiling and RNA-Seq). My first assumption was that different cell counts, read depth... just change the scaling. By correlating the ribosome profiling raw counts of both experiments, I obtained a spearman correlation of 0.9. By correlating the RNA-Seq raw counts of both experiments, I obtained a spearman correlation of 0.25. But I agree here that the sequencing technology (paired-,vs. single end) might be the crucial point.

I want to compare the ribosome coverage per transcript, (so Ribosome profiling reads/RNA-Seq reads per gene/transcript).

ADD REPLY • link 7.0 years ago by Walter Baumann ▴ 10