Question

Different coverage on RNA-Seq strands

1

Entering edit mode

6.5 years ago

rodri ▴ 10

Hi, We have tried several samples from an experiment in GEO (Series GSE74411), especifically SRA accessions SRR2833398, SRR2833399 and SRR2833400; and in all of them we have a larger number of reads alignments to forward (>90%) than to reverse (<10%) strands. The experiment says it uses a paired-end library and a stranded construction protocol, according to GEO.

We made a pretty straightforward analysis:

hisat2-build Spombe.fasta SPindex 
hisat2 -p 8 -x SPindex --sra-acc SRR2833398 -S SRR2833398.sam

samtools sort -@ 8 -o SRR2833398.bam SRR2833398.sam
samtools index SRR2833398.bam

bamCoverage -b SRR2833398.bam -o SRR2833398_f.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand forward
bamCoverage -b SRR2833398.bam -o SRR2833398_r.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand reverse

The result is therefore a wig with 10x coverage levels on forward than reverse strands. We can of course normalize, but is it normal to have much larger coverages in the forward strand? If so, why? We think it should be more or less the same according to the technique. If not normal, any ideas what can we be doing wrong?

Cheers, Rodrigo

RNA-Seq stranded paired-end • 3.2k views

ADD COMMENT • link updated 6.1 years ago by Mike ▴ 60 • written 6.5 years ago by rodri ▴ 10

1

Entering edit mode

I agree that sounds strange.

Could it be the coverage tool does not take the strandedness of the paired reads into account and simply assign the reads it to the first strand it processes? Does the %forward correlate with the %of unparied mapped reads?

ADD REPLY • link 6.5 years ago by Kristoffer Vitting-Seerup ★ 4.0k

0

Entering edit mode

You mean bamCoverage? The read pairs seem ok, more or less the same number of forward and reverse reads:

$ samtools view -c -F 16 -@ max SRR2833398.bam
3853154
$ samtools view -c -f 16 -@ max SRR2833398.bam
3561999

Which make good pairs if i represent them all on SRR2833398.bam on IGV.

These are the numbers of bamCoverage:

$ bamCoverage -b SRR2833398.bam -o SRR2833398_rFirst.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand reverse
Due to filtering, 4.81172630326% of the aforementioned alignments will be used 341101.112361
$ bamCoverage -b SRR2833398.bam -o SRR2833398_fSecond.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand forward
Due to filtering, 95.1882736967% of the aforementioned alignments will be used 6747853.88764

Bin size (-bs) makes no difference, I tried with 10, 20 and 100. bamCoverage is agnostic to the order into which you run for forward or reverse strands.

And these are the numbers of the previous alignment with hisat2:

$ hisat2 -p 16 -x ../SPindex --sra-acc SRR2833398 -S SRR2833398-NEW2.sam
2496425 reads; of these:
  2496425 (100.00%) were paired; of these:
   270836 (10.85%) aligned concordantly 0 times
   1859575 (74.49%) aligned concordantly exactly 1 time
   366014 (14.66%) aligned concordantly >1 times
----
270836 pairs aligned concordantly 0 times; of these:
  7017 (2.59%) aligned discordantly 1 time
----
263819 pairs aligned 0 times concordantly or discordantly; of these:
  527638 mates make up the pairs; of these:
    326208 (61.82%) aligned 0 times
    154476 (29.28%) aligned exactly 1 time
    46954 (8.90%) aligned >1 times
93.47% overall alignment rate

I cannot notice any correspondence of precentages.

Thanks for the help!

ADD REPLY • link 6.5 years ago by rodri ▴ 10

1

Entering edit mode

Did you check the alignement in IGV ?

ADD REPLY • link 6.5 years ago by Nicolas Rosewick 11k

0

Entering edit mode

Yes, we checked in IGV and it just confirms the output of bamCoverage, >10x scale in forward respect to reverse strand

ADD REPLY • link 6.5 years ago by rodri ▴ 10

score 0 · Answer 1 · 2017-10-24

0

Entering edit mode

6.5 years ago

Kristoffer Vitting-Seerup ★ 4.0k

So it is the bamCoverage tool - which can be seen by the filtering it perfroms (filtering to 4% and 95% of reads). Are you using an old version of bamCoverage? According to this its only >= 2.2 where the --filterRNAstrand works for paired data

ADD COMMENT • link 6.5 years ago by Kristoffer Vitting-Seerup ★ 4.0k

0

Entering edit mode

Unfortunately that is not the problem:

$ bamCoverage --version
bamCoverage 2.5.3

I've tried with another S pombe RNA-seq dataset from our lab and it also makes this assymmetry in coverage. I'm trying now with another pombe experiment (GSE97747), where the authors tell they do the coverage with bedtools genomecov, so I can see if there's a difference.

Thanks!

ADD REPLY • link 6.5 years ago by rodri ▴ 10

0

Entering edit mode

If you load the raw bam-files (not the bigwig coverage tracks) into IGV do you then see the strand bias?

ADD REPLY • link 6.5 years ago by Kristoffer Vitting-Seerup ★ 4.0k

score 0 · Answer 2 · 2018-03-08

I think I'm having the same problem. I'm using bamCoverage (version 3.0.0) on my paired-end stranded RNA-seq using the --filterRNAstrand option and for each file due to filtering ~90% are being used for the forward strand and ~5% for the reverse, which I don't think is correct. Were you able to figure this out? I'm currently troubleshooting.