Question: Different coverage on RNA-Seq strands
0
gravatar for rodri
7 weeks ago by
rodri0
rodri0 wrote:

Hi, We have tried several samples from an experiment in GEO (Series GSE74411), especifically SRA accessions SRR2833398, SRR2833399 and SRR2833400; and in all of them we have a larger number of reads alignments to forward (>90%) than to reverse (<10%) strands. The experiment says it uses a paired-end library and a stranded construction protocol, according to GEO.

We made a pretty straightforward analysis:

hisat2-build Spombe.fasta SPindex 
hisat2 -p 8 -x SPindex --sra-acc SRR2833398 -S SRR2833398.sam

samtools sort -@ 8 -o SRR2833398.bam SRR2833398.sam
samtools index SRR2833398.bam

bamCoverage -b SRR2833398.bam -o SRR2833398_f.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand forward
bamCoverage -b SRR2833398.bam -o SRR2833398_r.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand reverse

The result is therefore a wig with 10x coverage levels on forward than reverse strands. We can of course normalize, but is it normal to have much larger coverages in the forward strand? If so, why? We think it should be more or less the same according to the technique. If not normal, any ideas what can we be doing wrong?

Cheers, Rodrigo

stranded rna-seq paired-end • 238 views
ADD COMMENTlink modified 7 weeks ago by kristoffer.vittingseerup350 • written 7 weeks ago by rodri0
1

I agree that sounds strange.

Could it be the coverage tool does not take the strandedness of the paired reads into account and simply assign the reads it to the first strand it processes? Does the %forward correlate with the %of unparied mapped reads?

ADD REPLYlink written 7 weeks ago by kristoffer.vittingseerup350

You mean bamCoverage? The read pairs seem ok, more or less the same number of forward and reverse reads:

$ samtools view -c -F 16 -@ max SRR2833398.bam
3853154
$ samtools view -c -f 16 -@ max SRR2833398.bam
3561999

Which make good pairs if i represent them all on SRR2833398.bam on IGV.

These are the numbers of bamCoverage:

$ bamCoverage -b SRR2833398.bam -o SRR2833398_rFirst.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand reverse
Due to filtering, 4.81172630326% of the aforementioned alignments will be used 341101.112361
$ bamCoverage -b SRR2833398.bam -o SRR2833398_fSecond.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand forward
Due to filtering, 95.1882736967% of the aforementioned alignments will be used 6747853.88764

Bin size (-bs) makes no difference, I tried with 10, 20 and 100. bamCoverage is agnostic to the order into which you run for forward or reverse strands.

And these are the numbers of the previous alignment with hisat2:

$ hisat2 -p 16 -x ../SPindex --sra-acc SRR2833398 -S SRR2833398-NEW2.sam
2496425 reads; of these:
  2496425 (100.00%) were paired; of these:
   270836 (10.85%) aligned concordantly 0 times
   1859575 (74.49%) aligned concordantly exactly 1 time
   366014 (14.66%) aligned concordantly >1 times
----
270836 pairs aligned concordantly 0 times; of these:
  7017 (2.59%) aligned discordantly 1 time
----
263819 pairs aligned 0 times concordantly or discordantly; of these:
  527638 mates make up the pairs; of these:
    326208 (61.82%) aligned 0 times
    154476 (29.28%) aligned exactly 1 time
    46954 (8.90%) aligned >1 times
93.47% overall alignment rate

I cannot notice any correspondence of precentages.

Thanks for the help!

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by rodri0
1

Did you check the alignement in IGV ?

ADD REPLYlink written 7 weeks ago by Nicolas Rosewick5.8k

Yes, we checked in IGV and it just confirms the output of bamCoverage, >10x scale in forward respect to reverse strand

ADD REPLYlink written 7 weeks ago by rodri0
0
gravatar for kristoffer.vittingseerup
7 weeks ago by
European Union
kristoffer.vittingseerup350 wrote:

So it is the bamCoverage tool - which can be seen by the filtering it perfroms (filtering to 4% and 95% of reads). Are you using an old version of bamCoverage? According to this its only >= 2.2 where the --filterRNAstrand works for paired data

ADD COMMENTlink written 7 weeks ago by kristoffer.vittingseerup350

Unfortunately that is not the problem:

$ bamCoverage --version
bamCoverage 2.5.3

I've tried with another S pombe RNA-seq dataset from our lab and it also makes this assymmetry in coverage. I'm trying now with another pombe experiment (GSE97747), where the authors tell they do the coverage with bedtools genomecov, so I can see if there's a difference.

Thanks!

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by rodri0

If you load the raw bam-files (not the bigwig coverage tracks) into IGV do you then see the strand bias?

ADD REPLYlink written 6 weeks ago by kristoffer.vittingseerup350
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1424 users visited in the last hour