Question: Different coverage on RNA-Seq strands
1
gravatar for rodri
11 months ago by
rodri10
rodri10 wrote:

Hi, We have tried several samples from an experiment in GEO (Series GSE74411), especifically SRA accessions SRR2833398, SRR2833399 and SRR2833400; and in all of them we have a larger number of reads alignments to forward (>90%) than to reverse (<10%) strands. The experiment says it uses a paired-end library and a stranded construction protocol, according to GEO.

We made a pretty straightforward analysis:

hisat2-build Spombe.fasta SPindex 
hisat2 -p 8 -x SPindex --sra-acc SRR2833398 -S SRR2833398.sam

samtools sort -@ 8 -o SRR2833398.bam SRR2833398.sam
samtools index SRR2833398.bam

bamCoverage -b SRR2833398.bam -o SRR2833398_f.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand forward
bamCoverage -b SRR2833398.bam -o SRR2833398_r.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand reverse

The result is therefore a wig with 10x coverage levels on forward than reverse strands. We can of course normalize, but is it normal to have much larger coverages in the forward strand? If so, why? We think it should be more or less the same according to the technique. If not normal, any ideas what can we be doing wrong?

Cheers, Rodrigo

stranded rna-seq paired-end • 619 views
ADD COMMENTlink modified 6 months ago by Mike30 • written 11 months ago by rodri10
1

I agree that sounds strange.

Could it be the coverage tool does not take the strandedness of the paired reads into account and simply assign the reads it to the first strand it processes? Does the %forward correlate with the %of unparied mapped reads?

ADD REPLYlink written 11 months ago by kristoffer.vittingseerup470

You mean bamCoverage? The read pairs seem ok, more or less the same number of forward and reverse reads:

$ samtools view -c -F 16 -@ max SRR2833398.bam
3853154
$ samtools view -c -f 16 -@ max SRR2833398.bam
3561999

Which make good pairs if i represent them all on SRR2833398.bam on IGV.

These are the numbers of bamCoverage:

$ bamCoverage -b SRR2833398.bam -o SRR2833398_rFirst.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand reverse
Due to filtering, 4.81172630326% of the aforementioned alignments will be used 341101.112361
$ bamCoverage -b SRR2833398.bam -o SRR2833398_fSecond.bw -p 8 -bs 1 --normalizeUsingRPKM --filterRNAstrand forward
Due to filtering, 95.1882736967% of the aforementioned alignments will be used 6747853.88764

Bin size (-bs) makes no difference, I tried with 10, 20 and 100. bamCoverage is agnostic to the order into which you run for forward or reverse strands.

And these are the numbers of the previous alignment with hisat2:

$ hisat2 -p 16 -x ../SPindex --sra-acc SRR2833398 -S SRR2833398-NEW2.sam
2496425 reads; of these:
  2496425 (100.00%) were paired; of these:
   270836 (10.85%) aligned concordantly 0 times
   1859575 (74.49%) aligned concordantly exactly 1 time
   366014 (14.66%) aligned concordantly >1 times
----
270836 pairs aligned concordantly 0 times; of these:
  7017 (2.59%) aligned discordantly 1 time
----
263819 pairs aligned 0 times concordantly or discordantly; of these:
  527638 mates make up the pairs; of these:
    326208 (61.82%) aligned 0 times
    154476 (29.28%) aligned exactly 1 time
    46954 (8.90%) aligned >1 times
93.47% overall alignment rate

I cannot notice any correspondence of precentages.

Thanks for the help!

ADD REPLYlink modified 11 months ago • written 11 months ago by rodri10
1

Did you check the alignement in IGV ?

ADD REPLYlink written 11 months ago by Nicolas Rosewick6.6k

Yes, we checked in IGV and it just confirms the output of bamCoverage, >10x scale in forward respect to reverse strand

ADD REPLYlink written 11 months ago by rodri10
0
gravatar for kristoffer.vittingseerup
11 months ago by
European Union
kristoffer.vittingseerup470 wrote:

So it is the bamCoverage tool - which can be seen by the filtering it perfroms (filtering to 4% and 95% of reads). Are you using an old version of bamCoverage? According to this its only >= 2.2 where the --filterRNAstrand works for paired data

ADD COMMENTlink written 11 months ago by kristoffer.vittingseerup470

Unfortunately that is not the problem:

$ bamCoverage --version
bamCoverage 2.5.3

I've tried with another S pombe RNA-seq dataset from our lab and it also makes this assymmetry in coverage. I'm trying now with another pombe experiment (GSE97747), where the authors tell they do the coverage with bedtools genomecov, so I can see if there's a difference.

Thanks!

ADD REPLYlink modified 11 months ago • written 11 months ago by rodri10

If you load the raw bam-files (not the bigwig coverage tracks) into IGV do you then see the strand bias?

ADD REPLYlink written 11 months ago by kristoffer.vittingseerup470
0
gravatar for Mike
6 months ago by
Mike30
Canada
Mike30 wrote:

I think I'm having the same problem. I'm using bamCoverage (version 3.0.0) on my paired-end stranded RNA-seq using the --filterRNAstrand option and for each file due to filtering ~90% are being used for the forward strand and ~5% for the reverse, which I don't think is correct. Were you able to figure this out? I'm currently troubleshooting.

ADD COMMENTlink written 6 months ago by Mike30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1518 users visited in the last hour