Question

Tophat is mapping paired end reads to the opposite strand

1

Entering edit mode

9.9 years ago

shawn.foley01 • 0

Hello!

I'm attempting to map strand-specific paired-end reads to the Arabidopsis genome (TAIR10) using TopHat v2.0.12. When I upload these data to JBrowse there are reads mapping to both the plus strand and minus strand, so it looks like I'm losing strand specific information.

I've tried solving the problem, and it looks like the issue occurs during the actual tophat mapping step. The command that I use is:

tophat \
  --library-type fr-secondstrand \
  --read-mismatches 2 \
  --read-edit-dist 2 \
  --max-multihits 10 \
  --b2-very-sensitive \
  --transcriptome-max-hits 10 \
  --no-coverage-search \
  --output-dir topOut_wholeRoot_polyA_rep1_trimmed \
  -G TAIR10_GFF3_genes.gtf \
  -p 2 \
  bowtie2_TAIR10/TAIR10 \
  wholeRoot_polyA_rep1_trimmed_1.fastq \
  wholeRoot_polyA_rep1_trimmed_2.fastq

I ran bamToBed to take the output bam file and convert it to a bed file so I could see what was going on, and it looks like the read pairs are mapping to opposite strands. For example the bam files reads:

HWI-D00550:272:C75M0ANXX:6:1101:10732:9653      99      Chr1    34896   50      125M    =       34898   127     AGCGCATACAGGAAGAAGTCCATGAGAAGCCCACCAAGCAGTTGCAGCAGCTACTGTAGCAGCGGCAATGGCAGTTATACTTGGAGGAGAAGAGCTCATTGGGGTTGATGAATCACCAGAATTCC   BCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGEEEGGGGFGGGGGGGGGGGGGGGGGCGGGGGGGGCFEGCGGGGGGGGGGGEGGEGDGGGGGGGGCAGFCGGGGGGGGGGGGGGGG   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:125        YT:Z:UU XS:A:-  NH:i:1
HWI-D00550:272:C75M0ANXX:6:1101:10732:9653      147     Chr1    34898   50      125M    =       34896   -127    CGCATACAGGAAGAAGTCCATGAGAAGCCCACCAAGCAGTTGCAGCAGCTACTGTAGCAGCGGCAATGGCAGTTATACTTGGAGGAGAAGAGCTCATTGGGGTTGATGAATCACCAGAATTCCCG   BGGGGGEGGFGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGBCCCB   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:125        YT:Z:UU XS:A:-  NH:i:1

While the bamToBed output reads:

Chr1 34895 35020 HWI-D00550:272:C75M0ANXX:6:1101:10732:9653/1 50 +
Chr1    34897   35022   HWI-D00550:272:C75M0ANXX:6:1101:10732:9653/2    50      -

My understanding is that for all of my downstream analyses these paired reads need to be on the same strand. Is there a problem with my TopHat command? Does anyone have any recommendations to correct this post-mapping?

Thank you for the help!

paired-end-reads RNA-Seq strand-specific Tophat • 4.1k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.9 years ago by shawn.foley01 • 0

Ram · Answer 1 · 2015-08-22

1

Entering edit mode

9.9 years ago

h.mon 35k

Illumina stranded PE sequencing results in one of the mates mapping on the + strand, while the other is reverse complementary and will map on the - strand. Which one (first or second mate) will map to the + strand depends on the library prep protocol.

Probably there is nothing wrong with your mapping, check if the pairs are being flagged as "properly paired" with samtools flagstat.

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.9 years ago by h.mon 35k

0

Entering edit mode

Thank you for the advice. The pairs ARE "properly paired". How can I got about downstream analyses and uploading this to JBrowse if each fragment maps to both the + and - strands? Is there something I can do after mapping to have paired reads map to the same strand?

Thank you!

ADD REPLY • link 9.9 years ago by shawn.foley01 • 0

0

Entering edit mode

I know nothing about JBrowse so can't help here. For all downstream analyses I can think of, what matters after mapping PE reads is if they are flagged as properly paired or not, and if their mapping quality is good.

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 9.9 years ago by h.mon 35k

0

Entering edit mode

But if I perform strand specific analyses with HTSeq or even bedtools operations then half of the reads will map to the wrong strand, so the read counts will be off for any genes that overlap on opposite strands. How do you account for that during analyses? Thank you for the help, these are my first paired end data sets I'm analyzing.

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 9.9 years ago by shawn.foley01 • 0

1

Entering edit mode

You are incorrect about htseq-count. From its manual:

For stranded=no, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 9.9 years ago by h.mon 35k