Question: Tophat is mapping paired end reads to the opposite strand
1
gravatar for shawn.foley01
3.5 years ago by
United States
shawn.foley010 wrote:

Hello!

I'm attempting to map strand-specific paired-end reads to the Arabidopsis genome (TAIR10) using TopHat v2.0.12. When I upload these data to JBrowse there are reads mapping to both the plus strand and minus strand, so it looks like I'm losing strand specific information.

I've tried solving the problem, and it looks like the issue occurs during the actual tophat mapping step. The command that I use is:

tophat --library-type fr-secondstrand --read-mismatches 2 --read-edit-dist 2 --max-multihits 10 --b2-very-sensitive --transcriptome-max-hits 10 --no-coverage-search --output-dir topOut_wholeRoot_polyA_rep1_trimmed -G TAIR10_GFF3_genes.gtf -p 2 bowtie2_TAIR10/TAIR10 wholeRoot_polyA_rep1_trimmed_1.fastq wholeRoot_polyA_rep1_trimmed_2.fastq

I ran bamToBed to take the output bam file and convert it to a bed file so I could see what was going on, and it looks like the read pairs are mapping to opposite strands. For example the bam files reads:

HWI-D00550:272:C75M0ANXX:6:1101:10732:9653      99      Chr1    34896   50      125M    =       34898   127     AGCGCATACAGGAAGAAGTCCATGAGAAGCCCACCAAGCAGTTGCAGCAGCTACTGTAGCAGCGGCAATGGCAGTTATACTTGGAGGAGAAGAGCTCATTGGGGTTGATGAATCACCAGAATTCC   BCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGEEEGGGGFGGGGGGGGGGGGGGGGGCGGGGGGGGCFEGCGGGGGGGGGGGEGGEGDGGGGGGGGCAGFCGGGGGGGGGGGGGGGG   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:125        YT:Z:UU XS:A:-  NH:i:1

HWI-D00550:272:C75M0ANXX:6:1101:10732:9653      147     Chr1    34898   50      125M    =       34896   -127    CGCATACAGGAAGAAGTCCATGAGAAGCCCACCAAGCAGTTGCAGCAGCTACTGTAGCAGCGGCAATGGCAGTTATACTTGGAGGAGAAGAGCTCATTGGGGTTGATGAATCACCAGAATTCCCG   BGGGGGEGGFGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGBCCCB   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:125        YT:Z:UU XS:A:-  NH:i:1

While the bamToBed output reads:

Chr1 34895 35020 HWI-D00550:272:C75M0ANXX:6:1101:10732:9653/1 50 +

Chr1    34897   35022   HWI-D00550:272:C75M0ANXX:6:1101:10732:9653/2    50      -

My understanding is that for all of my downstream analyses these paired reads need to be on the same strand. Is there a problem with my TopHat command? Does anyone have any recommendations to correct this post-mapping?

 

Thank you for the help! 

 

 

ADD COMMENTlink modified 3.5 years ago by h.mon23k • written 3.5 years ago by shawn.foley010
1
gravatar for h.mon
3.5 years ago by
h.mon23k
Brazil
h.mon23k wrote:

Illumina stranded PE sequencing results in one of the mates mapping on the + strand, while the other is reverse complementary and will map on the - strand. Which one (first or second mate) will map to the + strand depends on the library prep protocol. 

Probably there is nothing wrong with your mapping, check if the pairs are being flagged as "properly paired" with samtools flagstat.

ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by h.mon23k

Thank you for the advice. The pairs ARE "properly paired". How can I got about downstream analyses and uploading this to JBrowse if each fragment maps to both the + and - strands? Is there something I can do after mapping to have paired reads map to the same strand?

Thank you!

ADD REPLYlink written 3.5 years ago by shawn.foley010

I know nothing about JBrowse so can't help here. For all downstream analyses I can think of, what matters after mapping PE reads is if they are flagged as properly paired or not, and if their mapping quality is good.

ADD REPLYlink written 3.5 years ago by h.mon23k

But if I perform strand specific analyses with HTSeq or even bedtools operations then half of the reads will map to the wrong strand, so the read counts will be off for any genes that overlap on opposite strands. How do you account for that during analyses? Thank you for the help, these are my first paired end data sets I'm analyzing. 

ADD REPLYlink written 3.5 years ago by shawn.foley010
1

You are incorrect about htseq-count. From its manual:

For stranded=no, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by h.mon23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 766 users visited in the last hour