Question: Tophat : Sam-Flag 115 = Properly-Paired + Read.Reverse + Mate.Reverse ?
2
gravatar for Pierre Lindenbaum
4.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum111k wrote:

EDIT: Cross posted on SE: http://seqanswers.com/forums/showthread.php?p=122835

I run tophat2 using the standard options.

$ tophat2 -v                                                                    
TopHat v2.0.10

$ samtools view -H TOPHAT/accepted_hits.bam   | grep PG
@PG     ID:TopHat       VN:2.0.10       CL:/commun/data/packages/tophat-2.0.10.Linux_x86_64/tophat -p 10 -G genes.gtf -o  TOPHAT --rg-id g24 --rg-library 6VGWT3 --rg-sample 6VGWT3 --rg-description 6VGWT3  34 fastqs --rg-platform-unit 1 2 3 4 --rg-center Nantes --rg-platform Illumina  mm10 6VGWT3_ATGTCA_L002_R1_002.fastq.gz,6VGWT3_ATGTCA_L004_R1_002.fastq.gz,6VGWT3_ATGTCA_L003_R1_002.fastq.gz,6VGWT3_ATGTCA_L002_R1_003.fastq.gz,6VGWT3_ATGTCA_L004_R1_003.fastq.gz,6VGWT3_ATGTCA_L003_R1_003.fastq.gz,6VGWT3_ATGTCA_L002_R1_004.fastq.gz,6VGWT3_ATGTCA_L004_R1_004.fastq.gz,6VGWT3_ATGTCA_L003_R1_004.fastq.gz,6VGWT3_ATGTCA_L004_R1_001.fastq.gz,6VGWT3_ATGTCA_L002_R1_001.fastq.gz,6VGWT3_ATGTCA_L003_R1_001.fastq.gz,6VGWT3_ATGTCA_L001_R1_002.fastq.gz,6VGWT3_ATGTCA_L001_R1_003.fastq.gz,6VGWT3_ATGTCA_L001_R1_004.fastq.gz,6VGWT3_ATGTCA_L001_R1_001.fastq.gz 6VGWT3_ATGTCA_L002_R2_002.fastq.gz,6VGWT3_ATGTCA_L004_R2_002.fastq.gz,6VGWT3_ATGTCA_L003_R2_002.fastq.gz,6VGWT3_ATGTCA_L002_R2_003.fastq.gz,6VGWT3_ATGTCA_L004_R2_003.fastq.gz,6VGWT3_ATGTCA_L003_R2_003.fastq.gz,6VGWT3_ATGTCA_L002_R2_004.fastq.gz,6VGWT3_ATGTCA_L004_R2_004.fastq.gz,6VGWT3_ATGTCA_L003_R2_004.fastq.gz,6VGWT3_ATGTCA_L004_R2_001.fastq.gz,6VGWT3_ATGTCA_L002_R2_001.fastq.gz,6VGWT3_ATGTCA_L003_R2_001.fastq.gz,6VGWT3_ATGTCA_L001_R2_002.fastq.gz,6VGWT3_ATGTCA_L001_R2_003.fastq.gz,6VGWT3_ATGTCA_L001_R2_004.fastq.gz,6VGWT3_ATGTCA_L001_R2_001.fastq.gz

I found some sam flags 115 !

115=

  • read paired
  • read mapped in proper pair
  • read reverse strand
  • mate reverse strand
  • first in pair

.

$ samtools view  -f 115 -F 256  dir/accepted_hits.bam | head -n 2
HWI-1KL149:61:D2C11ACXX:4:2204:6848:94129       115     chr1    24611547        1       70M2D31M        chrM    10906   0       GTAGGCGATTAGTGATTTTAAATCTGTTTGGCGTAAGCAGATTGAGCTAGTTATAATTATTCCTCATAGGGAGAAGGATGAAGGGGTATGCTATATATTTT      DDDDDBDDDDDEEEEEEFFFFFFFHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJIJJJJJJJJJJJJJJHHHHHFFFFFCCC      AS:i:-11        XN:i:0  XM:i:0  XO:i:1  XG:i:2  NM:i:2  MD:Z:70^GA31    YT:Z:UU NH:i:4  CC:Z:chrM CP:i:10928       HI:i:2  RG:Z:g24
HWI-1KL149:61:D2C11ACXX:2:2109:12004:4228       115     chr1    24611549        1       101M    chrM    10815   0       TGGCGATTAGTGATTTTAAATCTGTTTGGCGTAAGCAGATTGAGCTAGTTATAATTATTCCTCATAGGGAGAGAAGGATGAAGGGGTATGCTATATATTTT      DDDDDDDDDDDEEEEEEEFFFFFHHHJJJJJJJJJJJJJJJJJJIJJJJIJJJJJJJJIJJJJIGJJJJJJJJJJIJJJJJJJJJJJJHHHHHFFFFFCCC      AS:i:-5 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:0A100      YT:Z:UU NH:i:4  CC:Z:chrM       CP:i:10928HI:i:2   RG:Z:g24

how can a read be "mapped in proper pair" and read reverse strand+ mate reverse strand ? what is the consequence for a tool like htseqcount ? Does it only count the reads in proper pair ?

bowtie2 tophat2 sam • 2.7k views
ADD COMMENTlink modified 4.6 years ago by Biostar ♦♦ 20 • written 4.7 years ago by Pierre Lindenbaum111k
1

Note the HI:i:2 auxiliary flag. When I still used tophat2, I never had it output more than a single hit. I presume that this is the source of the issue. I suspect that if you grep for those reads, you'll find that the mate is for HI:i:1, which would make sense when looking at the RNEXT, PNEXT and TLEN columns and the CC and CP auxiliary fields. This is my only guess as to why this might be the case (I'd post this as an answer if I actually knew!).

Regarding htseq-count, just use the MAPQ threshold and set it above 4 to get rid of multimappers from tophat.

ADD REPLYlink written 4.7 years ago by Devon Ryan82k

you have strange RNEXT, PNEXT and TLEN columns as well. why is the chromosome name there? shouldn't be another read name or =

ADD REPLYlink modified 4.7 years ago • written 4.7 years ago by Istvan Albert ♦♦ 77k

yes. That's strange but that's really the raw output for tophat2.

ADD REPLYlink written 4.7 years ago by Pierre Lindenbaum111k

no idea here ? I'll ask SE...

ADD REPLYlink written 4.7 years ago by Pierre Lindenbaum111k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1138 users visited in the last hour