Question: Different mapping results with and without GFT file
0
gravatar for wangyang703092
4.8 years ago by
China
wangyang70309290 wrote:

I am trying to use tophat and cufflinks to dealing with RNA-seq data, while something went wrong when running tophat. Different mapping results were generated after running Tophat with and without the GTF, ie, same read mapped to different site in two bam files.

a lot of reads have this problem, and the mapping results with GTF have many mismatches, maybe ~80 mismatch bases in one 90bp read

one is (with gtf)

FCC3378ACXX:4:1113:11999:94438# 99      scaffold_1      928636  50      90M     =       928763  217     CGCGCGTCCACGAGGCCGTTGTGGTCGGCGTCCGCGCGCTCGAAGGCGCCGGCGGCCACGGC       CGCGTTGGGGTTCGGGTTGGCCAGGGGC      bbbceecegggggiihiifihfhehffhhhiggeebccaaaXaccccccccT_a[aacca_accccVaccac_Xaaaa`aaccb^^a]aa      AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i       :0  NM:i:0  MD:Z:90 YT:Z:UU XS:A:-  NH:i:1
  5083 FCC3378ACXX:4:1113:11999:94438# 147     scaffold_1      928763  50      90M     =       928636  -217    GCAGCGGCTCGGGCGGGGACTGCGCCAGCGCCAGCGCGACGAGCAGCGAGAGCGAGAGCGCC       GCTAGCTGCAGGAAGAGGCGCGGCATGG      BBBBBBBBBBaa[R`_b^^cc]V^_V[___a_YXaaa[W^b__Z^a`baaac`bfedaeUdf`agdfdfeahe_gffeggge`JcaZ[__      AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i       :0  NM:i:0  MD:Z:90 YT:Z:UU XS:A:-  NH:i:1

another one is (without gtf)

FCC3378ACXX:4:1113:11999:94438# 99      scaffold_1      910475  50      90M     =       910602  217     CGCGCGTCCACGAGGCCGTTGTGGTCGGCGTCCGCGCGCTCGAAGGCGCCGGCGGCCACGGC       CGCGTTGGGGTTCGGGTTGGCCAGGGGC      bbbceecegggggiihiifihfhehffhhhiggeebccaaaXaccccccccT_a[aacca_accccVaccac_Xaaaa`aaccb^^a]aa      AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i       :0  NM:i:0  MD:Z:90 YT:Z:UU NH:i:1
  4818 FCC3378ACXX:4:1113:11999:94438# 147     scaffold_1      910602  50      90M     =       910475  -217    GCAGCGGCTCGGGCGGGGACTGCGCCAGCGCCAGCGCGACGAGCAGCGAGAGCGAGAGCGCC       GCTAGCTGCAGGAAGAGGCGCGGCATGG      BBBBBBBBBBaa[R`_b^^cc]V^_V[___a_YXaaa[W^b__Z^a`baaac`bfedaeUdf`agdfdfeahe_gffeggge`JcaZ[__      AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i       :0  NM:i:0  MD:Z:90 YT:Z:UU NH:i:1

The command line is tophat -G FM_Physo1_1.gtf --mate-inner-dist 20 --mate-std-dev 34 -p 11 -o ./tophat_out/R0h Psojae_genome R0h_1.fq R0h_2.fq 

Could someone help me with that

rna-seq topaht bowtie2 • 1.6k views
ADD COMMENTlink modified 4.8 years ago • written 4.8 years ago by wangyang70309290
1
gravatar for wangyang703092
4.8 years ago by
China
wangyang70309290 wrote:

I think i may get the answer why this happen,i didn't notice that the conference fasta file was in dos format, when i change it into unix format and build the index, the tophat got the correct result 

ADD COMMENTlink written 4.8 years ago by wangyang70309290

FYI, I've moved this to an answer, since it sounds like this solved the problem.

That's pretty wild, but it's certainly good to know!

ADD REPLYlink written 4.8 years ago by Devon Ryan90k
0
gravatar for roy.granit
4.8 years ago by
roy.granit800
Israel/LabWorm
roy.granit800 wrote:

The inclusion of the GTF file changes the way TopHat does the mapping, it will first uses the GTF file and try to align the reads and only then if some reads do not map to the transcriptome tried to find matches in the genome. 

From the TopHat guide:

"reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings and junctions in the final tophat output."

So it could be that if the mapping is done first with the whole genome you might get different hits.

ADD COMMENTlink written 4.8 years ago by roy.granit800
1

That was my first though too, but note the CIGAR strings and MAPQ in each case. Neither case show signs of splicing, they both have the same insert size, and they both have MAPQ=50, which is tophat2-speak for "not a multimapper". The alignment to the genome should have a lower MAPQ due to being a multimapper. This shouldn't happen.

ADD REPLYlink written 4.8 years ago by Devon Ryan90k

i have not described clearly,the read with GTF may mapped to the wrong place because many mismatches occurred, ~80 mismatch bases in one 90bp read, and this is not a coincidence, lots of reads have this problem.

ADD REPLYlink written 4.8 years ago by wangyang70309290

The alignments don't give indication of any mismatches (anyway, bowtie/bowtie2 can't align reads with that many mismatches). Are you sure that's correct?

ADD REPLYlink written 4.8 years ago by Devon Ryan90k

yes, the sam output said there is no mismatch , but when i check the mapped sequence in genome,i found that they are totally different while the mapped sequence without gtf is match the read.

ADD REPLYlink written 4.8 years ago by wangyang70309290

What species is this?

ADD REPLYlink written 4.8 years ago by Devon Ryan90k

phytophthora sojae ,belong to oomycete

ADD REPLYlink written 4.8 years ago by wangyang70309290

Indeed, the first alignment shouldn't ever occur. Perhaps this is some obscure tophat bug.

ADD REPLYlink written 4.8 years ago by Devon Ryan90k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 674 users visited in the last hour