Question: Selecting best RNA-seq mapping with TopHat
4.3 years ago, mjoyraj (Taiwan) wrote:

After mapping my paired-end (PE) RNA-seq reads with TopHat using three different sets of options (given below), I obtained the mapping summary shown in the table below.

1) Without reference annotation:

tophat -p 8 -r 50 -o "output" "indexed_genome_file" R1.fastq R2.fastq

2) With reference annotation:

tophat -p 8 -G "genes.gtf" -o "tophat_RABM" "Genome" R1.fastq R2.fastq

3) With reference annotation disabling novel junctions:

tophat --no-novel-juncs -p 8 -G "genes.gtf" -o "tophat_RABM" "Genome" R1.fastq R2.fastq

     

Mapping reads to genome with TopHat

| Metric | With reference annotation | With reference annotation, disabling novel junctions | Without reference annotation |
|---|---|---|---|
| Left reads: input | 71926313 | 71926313 | 71926313 |
| Left reads: mapped | 62199375 (86.5% of input) | 60663645 (84.3% of input) | 61835864 (86.0% of input) |
| Left reads: multiple alignment | 10352306 (16.6%) (477254 have >20) | 11540865 (19.0%) (508565 have >20) | 15034571 (24.3%) (665149 have >20) |
| Right reads: input | 71926313 | 71926313 | 71926313 |
| Right reads: mapped | 62071170 (86.3% of input) | 60575371 (84.2% of input) | 61694450 (85.8% of input) |
| Right reads: multiple alignment | 10352990 (16.7%) (477253 have >20) | 11553883 (19.1%) (508573 have >20) | 15030545 (24.4%) (665010 have >20) |
| Overall mapping rate | 86.40% | 84.30% | 85.90% |
| Aligned pairs | 57244789 | 55041529 | 56591033 |
| Aligned pairs: multiple alignment | 9527350 (16.6%) | 10609333 (19.3%) | 13776265 (24.3%) |
| Aligned pairs: discordant alignment | 4048217 (7.1%) | 4044795 (7.3%) | 3755391 (6.6%) |
| Concordant alignment | 74.00% | 70.90% | 73.50% |
| No. of junctions | 144075 | 97906 | 140296 |

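The percentages in the summary can be recomputed from the raw counts, which makes it easier to compare the three runs on the same footing. The sketch below is only a cross-check: the dictionary keys and the `rates` helper are names made up for illustration, and the concordant-pair formula, (aligned pairs minus discordant pairs) divided by input pairs, is an assumption that happens to reproduce the reported figures.

```python
# Counts copied from the TopHat summary table in the question.
INPUT_PAIRS = 71926313  # same input for all three runs

runs = {
    "with_annotation": {
        "left_mapped": 62199375, "right_mapped": 62071170,
        "left_multi": 10352306,
        "aligned_pairs": 57244789, "discordant": 4048217,
    },
    "with_annotation_no_novel_juncs": {
        "left_mapped": 60663645, "right_mapped": 60575371,
        "left_multi": 11540865,
        "aligned_pairs": 55041529, "discordant": 4044795,
    },
    "without_annotation": {
        "left_mapped": 61835864, "right_mapped": 61694450,
        "left_multi": 15034571,
        "aligned_pairs": 56591033, "discordant": 3755391,
    },
}


def rates(run, n_input=INPUT_PAIRS):
    """Return (overall mapping %, left multi-mapping %, concordant pair %)."""
    # Overall rate: mapped left + right reads over all input reads.
    overall = 100.0 * (run["left_mapped"] + run["right_mapped"]) / (2 * n_input)
    # Multi-mapping rate is relative to mapped reads, not to input reads.
    multi = 100.0 * run["left_multi"] / run["left_mapped"]
    # Assumed formula: pairs aligned concordantly over input pairs.
    concordant = 100.0 * (run["aligned_pairs"] - run["discordant"]) / n_input
    return round(overall, 1), round(multi, 1), round(concordant, 1)


for name, run in runs.items():
    print(name, rates(run))
```

Seen this way, the run without annotation maps almost as many reads as the annotated run, but about a quarter of its mapped reads are multi-mapped, compared with about a sixth when the GTF is supplied.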
Based on these numbers, I thought the run “with reference annotation” was the best one. But when I viewed the BAM file together with the junctions, I found a lot of junctions with high depth between very distantly located genes. My genes of interest are duplicate genes. I guess that pre-filtering multi-mapped reads, along with some other arguments, will further improve the mapping, so I thought of running the mapping with the following options:

tophat -p 8 -G genes.gtf -o SRX528281_tophat_RABM_Prefilter --no-mixed --no-discordant --max-multihits 2 --prefilter-multihits --read-realign-edit-dist 0 Genome R1.fastq R2.fastq

Is my approach correct? Will the included options improve the mapping without excluding important information? Any suggestions will be highly appreciated.
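One way to look at the high-depth junctions between distantly located genes before changing the mapping options is to list them from TopHat's `junctions.bed`. The sketch below assumes the usual BED12 layout of that file (score column holding the junction depth, two blocks per junction); the function name and both thresholds (`max_span`, `min_depth`) are hypothetical and would need tuning for the genome in question.

```python
def long_junctions(bed_lines, max_span=50000, min_depth=10):
    """Yield (chrom, intron_start, intron_end, depth) for junctions that
    span more than max_span bases and have depth of at least min_depth."""
    for line in bed_lines:
        # Skip blank lines, the track header, and comments.
        if not line.strip() or line.startswith(("track", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, chrom_start, depth = fields[0], int(fields[1]), int(fields[4])
        block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        block_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
        # The intron lies between the end of block 1 and the start of block 2.
        intron_start = chrom_start + block_sizes[0]
        intron_end = chrom_start + block_starts[1]
        if intron_end - intron_start > max_span and depth >= min_depth:
            yield chrom, intron_start, intron_end, depth


# Made-up records for illustration only, not taken from the real data.
example = [
    'track name=junctions description="TopHat junctions"\n',
    "chr1\t1000\t2100\tJUNC00000001\t25\t+\t1000\t2100\t255,0,0\t2\t50,60\t0,1040\n",
    "chr1\t5000\t205100\tJUNC00000002\t40\t+\t5000\t205100\t255,0,0\t2\t45,55\t0,200045\n",
    "chr2\t100\t300200\tJUNC00000003\t3\t-\t100\t300200\t255,0,0\t2\t50,50\t0,300050\n",
]

for junction in long_junctions(example):
    print(junction)
```

On real output this would be called as `long_junctions(open("junctions.bed"))`. Junctions flagged this way that connect two copies of a duplicated gene are the candidates to check by eye for multi-mapping artifacts.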

rna-seq alignment • 2.7k views

I think you are just making it complicated. If you have a GTF file, just use it. If you are not interested in novel transcripts, disable novel junction discovery.

Anyway, small quantitative differences like these exist even if you run the tool with the same set of options multiple times.

written 4.3 years ago by geek_y

Some of the duplicate genes have highly similar sequences. So I think that if I do not filter the multiple hits, some false-positive novel junctions will be reported with apparent significance. Although I have a GTF file, the genes I am interested in are mostly predicted. So I am trying to re-annotate the genes from the RNA-seq results and am also checking whether the genes have isoforms.

written 4.3 years ago by mjoyraj
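An alternative to restricting multiple hits at mapping time is to map once and separate uniquely mapped from multi-mapped alignments afterwards, so that nothing is lost permanently. The sketch below works on SAM text lines (for example from `samtools view accepted_hits.bam`) and assumes each record carries the `NH:i:` tag giving the number of reported alignments, which TopHat normally writes; the function names and the `max_hits` parameter are hypothetical.

```python
def nh_value(sam_line):
    """Return the NH:i tag value of a SAM record, or None if it is absent."""
    # Optional tags start at the 12th tab-separated column.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("NH:i:"):
            return int(field[5:])
    return None


def split_by_multiplicity(sam_lines, max_hits=1):
    """Split SAM records into (kept, dropped) by their number of hits."""
    kept, dropped = [], []
    for line in sam_lines:
        if line.startswith("@"):  # header lines are not alignments
            continue
        nh = nh_value(line)
        if nh is not None and nh <= max_hits:
            kept.append(line)
        else:
            dropped.append(line)  # multi-mapped, or NH tag missing
    return kept, dropped


# Made-up records for illustration only.
example_sam = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "read1\t99\tchr1\t100\t50\t10M\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII\tNH:i:1\n",
    "read2\t99\tchr1\t500\t3\t10M\t=\t700\t210\tACGTACGTAC\tIIIIIIIIII\tNM:i:0\tNH:i:2\n",
]

unique, multi = split_by_multiplicity(example_sam)
print(len(unique), "kept;", len(multi), "dropped")
```

Comparing junction support in the kept set against the full set shows which junctions depend only on multi-mapped reads, which is the concern raised for the duplicate genes.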