STAR mapper for Arabidopsis thaliana
17 months ago

I am trying to run the STAR aligner on Arabidopsis thaliana, using the commands below, in order to compare the results with those from the TopHat mapper, which I run with default parameters. The mapping percentage is much higher with STAR than with TopHat. Is there a parameter I need to set specifically to match TopHat? Also, is there anything specific I need to set for Arabidopsis thaliana?

The percentage of uniquely mapped reads is higher with STAR (34%) than with TopHat (26%). Overall mapping: STAR 97%, TopHat 76%.

The reference files were taken from the Ensembl website.

STAR genome build command:

STAR --runThreadN 16 --runMode genomeGenerate --genomeDir /home5/STARgenome --genomeSAindexNbases 10 --genomeFastaFiles /home5/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa --sjdbGTFfile /home5/Arabidopsis_thaliana.TAIR10.46.gtf --sjdbOverhang 100


STAR mapping command:

STAR --genomeDir home5/STARgenome --runThreadN 16 --readFilesIn <(gunzip -c fastq.gz) --genomeLoad LoadAndKeep --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMismatchNmax 2 --limitBAMsortRAM 50000000000 --outFileNamePrefix home5/star/sample1 --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within --outSAMattributes Standard
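
On the Arabidopsis-specific part of the question: the STAR manual suggests scaling --genomeSAindexNbases for small genomes to min(14, log2(GenomeLength)/2 - 1). The sketch below computes that value from a FASTA file. The function names genome_length and sa_index_nbases are made up for illustration, and rounding the result down to an integer is an assumption. For a genome of roughly 120 Mb the formula gives 12, so the value of 10 used above is lower than the manual's suggestion.

```shell
# Total number of sequence characters in a FASTA file (header lines skipped).
# Hypothetical helper, not part of STAR.
genome_length() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Suggested --genomeSAindexNbases for a genome of the given length in bases:
# min(14, log2(length)/2 - 1), rounded down (rounding is an assumption).
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        n = int(log(len) / log(2) / 2 - 1)
        if (n > 14) n = 14
        print n
    }'
}

# Example use (path taken from the genome build command above):
# sa_index_nbases "$(genome_length /home5/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa)"
```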


TopHat is very old and relies internally on Bowtie1, which, at the time of its publication (2009), was optimised for short reads and ungapped alignments. Back then, standard read lengths were often around 36 or 50 bp, in many cases single-end. Things have changed over the years. STAR was developed with the 2x75 bp RNA-seq data of the ENCODE consortium in mind. I personally think you should not even make this comparison, simply because the tools are so different. STAR is a more modern RNA-seq aligner and is among the tools you should be using these days. It is probably not even possible to tweak TopHat to perform similarly. Be sure to check the STAR paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530905/), as it also includes many comparisons with other tools.
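
If you still want to put numbers side by side, it helps to pull them from STAR's own summary rather than recomputing them. STAR writes a Log.final.out file next to the BAM, with lines of the form "label | value". The helper below is a minimal sketch for reading one value out of that file. The name star_stat is made up for illustration, and the label must match the text in the log exactly. One caveat when reading the percentages: the mapping command in the question sets --outFilterScoreMinOverLread and --outFilterMatchNminOverLread to 0, whereas STAR's default for both is 0.66, so short partial alignments are kept and the overall mapped percentage may be higher than a default run would report.

```shell
# Print the value for one label from a STAR Log.final.out file.
# $1 = exact label text, $2 = path to Log.final.out
# Hypothetical helper, not part of STAR.
star_stat() {
    awk -F'|' -v label="$1" '
        {
            key = $1; val = $2
            # Trim surrounding spaces and tabs from both fields.
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == label) { print val; exit }
        }' "$2"
}

# Example use. With --outFileNamePrefix home5/star/sample1, STAR appends
# the file name directly to the prefix:
# star_stat "Uniquely mapped reads %" home5/star/sample1Log.final.out
# star_stat "% of reads mapped to multiple loci" home5/star/sample1Log.final.out
```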


That makes sense. I am now comparing STAR with HISAT2. HISAT2, which is a newer tool, also gives me a lower mapping rate than STAR. Does the same explanation hold here as well, given that TopHat and HISAT2 run similar algorithms?

Thanks!