Question

how can I solve multi-mapping issue

0

Entering edit mode

6.9 years ago

ysbaek • 0

I am analyzing my RNA-seq data using Trimmomatric for trimming, Tophat2, bowtie for alignments/mapping, samtools for sort, and htseq for counting. However, after aligning my sequencing on the genome reference using tophat2 and bowtie, my multiple alignment % is 66.2% which is too high to be validated. Would you mind to review my script and give any suggestions? THANK YOU!!

Trimmomatric

java -jar /home/apps/trimmomatic/trimmomatic-0.33/trimmomatic-0.33.jar SE -threads 12 -phred33 -trimlog KC1_trimmomatic_log.txt ~/data/genome_raw_seq/bicolor_87_C12_1.fastq ~/results/trimmomatic/bicolor_87_C12_1_trimmed.fastq ILLUMINACLIP:myAdapters.fa:2:30:10 TRAILING:28 LEADING:28 MINLEN:30

tophat2

tophat2 -o ~/results/align/87C12_1_align_ERtophat --read-mismatches 0 --read-gap-length 0 --read-edit-dist 0 --max-insertion-length 0 --min-intron-length 25 --num-threads 16 --max-multihits 10 --microexon-search --library-type fr-firststrand --b2-very-sensitive --GTF ~/PhytozomeV11/Sbicolor/annotation/Sbicolor_313_v3.1.gene_exons.gff3 Sbicolorv3 ~/results/trimmomatic/bicolor_87_C12_1_trimmed.fastq

samtools

samtools sort -n ~/results/align/87C12_1_align/accepted_hits.bam ~/results/align/87C12_1_align/87C12_1_ER_sorted

htseq

htseq-count -q -f bam -m intersection-nonempty -s no -t gene -i ID ~/results/align/87C12_1_align/87C12_1_ER_sorted.bam ~/PhytozomeV11/Sbicolor/annotation/Sbicolor_313_v3.1.gene_exons.gff3 > ~/results/Counts/COUNToutputfile_87_C12_1.txt

RNA-Seq alignment rna-seq • 2.0k views

ADD COMMENT • link 6.9 years ago by ysbaek • 0

2

Entering edit mode

my multiple alignment % is 66.2% which is too high to be validated

Have you checked to see if the multi-mappers represent rRNA reads?

ADD REPLY • link 6.9 years ago by GenoMax 141k

0

Entering edit mode

I thought about rRNA reads but is it necessary to remove rRNAs before aligning?

ADD REPLY • link 6.9 years ago by ysbaek • 0

0

Entering edit mode

No, it's not necessary. But if a large fraction of your reads are rRNA then a lot of multimapping reads is not unexpected.

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

1

Entering edit mode

With regard to all parameters you specified for tophat2: unless you know what you are doing it's always better to stick to the defaults.

That said, you should know that the old 'Tuxedo' pipeline of Tophat and Cufflinks is no longer the "advisable" tool for RNA-seq analysis. The software is deprecated/ in low maintenance and should be replaced by HISAT2, StringTie and ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and bbmap, or pseudo-alignment using kallisto or salmon.

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

0

Entering edit mode

Thank you!!

I was trying to use STAR, but having a hard time to produce genome_index. Do you know where I could find any template scripts of HISAT2 and StringTie and Ballgown. Thank you!

ADD REPLY • link 6.9 years ago by ysbaek • 0

0

Entering edit mode

I was trying to use STAR, but having a hard time to produce genome_index

If I'm not mistaken the manual is fairly clear about this.

Do you know where I could find any template scripts of HISAT2 and StringTie and Ballgown.

The manual, presumably.

Just a detail, but what you call "a script" here, is more commonly called "a command".

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k