Question: How to set reporting options for RNA-seq read alignment with hisat2?
epigene (United States) wrote, 2.7 years ago:

Hi, I'm new to analyzing RNA-seq data. I started by using hisat2 to align RNA-seq reads. My main goal is differential gene expression analysis comparing multiple control samples against case samples.

I ran a test with hisat2 using basically these options: --dta-cufflinks --rna-strandness. I noticed that the number of alignments in the BAM file is larger than the number of reads in the original FASTQ file. Puzzled by this, I searched around and found the -k option, which has a default value of 5, so up to 5 alignments can be reported for a single read.

-k <int>
It searches for at most <int> distinct, primary alignments for each read. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments.

I think this is the reason the number of alignments exceeds the number of reads.
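One way to check this explanation is to compare the number of alignment records with the number of distinct reads that have at least one alignment. The following is only a minimal sketch in plain Python, assuming the alignments are available as SAM text lines (for example the output of `samtools view`); the function name `summarize_alignments` and the toy records are made up for illustration.

```python
from collections import Counter

def summarize_alignments(sam_lines):
    """Return (alignment records, aligned reads, reads with >1 record)."""
    records_per_read = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:                    # unmapped record, not an alignment
            continue
        # Keep the two mates of a pair apart: 0x40 = first, 0x80 = second.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        records_per_read[(name, mate)] += 1
    total_records = sum(records_per_read.values())
    aligned_reads = len(records_per_read)
    multi_reads = sum(1 for n in records_per_read.values() if n > 1)
    return total_records, aligned_reads, multi_reads

# Toy SAM records (invented): r1 aligns once, r2 twice, r3 is unmapped.
toy = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r2\t256\tchr2\t300\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(summarize_alignments(toy))  # (3, 2, 1)
```

If the first number is larger than the second, the surplus comes from reads that were reported at more than one location.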

So I'm curious what the ideal -k would be, and how this option impacts downstream analysis such as gene counts.
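For reference, -k can be written out explicitly on the command line, which at least records the value used for a run. The sketch below only builds and prints a command and does not run anything; the index and file names are hypothetical, the paired-end layout and the `RF` strandness value are assumptions that depend on the library, and `build_hisat2_command` is an illustrative helper, not part of hisat2.

```python
import shlex

def build_hisat2_command(index, fq1, fq2, out_sam, k=5, strandness="RF"):
    """Return a hisat2 argument list for paired-end reads (nothing is run)."""
    return [
        "hisat2",
        "-x", index,                      # basename of the hisat2 index
        "-1", fq1,                        # first mates
        "-2", fq2,                        # second mates
        "--dta-cufflinks",
        "--rna-strandness", strandness,   # assumption: depends on the library prep
        "-k", str(k),                     # maximum alignments reported per read
        "-S", out_sam,                    # SAM output file
    ]

# Hypothetical file names, for illustration only.
cmd = build_hisat2_command(
    "genome_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample.sam"
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```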

Thanks!

hisat2 rna-seq • 1.2k views
WouterDeCoster (Belgium) wrote, 2.7 years ago:

Gene counting tools will commonly ignore multimapping reads. That's a pity, but a sensible decision, since these reads cannot be reliably attributed to a single gene. However, you can rescue some of them using the method described here: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0734-x

You should only change the default values if you are confident in what you are doing. If you don't know what the ideal value is, the default is probably just fine. Otherwise it wouldn't be the default.


Thanks for the input. I'm not so confident in what I'm doing yet, as I don't have a good understanding of how each step works. If downstream analysis ignores multimapping reads, then this option won't affect it later.
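To make "ignoring multimapping reads" concrete, here is a rough sketch of keeping only uniquely mapped records. It assumes the aligner writes the `NH:i:` tag (number of reported locations for the read), which hisat2 does; the function names and the toy records are hypothetical, and real counting tools apply their own rules.

```python
def is_uniquely_mapped(sam_line):
    """True if a SAM alignment record is mapped and carries NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:                        # unmapped record
        return False
    for tag in fields[11:]:               # optional tags start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    return False                          # no NH tag: treat as not provably unique

def keep_unique(sam_lines):
    """Keep header lines and uniquely mapped alignment records."""
    return [
        line for line in sam_lines
        if line.startswith("@") or is_uniquely_mapped(line)
    ]

# Toy SAM records (invented): only r1 is uniquely mapped.
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r2\t256\tchr2\t300\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2",
]
for line in keep_unique(example):
    print(line.split("\t")[0])
```

In this toy case both records of r2 are dropped, however many locations were reported for it.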

(reply written 2.7 years ago by epigene)