Question: HISAT - Stringtie
2.8 years ago, AW (United Kingdom) wrote:

I want to use the HISAT-Stringtie approach to quantify expression for my paired-end Illumina RNA-seq data.

I see that by default HISAT reports up to 5 alignments for each read: “Default mode: search for one or more alignments, report each. -k 5”

If there are multiple alignments for a given read in the SAM file, how does StringTie use them to quantify expression? Does this mean that reads can be counted multiple times, or does StringTie somehow pick the best alignment and ignore the others?

I want to avoid the situation where a read is counted multiple times, so how should I filter the SAM file to include only one alignment per read, as TopHat used to report? I see I cannot just specify -k 1, as “HISAT does not "find" alignments in any specific order, so for reads that have more than N distinct, valid alignments, HISAT does not guarantee that the N alignments reported are the best possible in terms of alignment score.” How else should I do this?
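One common way to get a single alignment per read is to drop every record whose SAM flag marks it as secondary (0x100) or supplementary (0x800). The sketch below does this on plain SAM text with the standard library only. It assumes the aligner sets the secondary flag on the extra alignments, and it keeps whichever alignment the aligner marked as primary, which, per the HISAT manual quoted above, is not guaranteed to be the best-scoring one. The function name keep_primary and the example records are made up for illustration.

```python
SECONDARY = 0x100      # SAM flag bit: not the primary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment

def keep_primary(sam_lines):
    """Yield header lines and primary alignment records only."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # skip extra alignments of a multi-mapped read
        yield line

# Hypothetical records: the same mate reported twice, once as secondary (355 = 99 + 256).
example = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*\tNH:i:2\n",
    "r1\t355\tchr2\t500\t1\t50M\t=\t700\t250\t*\t*\tNH:i:2\n",
]
kept = list(keep_primary(example))
```

The same filter is often applied on the command line with samtools view -F 256. If only uniquely mapped reads are wanted instead, filtering on the NH:i:1 tag is an alternative, assuming the aligner writes NH.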

I also noticed this in the StringTie manual: 'Every spliced read alignment (i.e. an alignment across at least one junction) in the input SAM file must contain the tag XS to indicate the genomic strand that produced the RNA from which the read was sequenced. Alignments produced by TopHat and HISAT2 (when ran with --dta option) already include this tag'

When using HISAT I want to make sure the XS tag is present. However, I cannot find the --dta option.
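Whether the XS tag is present can be checked directly on the SAM output. The sketch below treats an alignment as spliced when its CIGAR string contains an N operation (a skipped region, i.e. an intron) and reports spliced records that carry no XS:A tag. The function name spliced_without_xs and the example records are hypothetical.

```python
def spliced_without_xs(sam_lines):
    """Return read names of spliced alignments that lack an XS:A strand tag."""
    missing = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        cigar = fields[5]    # column 6: CIGAR string
        tags = fields[11:]   # optional TAG:TYPE:VALUE fields start at column 12
        is_spliced = "N" in cigar
        has_xs = any(tag.startswith("XS:A:") for tag in tags)
        if is_spliced and not has_xs:
            missing.append(fields[0])
    return missing

# Hypothetical records: r1 is spliced with XS, r2 is spliced without XS, r3 is unspliced.
records = [
    "r1\t0\tchr1\t100\t60\t20M500N30M\t*\t0\t0\t*\t*\tXS:A:+\tNH:i:1\n",
    "r2\t0\tchr1\t100\t60\t20M500N30M\t*\t0\t0\t*\t*\tNH:i:1\n",
    "r3\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1\n",
]
problem_reads = spliced_without_xs(records)
```

An empty result means every spliced alignment in the input carries the tag StringTie asks for.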

Thanks!

Alison


Have you read the StringTie paper?

written 2.8 years ago by Devon Ryan

Hi, thanks for your comment! Yes, I have read the paper, but it doesn't answer these questions, mainly because the paper uses the output of TopHat2 as the input to StringTie, and TopHat2 only reports one alignment per read.

written 2.8 years ago by AW
2.8 years ago, AW (United Kingdom) wrote:

Found the answer to the second question. I was using HISAT, but the StringTie developers recommend using HISAT2. HISAT2 has the option --dta/--downstream-transcriptome-assembly: "Report alignments tailored for transcript assemblers including StringTie. With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short-anchors, which helps transcript assemblers improve significantly in computation and memory usage."
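For reference, a paired-end HISAT2 run with --dta might be assembled as in the sketch below. The flags -x, -1, -2 and -S are the usual HISAT2 options for the index basename, the two mate files and the SAM output; the index name and file names are placeholders, and the helper hisat2_dta_command is made up for illustration.

```python
import subprocess

def hisat2_dta_command(index, reads_1, reads_2, out_sam):
    """Build the argument list for a paired-end HISAT2 run with --dta."""
    return [
        "hisat2",
        "--dta",          # report alignments tailored for transcript assemblers
        "-x", index,      # basename of the HISAT2 index
        "-1", reads_1,    # first mates
        "-2", reads_2,    # second mates
        "-S", out_sam,    # SAM output file
    ]

# Placeholder names; replace with real index and FASTQ paths.
cmd = hisat2_dta_command("genome_index", "sample_R1.fastq.gz",
                         "sample_R2.fastq.gz", "sample.sam")

# Uncomment to actually run; requires hisat2 on PATH and real input files.
# subprocess.run(cmd, check=True)
```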

It's still not clear how StringTie handles the multiple alignments reported by HISAT2.

written 2.8 years ago by AW

Cufflinks normally used an EM (expectation-maximization) approach for quantification, so the presumption is that StringTie follows the same procedure. Whether that's actually the case, only the authors can answer.
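To make the EM idea concrete, here is a toy sketch of how such a scheme splits multi-mapped reads. It is only an illustration of the general technique, not StringTie's or Cufflinks' actual code, and it ignores transcript length and alignment scores. Each read lists the transcripts it is compatible with; in every iteration a read is divided among those transcripts in proportion to the current abundance estimates, so each read contributes a total weight of exactly one and is never counted multiple times. The function name em_abundances and the example data are hypothetical.

```python
def em_abundances(read_compat, n_transcripts, n_iter=50):
    """Estimate relative transcript abundances from read compatibility lists."""
    # Start from a uniform guess.
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional counts.
        n_reads = sum(counts)
        theta = [c / n_reads for c in counts]
    return theta

# Three reads unique to transcript 0, one unique to transcript 1,
# and two reads that align equally well to both.
reads = [[0], [0], [0], [1], [0, 1], [0, 1]]
theta = em_abundances(reads, 2)
```

In this example the ambiguous reads end up assigned mostly to transcript 0, because the uniquely mapped reads already suggest it is the more abundant one.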

written 2.8 years ago by Devon Ryan