Question: Using Hisat2 with strand specific bacteria sequences
3.3 years ago by
vm.higareda20
vm.higareda20 wrote:

Hello

I have transcriptomic data from an Illumina single-end, strand-specific experiment; it is a metatranscriptomic sample. At the moment I am analyzing the bacterial transcripts.

I am using bowtie2 to map the sequences to a reference genome, but they are strand-specific, and Bowtie2 does not have that option. That is why I want to use hisat2 in order to select the strand specific option.

Since HISAT2 considers splicing and bacteria do not have splicing, do you recommend using the option --no-spliced-alignment for the alignment?

rna-seq alignment forum • 2.2k views
modified 3.3 years ago by Brian Bushnell 17k • written 3.3 years ago by vm.higareda 20
3.3 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

I would say that you should not need to worry about getting reads mapped as spliced if there is no actual splicing taking place in the data.

So there is no need to pass that option.

written 3.3 years ago by Istvan Albert ♦♦ 81k
3.3 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

Bacterial RNA-seq data does contain a little splicing; it is just rare and typically short, and comes from self-splicing transcripts. I think it probably does not matter much for the most part, but better safe than sorry, so I'd suggest allowing spliced alignments while restricting the maximum splice length to something fairly short, like 50 bp. I'm not sure what the actual range of self-splicing is, but I seem to recall it's generally around 20 bp.
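As a rough illustration of the "allow splicing but cap its length" suggestion, here is a minimal Python sketch that assembles a HISAT2 command line. `--max-intronlen` and `--rna-strandness` are existing HISAT2 options; the helper name, the index prefix, the file names, the thread count, and the choice of `R` for strandness are placeholders assumed for the example, not values from this thread.

```python
import shlex

def hisat2_command(index_prefix, reads_fastq, out_sam,
                   max_intron_len=50, strandness=None, threads=4):
    """Build a HISAT2 command for single-end reads with a capped intron length.

    Hypothetical helper: it only assembles the argument list and does not
    run HISAT2.
    """
    cmd = [
        "hisat2",
        "-p", str(threads),
        "-x", index_prefix,      # prefix of an index built with hisat2-build
        "-U", reads_fastq,       # unpaired (single-end) reads
        "-S", out_sam,
        "--max-intronlen", str(max_intron_len),
    ]
    if strandness is not None:
        # For single-end libraries HISAT2 accepts "F" or "R".
        cmd += ["--rna-strandness", strandness]
    return cmd

# Placeholder paths; "R" is an assumption about the library protocol.
cmd = hisat2_command("genome_index", "reads.fastq.gz", "aligned.sam",
                     strandness="R")
print(shlex.join(cmd))
```

The printed string can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)`. Whether `F` or `R` is correct depends on how the library was prepared, so check the kit documentation before relying on it.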

Whether the library is strand-specific is important to analysis, not mapping. Unless you are using the Tuxedo pipeline for analysis (which I do not recommend), there's no particular reason to let strand-specificity impact your choice of aligner.
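To make the "analysis, not mapping" point concrete: every aligner records which strand a read aligned to in the SAM FLAG field (bit 0x10), and the strandedness of the library only decides how that bit is interpreted afterwards. Below is a small sketch of that interpretation for single-end reads. The function name and the `library` labels are made up for illustration; in practice counting tools such as featureCounts (`-s`) or htseq-count (`--stranded`) apply this logic for you.

```python
def transcript_strand(flag, library="reverse"):
    """Infer the strand of the originating transcript for a single-end read.

    flag    -- integer SAM FLAG of the alignment
    library -- "forward"    (read has the same sense as the transcript),
               "reverse"    (read is the reverse complement of the transcript),
               "unstranded" (strand cannot be inferred)
    """
    if flag & 0x4:                       # 0x4: read is unmapped
        return None
    read_strand = "-" if flag & 0x10 else "+"   # 0x10: aligned to reverse strand
    if library == "forward":
        return read_strand
    if library == "reverse":
        return "+" if read_strand == "-" else "-"
    if library == "unstranded":
        return "."
    raise ValueError(f"unknown library type: {library!r}")

# The same alignment is assigned to opposite strands depending on the protocol.
print(transcript_strand(16, "forward"))
print(transcript_strand(16, "reverse"))
```

The alignment itself is identical in both calls; only the downstream assignment to a gene on the plus or minus strand changes, which is why bowtie2, bwa, or BBMap output can be analyzed in a strand-aware way.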

written 3.3 years ago by Brian Bushnell 17k

Why do you not recommend the Tuxedo pipeline? I am new to this topic. What other programs can I use?

Thank you

written 3.2 years ago by vm.higareda 20

The Tuxedo pipeline is one of the older sets of RNA-seq analysis programs and is deprecated to some extent. The Tuxedo developers now recommend HISAT2, which is the newer program they wrote.

Since the original question was about bacterial RNA-seq, one does not need to worry about splicing over long distances, so pretty much any NGS aligner could be used for the alignment. BBMap, bwa, bowtie2 (and many others) are examples of this type of program.

modified 3.2 years ago • written 3.2 years ago by genomax 75k

There's an interesting caveat I just discovered. Apparently, bwa and bowtie2 do not add the NH tag to the resulting SAM/BAM files. That's the tag indicating how many alignments were reported for a given read, so, simply put, it's what lets you keep track of multimapping reads.

So, any type of downstream processing that "rescues" multimappers grinds to a halt here - you're pretty much stuck with only uniquely aligned reads. While that's mostly OK for many genes (bacterial genomes are not very repetitive most of the time), you will completely miss out on perfectly duplicated genes, as well as on many short RNAs. So far I've seen as much as 20% multimapping reads in bacterial RNA-seq, although 2-5% seems to be more common with longer reads.
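A quick way to check what your own aligner output contains is to tally reads by their NH value. The sketch below works on plain SAM text lines; the function names and the toy records are invented for the example, and it assumes one primary line per mapped read, with secondary and supplementary lines skipped.

```python
from collections import Counter

def nh_value(fields):
    """Return the NH:i value from the optional SAM fields, or None if absent."""
    for tag in fields[11:]:              # optional tags start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[len("NH:i:"):])
    return None

def classify_reads(sam_lines):
    """Count reads as unique, multi, no_nh_tag, or unmapped."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):         # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:                   # unmapped
            counts["unmapped"] += 1
            continue
        if flag & 0x900:                 # secondary (0x100) or supplementary (0x800)
            continue                     # skip so each read is counted once
        nh = nh_value(fields)
        if nh is None:
            counts["no_nh_tag"] += 1
        elif nh == 1:
            counts["unique"] += 1
        else:
            counts["multi"] += 1
    return counts

def record(name, flag, *tags):
    """Build a toy SAM record with placeholder mandatory fields."""
    core = [name, str(flag), "chr", "100", "1", "50M", "*", "0", "0", "*", "*"]
    return "\t".join(core + list(tags))

toy_sam = [
    "@HD\tVN:1.6",
    record("r1", 0, "NH:i:1"),
    record("r2", 0, "NH:i:2"),
    record("r2", 256, "NH:i:2"),         # secondary alignment of r2
    record("r3", 16),                    # mapped, but no NH tag written
    record("r4", 4),                     # unmapped
]
counts = classify_reads(toy_sam)
print(dict(counts))
```

On a real file you could feed it the text output of `samtools view -h`. A large `no_nh_tag` count would mean the aligner did not write NH, so multimappers have to be identified some other way.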

written 16 months ago by predeus 1.3k