Align 16S sequence to a reference
0
1
Entering edit mode
2.1 years ago
A_heath ▴ 160

Hi all,

I have the 16S rRNA sequencing of a bacterial sample and I would like to align that sequencing to its reference (from its genome) and obtain a % of identity.

What you be the best tool to achieve that in your opinion?

I thought about Clustal, or if you know a more appropriate tool, please let me know!

Thank you for your help,

16S sequence alignment • 1.6k views
ADD COMMENT
1
Entering edit mode

Isn't % identity a vague criterion? Is the sequencing data from the same strain? Is so it should map at 100% identity. If you have fastq data then you can use an NGS aligner to identify reads that align. You can then extract those in fasta format and use clustal to get a MSA.

ADD REPLY
1
Entering edit mode

Thank you very much for your reply. The sequencing data is from the same strain but I would like to find an alignment tool that provides a % of identity (that is close to 100% indeed). I have both sequences in fasta formats. Using clustal, I do not have a % of identity value unfortunately

ADD REPLY
1
Entering edit mode

You are looking for % identity for each read or entire alignment?

ADD REPLY
1
Entering edit mode

I am only looking for a % of identity for the entire alignment of the query and reference fasta sequences. I also tried Needle (https://www.ebi.ac.uk/Tools/psa/emboss_needle/)

ADD REPLY
2
Entering edit mode

You can just blast them with Needleman-Wunsch Global Align. (If the deep link doesn't work for you, scroll down on the main landing page and click on Global Align).

PS: Why did Needle not suit your needs? It does exactly the same thing using the same algorithm...

ADD REPLY
1
Entering edit mode

Thank you very much for your help.

The thing is that when these algorithms compare the two nucleotide sequences I have, they do not consider similarity between an A nucleotide and a D (which is either A, T, or G). Thus I have a % of similarity that is 99% instead of 100%. What do you think about it? Should I consider that A --> D is not similar?

ADD REPLY
2
Entering edit mode

Technically, if you run needle on the command line, you can provide your own scoring matrix with the -datafile parameter. So you can give an A-D pairing the same score as an A-A, if you like. The default scoring matrix is shown here and does penalize an A-D pairing, but not as much as a clear mismatch.

Which interpretation is more accurate with regard to your biological problem is a different story. Intuitively, my main consideration would be, if D is in the reference or your query? If it is part of the query, I'd count it as match (like a Regex, where ADT would match AAT, AGT and ATT equally well), whereas I would maybe not consider the position for scoring if D is in the reference and A is in your query?

ADD REPLY
2
Entering edit mode

Thank you very much for these really helpful information.

I run needle from the command line so I might modify the scoring matrix. In my reference sequence I only have A, T, G, or C nucleotides. However, in my query sequence (as the sequencing has inaccuracies at some positions), I sometimes have D, K, R, etc.

ADD REPLY

Login before adding your answer.

Traffic: 1780 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6