How to specify/calculate subject coverage of the alignment (query [protein sequence] vs. subject [nucleotide sequence]) in tBLASTn?
5.3 years ago
Kumar ▴ 120

I have been extracting homologous virulence gene sequences from my genome of interest using tBLASTn, for which I have to apply two mandatory thresholds:

Coverage = 40% and percentage of identity = 70%

Percentage of identity can be taken from the tabular output format (-outfmt 6). But since tBLASTn compares a protein query against nucleotide subjects, how do I calculate the coverage percentage for the same hits?

Calculating the coverage percentage manually is tedious (using the 3 nucleotides to 1 amino acid conversion). Moreover, I have to carry out this analysis for numerous genomes. Hence, please suggest a strategy for doing this.

Thank you in advance.

alignment BLAST tBLASTn • 3.2k views

You can request qcovs (or qcovhsp) in -outfmt 6 as well. Would that be of any use? Keep in mind these are always in reference to the query, not the hit sequence.
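
For illustration, a minimal sketch of how those columns could be requested. The file names are placeholders, and the function only builds the argument list rather than running the search. The column names are standard BLAST+ `-outfmt 6` specifiers; which ones you need is a judgment call.

```python
import shlex

# Columns to request; qcovs is query coverage per subject,
# qcovhsp is query coverage per HSP.
FIELDS = ["qseqid", "sseqid", "pident", "length", "qlen", "slen",
          "qstart", "qend", "sstart", "send", "qcovs", "qcovhsp",
          "evalue", "bitscore"]


def build_tblastn_command(query_faa, subject_fna, out_tsv):
    """Return the tblastn argument list (all file names are placeholders)."""
    return [
        "tblastn",
        "-query", query_faa,
        "-subject", subject_fna,
        "-outfmt", "6 " + " ".join(FIELDS),
        "-out", out_tsv,
    ]


cmd = build_tblastn_command("virulence_proteins.faa", "genome.fna", "hits.tsv")
# shlex.join quotes the -outfmt string so the line can be pasted into a shell.
print(shlex.join(cmd))
```

To actually run the search, the list can be passed to `subprocess.run(cmd, check=True)`, assuming `tblastn` is on the PATH.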


Dear sterck, I feel qcovs is not useful in this regard. Suppose the aligned part of my query (protein sequence) is 66 residues long (total query length is 300) and the aligned part of my subject hit (including gaps) is 198 nucleotides long (total subject length is 1200). Calculating coverage this way by hand takes a lot of time and is error prone. Therefore, please suggest a script or an easy method to do the same.


OK, you kinda lost me here.

What is the difference between the protein length of 66 AA (?) and what you say is the total length of 300 AA/NT? I somewhat understand the difference for the hit sequence.

To get things straight: what do you consider to be the coverage percentage? Remember that BLAST only works on what can be aligned (HSPs) and thus rarely on the full sequences.


I mean to say that a 66 amino acid sequence has been matched with a 198 nucleotide sequence, including gaps. In this case, how could I calculate my coverage percentage? I have another question: what does "in reference to the query" mean (as you mentioned in your earlier reply)?


No, a 66 amino acid stretch hasn't been matched directly against a 198 nucleotide stretch.

The 66 amino acid query is matched against a protein translation of the database sequence (tBLASTn translates the nucleotide subject in all six reading frames). It then simply reports the corresponding nucleotide coordinates for the matched region.

Why not simply use the qcov of the protein against protein comparison as Lieven suggested? Covering 50% of the protein sequence is the same as covering 50% of the gene sequence.


Well, that means that if BLAST says qcovs is 50, it tells you that 50% of the query is covered by a match, not that you match 50% of the hit as well. E.g. 50% of the query can be matched by only 10% of the hit sequence.
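
To make the distinction concrete, here is a small sketch using the numbers quoted earlier in this thread (66 of 300 aa on the query, 198 of 1200 nt on the subject). The function name and the start coordinates of 1 are assumptions for illustration. Since tBLASTn reports sstart/send and slen in nucleotides, no 3-to-1 conversion is needed as long as each span is divided by a length in the same unit.

```python
def hsp_coverage(start, end, total_length):
    """Percent of a sequence spanned by one HSP (1-based, inclusive coordinates)."""
    # Hits on the minus strand are reported with start > end,
    # so take the absolute span.
    span = abs(end - start) + 1
    return 100.0 * span / total_length


# Query side: amino acid coordinates divided by the protein length.
query_cov = hsp_coverage(1, 66, 300)
# Subject side: nucleotide coordinates divided by the nucleotide length.
subject_cov = hsp_coverage(1, 198, 1200)

print(f"query coverage:   {query_cov:.1f}%")    # 22.0%
print(f"subject coverage: {subject_cov:.1f}%")  # 16.5%
```

The two percentages differ because each one is relative to its own sequence length.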


tBLASTn just translates the nucleotide subject into protein before running the comparison anyway, so you can use the same approach you would use for protein - protein or nucleotide - nucleotide.


Dear healey, I can do it as you said, but I have a lot of output to process. Please suggest an easy way to do the same.


You will need a wrapper script that takes your BLAST results and recalculates this yourself. BLAST cannot give you that number directly, AFAIK.
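
A rough sketch of what such a wrapper might look like, not a tested pipeline. It assumes the tabular output was produced with the column order listed in `COLUMNS` (a hypothetical choice that must match your own `-outfmt` string), and that "coverage" means the fraction of the subject sequence covered by HSPs passing the identity cutoff. Overlapping HSPs are merged so that no position is counted twice.

```python
import csv
import io
from collections import defaultdict

# Assumed column order; must match the -outfmt string used for the search.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qlen", "slen",
           "qstart", "qend", "sstart", "send"]


def merged_length(intervals):
    """Total length covered by possibly overlapping 1-based inclusive intervals."""
    total, current_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip the part already counted
        if end >= start:
            total += end - start + 1
            current_end = end
    return total


def subject_coverage(handle, min_identity=70.0):
    """Map (qseqid, sseqid) to the percent of the subject covered by kept HSPs."""
    spans = defaultdict(list)
    lengths = {}
    for row in csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"):
        if float(row["pident"]) < min_identity:
            continue
        key = (row["qseqid"], row["sseqid"])
        s1, s2 = int(row["sstart"]), int(row["send"])
        spans[key].append((min(s1, s2), max(s1, s2)))  # minus-strand hits are reversed
        lengths[key] = int(row["slen"])
    return {k: 100.0 * merged_length(v) / lengths[k] for k, v in spans.items()}


# Made-up rows for demonstration; the third HSP fails the identity cutoff.
demo = io.StringIO(
    "protA\tcontig1\t85.0\t66\t300\t1200\t1\t66\t1\t198\n"
    "protA\tcontig1\t72.0\t50\t300\t1200\t100\t149\t500\t351\n"
    "protA\tcontig1\t40.0\t30\t300\t1200\t200\t229\t900\t989\n"
)
for (query, subject), cov in subject_coverage(demo).items():
    print(query, subject, f"{cov:.1f}", "PASS" if cov >= 40.0 else "FAIL")
```

For real data, replace `demo` with `open("hits.tsv")` (placeholder name) and loop over one output file per genome. If the subject is a whole contig rather than a single gene, coverage relative to the query (qstart/qend over qlen) is probably the more meaningful number; the same merging function works on those columns.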


Thank you, healey, for your clarification. However, I do not know how to proceed with this.
