Question: Doubt regarding Reference genome alignment
bic0 wrote:

Hello all,

We are working with whole-genome sequence data from a Pseudomonas fluorescens strain. We performed a de novo assembly with ABySS and submitted the resulting contigs to the RAST server for annotation. RAST identified the species most closely related to this strain. When we aligned our strain against that related strain using Bowtie2, it reported a 71% overall alignment rate. I would like to know whether this alignment rate is good or not.

My feeling is that the alignment rate should be somewhat higher between two strains of the same species. Apologies if this is a basic question; we are new to NGS data analysis.

Also, could someone suggest a tool for finding a reference genome other than BLAST?

Thanks in advance

Regards

Ravisankar

written 8 months ago by bic0

pbpanigrahi180 wrote:

I want to know whether this alignment rate is good or not.

It depends on how much the two strains differ. If they are genuinely different, you should expect a low alignment rate. You can check this by allowing 1 mismatch in the seed alignment step and seeing whether the alignment rate improves.

From the manual

-N <int> Sets the number of mismatches to allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.

If the alignment rate increases significantly with -N 1, you can infer that the two strains differ at many single-nucleotide positions.
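To make the comparison concrete, here is a minimal sketch of how the two runs could be set up and compared. It assumes paired-end reads and an index already built with bowtie2-build; the file names and the index prefix are placeholders, not the poster's actual files. Bowtie2 prints its summary to stderr, so the helper below just pulls the "overall alignment rate" line out of that text.

```python
import re


def bowtie2_command(index_prefix, reads_1, reads_2, sam_out, seed_mismatches=0):
    """Build a bowtie2 command line for paired-end reads (paired-end is an assumption)."""
    if seed_mismatches not in (0, 1):
        # The manual states that -N can only be set to 0 or 1.
        raise ValueError("-N must be 0 or 1")
    return [
        "bowtie2",
        "-x", index_prefix,           # index prefix from bowtie2-build (placeholder name)
        "-1", reads_1,                # mate 1 reads (placeholder name)
        "-2", reads_2,                # mate 2 reads (placeholder name)
        "-N", str(seed_mismatches),   # mismatches allowed in a seed
        "-S", sam_out,                # SAM output file
    ]


def overall_alignment_rate(summary_text):
    """Return the percentage from the 'overall alignment rate' line of the summary."""
    match = re.search(r"([\d.]+)% overall alignment rate", summary_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


if __name__ == "__main__":
    # Print the two commands to compare: default seeds versus one mismatch per seed.
    for n in (0, 1):
        cmd = bowtie2_command("related_strain", "reads_1.fastq", "reads_2.fastq",
                              f"aln_N{n}.sam", seed_mismatches=n)
        print(" ".join(cmd))
```

Running each printed command and passing its captured stderr to overall_alignment_rate gives the two numbers to compare side by side.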

written 8 months ago by pbpanigrahi180

Thanks for the quick reply. Will try it soon.

Also, can someone help me with the second question: is there a tool for finding a reference genome other than BLAST?

written 8 months ago by bic0

I repeated the alignment with the -N 1 option and it now gives a 77% overall alignment rate. Can we infer from this result that the two strains are quite different from each other?

Thanks

written 8 months ago by bic0

To check, you can extract the unmapped reads (roughly 23-29% of the total) and BLAST them to see whether contamination is a possibility. You can also try FastQ Screen and DeconSeq for contamination detection. UCSC BLAT is an alternative to BLAST.
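As a rough sketch of the extraction step, the function below reads SAM records and writes out the ones whose FLAG has the 0x4 (unmapped) bit set as FASTA, ready to paste into BLAST. The file names in the usage comment are placeholders. Bowtie2's --un and --un-conc options, or samtools view -f 4, do the same job directly; this is only meant to show what is being selected.

```python
def unmapped_reads_to_fasta(sam_lines):
    """Yield FASTA records for SAM records flagged as unmapped (FLAG bit 0x4)."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if not flag & 0x4 or seq == "*":
            continue  # read is mapped, or no sequence was stored
        # Mark which mate this is so that paired reads keep distinct names.
        if flag & 0x40:
            name += "/1"
        elif flag & 0x80:
            name += "/2"
        yield f">{name}\n{seq}\n"


# Usage (placeholder file names):
# with open("aln.sam") as sam, open("unmapped.fasta", "w") as out:
#     out.writelines(unmapped_reads_to_fasta(sam))
```

BLASTing a few hundred of these reads is usually enough to see whether they come from another organism or simply from regions missing in the related strain.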

Another thing to try is a different aligner, such as bwa-mem, to see whether that improves the alignment rate.

Also, are you getting any multi-hits, i.e. single reads mapping to multiple locations? Can you post the alignment statistics you are getting?
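If it helps with posting the statistics, here is a small sketch that pulls the per-category counts out of the Bowtie2 summary, including the "aligned >1 times" line that reports multi-hits. The summary text in the example is invented for illustration (it is not the poster's output) and uses the unpaired layout; a paired-end summary has more lines, which the same pattern should also pick up.

```python
import re

# Matches summary lines such as "    700 (7.00%) aligned >1 times".
ALIGNED_LINE = re.compile(r"^\s*(\d+) \(([\d.]+)%\) (aligned .+)$", re.MULTILINE)


def alignment_categories(summary_text):
    """Return (label, read_count, percent) for every 'aligned ...' line, in order."""
    return [
        (label.strip(), int(count), float(percent))
        for count, percent, label in ALIGNED_LINE.findall(summary_text)
    ]


if __name__ == "__main__":
    # Invented example summary, for illustration only.
    example = (
        "10000 reads; of these:\n"
        "  10000 (100.00%) were unpaired; of these:\n"
        "    2300 (23.00%) aligned 0 times\n"
        "    7000 (70.00%) aligned exactly 1 time\n"
        "    700 (7.00%) aligned >1 times\n"
        "77.00% overall alignment rate\n"
    )
    for label, count, percent in alignment_categories(example):
        print(f"{label}: {count} reads ({percent}%)")
```

A large share of reads in the ">1 times" category would point to repeats or duplicated regions rather than divergence between the strains.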

If everything seems fine, you could realign the reads to your own assembled genome to see what percentage of reads aligns, and then compare that assembly with the related genome to check whether the two really are different.

written 8 months ago by pbpanigrahi180