Question: Aligning contigs of various Mtb strains to a reference genome to get the variants
10 months ago by Paul (80), India
Paul wrote:

Hi All,

I have multiple Mtb strains in the form of scaffolds.

Is there a way to align these scaffolds from multiple Mtb strains to a reference genome and get a variant file with all the SNPs?

I know I can align reads to a reference genome and get the variant file using samtools, but I have no idea how to align scaffolds or contigs to a reference genome to get all the variants with respect to that reference.

gwas mapping variants assembly • 398 views
modified 10 months ago by k.kathirvel93 (190) • written 10 months ago by Paul (80)

I think you can treat the scaffolds as "big reads".

written 10 months ago by Titus (860)

Is there such an option in Bowtie or BWA?

written 10 months ago by Paul (80)

Yes, I think you just have to align them in the "classic" manner. You may need to adjust the insertion score a bit if you have large insertions due to the variability of your samples.

written 10 months ago by Titus (860)

But the format of reads is different from that of scaffolds: scaffolds are plain strings of sequence, whereas read files look like this:

AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEAEEEEE/EEEEEEEEEEEEAEE<EEEEAEEEEEEAEAAEEAAEEEEEAEA/EEAEAAEA<<<AE<A<<
@NS500223:156:HWNKFBGXX:2:13204:6076:12546 1:N:0:TGACCAAT+AGATCTCG
CGCAGCGCCTTGGACAGGGCCAGGCAGTCGTCCATCGTCCGGAAGGCGTGCACGCCCTCGGGCAGCTCCCGGTCCGGGGTGAACAGGCCGCGCAGCGGCGGCAGGACGGGGTTGGAGCCGGTGGCCAGGACGAGCGTGTCGTATGCGATC
+
AAAAAEEEEEEEEEEAEEEEAEEEEEEEEEEEEAEEEEEEEEA/EEEEEEEEEEEEEEEEEEAEEEEEEEEEEA/EEEEEE/AEAEEEEEEEE/EEEE/EA</AEEEEEE<</EE/EAEEEEAA/<<EE/EE/AAEEE/6/A//A<</<6
@NS500223:156:HWNKFBGXX:2:13204:16689:12548 1:N:0:TGACCAAT+AGATCTCG
GAAGCCGACGGCGTAGAGCGCGAAGGCGACGACGACGGCGACCTGGCCGGCCGGGGAGCCGGTCATGCGTTCCAGGGCGCCGTCCTTCACCCCGTTCATGAGGAACAACGAGCCGACGCCGAGGACGGGGACGGCGTACGAGGTCATGCT

written 10 months ago by Paul (80)

You just have to provide your scaffolds in FASTA format, don't you think?

written 10 months ago by Titus (860)
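
To make the FASTA point concrete, here is a minimal sketch of a FASTA parser in Python. It is only an illustration: the function name `read_fasta` is hypothetical and not part of any tool mentioned in this thread. FASTA records start with a `>` header line followed by sequence lines, whereas the FASTQ excerpt above starts each record with `@` and carries a quality string, so the sketch rejects FASTQ input.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into a {record name: sequence} dictionary.

    Hypothetical helper for illustration; raises ValueError when the
    input does not begin with a '>' header (for example FASTQ input).
    """
    records: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token.
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a name")
            name = fields[0]
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header; not FASTA")
        else:
            # Sequence may be wrapped over several lines; collect the pieces.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}
```

Passing an open file handle (for example for a hypothetical `strain1_contigs.fa`) to `read_fasta` would return one entry per contig, which is a quick way to confirm the file is FASTA before handing it to an aligner.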

Thanks a lot, it worked for me :) I aligned them using BWA-MEM and called variants using freebayes.

modified 10 months ago • written 10 months ago by Paul (80)
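
The exact commands used were not posted, so the following is only a sketch of how a BWA-MEM plus freebayes run for one strain might be assembled in Python. The function name `build_commands` and all file names are hypothetical. The options `-p 1` (haploid ploidy) and `-C 1` (accept an alternate allele seen in a single observation, since each contig covers the reference roughly once) are assumptions, not settings reported in this thread. The optional `preset` argument maps to `bwa mem -x`, which in recent BWA versions accepts `intractg` for intra-species contigs.

```python
import shlex
from typing import List, Optional


def build_commands(reference: str, contigs: str, sample: str,
                   preset: Optional[str] = None) -> List[str]:
    """Return shell command strings for one strain (nothing is executed).

    All file names are hypothetical placeholders chosen for illustration.
    """
    ref = shlex.quote(reference)
    ctg = shlex.quote(contigs)
    bam = shlex.quote(f"{sample}.sorted.bam")
    vcf = shlex.quote(f"{sample}.vcf")
    # Optional 'bwa mem -x' preset, e.g. "intractg" (assumption: supported
    # by the installed BWA version).
    mem_opts = f"-x {shlex.quote(preset)} " if preset else ""
    return [
        f"bwa index {ref}",
        # Align the contig FASTA as if the contigs were long reads, then sort.
        f"bwa mem {mem_opts}{ref} {ctg} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # Assumptions: haploid genome (-p 1), single observation allowed (-C 1).
        f"freebayes -f {ref} -p 1 -C 1 {bam} > {vcf}",
    ]
```

Each returned string could be run with `subprocess.run(cmd, shell=True, check=True)` once the tools are installed; the sketch only builds the commands so they can be inspected first.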

@Titus, however, after the alignment I found that most of the SNPs are in the overlapping regions of the contigs. Also, these are contigs, not scaffolds. Do I need to scaffold them before mapping, or can I continue using contigs for the analysis?

modified 10 months ago • written 10 months ago by Paul (80)

I'm not sure I understand correctly. What do you mean by "in the overlapping regions of the contigs"? After all, you need an overlap to compare them :)

written 10 months ago by Titus (860)
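
One way to pin down what "SNPs in the overlapping regions of the contigs" means is to compute where two or more contig alignments cover the same stretch of the reference and then check which SNP positions fall inside. The sketch below assumes the alignment intervals have already been extracted (for example from the BAM file) as 0-based, half-open `(start, end)` pairs on a single reference sequence; the function names are hypothetical. Since freebayes, as far as I know, by default requires at least two observations of an alternate allele (`--min-alternate-count`), calls clustering in such regions may partly reflect coverage rather than biology, though that is a guess rather than a diagnosis.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on the reference


def overlapping_regions(alignments: List[Interval]) -> List[Interval]:
    """Return merged reference regions covered by two or more alignments."""
    events = []
    for start, end in alignments:
        events.append((start, 1))   # an alignment begins
        events.append((end, -1))    # an alignment ends
    # Tuples sort by position, then by delta, so at equal positions ends (-1)
    # come before starts (+1) and abutting intervals do not count as overlap.
    events.sort()
    regions: List[Interval] = []
    depth = 0
    region_start = 0
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < 2 <= depth:
            region_start = pos
        elif depth < 2 <= previous:
            if regions and regions[-1][1] == region_start:
                # Extend the previous region when the two are adjacent.
                regions[-1] = (regions[-1][0], pos)
            else:
                regions.append((region_start, pos))
    return regions


def snps_in_overlaps(positions: List[int], regions: List[Interval]) -> List[int]:
    """Return the 0-based SNP positions that fall inside any region."""
    return [p for p in positions
            if any(start <= p < end for start, end in regions)]
```

VCF positions are 1-based, so they would need to be decremented by one before being passed to `snps_in_overlaps`.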
10 months ago by k.kathirvel93 (190), India
k.kathirvel93 wrote:

My suggestion: why not use the Mauve aligner? It is well suited to alignment, visualization, and SNP extraction. You can align several draft genomes with your reference genome in a single run. All the best.

modified 10 months ago • written 10 months ago by k.kathirvel93 (190)

That's a good suggestion, but I have 200 Mtb strains, each of which has to be aligned with the reference genome.

modified 10 months ago • written 10 months ago by Paul (80)

I think you can write a for loop that processes all the samples one by one, in order, in a single run. I am sure Mauve can handle 10 samples in a single run, so I suggest you read the Mauve manual carefully. All the best.

written 10 months ago by k.kathirvel93 (190)
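
The loop suggested above could be sketched as follows in Python. This is an illustration under assumptions: the function name `mauve_batches` and all file names are hypothetical, the batch size of 10 sample genomes per run is taken from the reply above rather than from any documented limit, and the command uses the `progressiveMauve --output=<file> <genomes...>` form of Mauve's command-line aligner. The reference genome is placed first in every batch so that each run is anchored to the same sequence.

```python
from typing import List


def mauve_batches(reference: str, samples: List[str],
                  batch_size: int = 10) -> List[List[str]]:
    """Build one progressiveMauve command per batch (nothing is executed).

    batch_size is the number of sample genomes per run; the reference is
    added to every run on top of that.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    commands: List[List[str]] = []
    for i in range(0, len(samples), batch_size):
        chunk = samples[i:i + batch_size]
        # Hypothetical output naming scheme: batch_001.xmfa, batch_002.xmfa, ...
        output = f"batch_{i // batch_size + 1:03d}.xmfa"
        commands.append(["progressiveMauve", f"--output={output}",
                         reference, *chunk])
    return commands
```

For 200 strains this yields 20 commands, each of which could be passed to `subprocess.run(cmd, check=True)`. Whether variants from separate batches can be merged cleanly depends on how the SNPs are exported, so that step is left open here.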

Thank you, will try that.

written 10 months ago by Paul (80)