Generate consensus sequence from BAM without ambiguity codes
1
0
Entering edit mode
3.7 years ago
Tonor ▴ 470

To generate a consensus sequence from a BAM file I use the command:

samtools mpileup -uf REF BAM | bcftools call -c | vcfutils.pl vcf2fq > SEQ


samtools/vcfutils gives ambiguity codes when there is a mixed base population at a site, even if the secondary base is quite a low frequency (say 20%) - is there anyway to avoid ambiguity codes? or to increase the frequency at which they would be used? i.e. if I have 60% A and 40% G at a position - i want my consensus to be A.

Also, anyway to get samtools to give ma fasta file instead of fastq - I have a script that converts it - but was wondering if vcfutils can give fasta automatically.

Happy to use alternative to samtools to get a consenus sequence.

sequence • 2.2k views
0
Entering edit mode

Consensus Fixer looks great - thanks very much

1
Entering edit mode
3.7 years ago

You can edit vcfutils.pl to have it output fastas instead of fastqs. You want to change these lines

 print "\@$chr\n"; &v2q_print_str($seq);
print "+\n"; &v2q_print_str($qual);  to  print "\>$chr\n"; &v2q_print_str($seq); #print "+\n"; &v2q_print_str($qual);


Probably best to make a copy of the whole vcf2fq part, which starts with

sub vcf2fq {
my %opts = (d=>3, D=>100000, Q=>10, l=>5);


and ends with

  &v2q_post_process($last_chr, \$seq, \$qual, \@gaps,$opts{l});
}


And change 'sub vcf2fq' to 'sub vcf2fa' Then you'll have both functionalities.

Unfortunately, putting in a filter for when to not output the het letter is trickier.

0
Entering edit mode

Brilliant thanks for that