Generate consensus sequence from BAM without ambiguity codes
1
0
Entering edit mode
6.2 years ago
Tonor ▴ 480

To generate a consensus sequence from a BAM file I use the command:

samtools mpileup -uf REF BAM | bcftools call -c | vcfutils.pl vcf2fq > SEQ

samtools/vcfutils gives ambiguity codes when there is a mixed base population at a site, even if the secondary base is quite a low frequency (say 20%) - is there anyway to avoid ambiguity codes? or to increase the frequency at which they would be used? i.e. if I have 60% A and 40% G at a position - i want my consensus to be A.

Also, anyway to get samtools to give ma fasta file instead of fastq - I have a script that converts it - but was wondering if vcfutils can give fasta automatically.

Happy to use alternative to samtools to get a consenus sequence.

sequence • 3.0k views
ADD COMMENT
0
Entering edit mode

Consensus Fixer looks great - thanks very much

ADD REPLY
1
Entering edit mode
6.2 years ago

You can edit vcfutils.pl to have it output fastas instead of fastqs. You want to change these lines

 print "\@$chr\n"; &v2q_print_str($seq);
 print "+\n"; &v2q_print_str($qual);

to

 print "\>$chr\n"; &v2q_print_str($seq);
 #print "+\n"; &v2q_print_str($qual);

Probably best to make a copy of the whole vcf2fq part, which starts with

sub vcf2fq {
  my %opts = (d=>3, D=>100000, Q=>10, l=>5);

and ends with

  &v2q_post_process($last_chr, \$seq, \$qual, \@gaps, $opts{l});
}

And change 'sub vcf2fq' to 'sub vcf2fa' Then you'll have both functionalities.

Unfortunately, putting in a filter for when to not output the het letter is trickier.

ADD COMMENT
0
Entering edit mode

Brilliant thanks for that

ADD REPLY

Login before adding your answer.

Traffic: 2148 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6