Consensus sequence from multiple BAM files on a single position
1
0
Entering edit mode
2.1 years ago

Hi all,

I have several thousands sorted BAM files and I would like to quickly extract the consensus sequence from all the reads on a specific position. Simply using samtools view provides all the reads (including those jumpoing the position due to splicing). I could not make it work via mpileup. Here is the single file call I have been working on:

samtools view -b file.bam "chr2:5666012" | samtools mpileup -


The ideal output for a single file should be (with G consensus):

....AAAACCTT


However, I get huge outputs as also the spliced reads are employed in the pileup generation.

Any idea is welcome. Thanks! :-)

samtools bam mpileup • 459 views
1
Entering edit mode

Hello,

to get the consensus sequence you first have to call variants (mpileup isn't enough) and then create the the consensus sequence with the help of your vcf file and the reference sequence. See here: Generating consensus sequence from bam file

fin swimmer

0
Entering edit mode
2.1 years ago
jkbonfield ▴ 720

I have some old perl one-liner kicking around to do this from the samtools mpileup output, but it doesn't look like it'll handle multiple chromosomes. Likely it was a one-off hack for some task. Anyway maybe it can be adapted. Be warned: it's hideous! :-)

 samtools mpileup in.bam | perl -ane 'BEGIN {$l=0}$F[4]=~s/[-+](\d+)((??{".{$1}"}))|\^.|\$//g;$F[4]=~s/\*/N/g;%C=();foreach(split("",uc($F[4]))) {$C{$_}++} @C=sort {$C{$b}<=>$C{$a}} keys %C;print "N" x ($F[1]-$l-1),$C[0];$l=\$F[1]'