Consensus sequence from multiple BAM files on a single position
1
0
Entering edit mode
4.6 years ago

Hi all,

I have several thousands sorted BAM files and I would like to quickly extract the consensus sequence from all the reads on a specific position. Simply using samtools view provides all the reads (including those jumpoing the position due to splicing). I could not make it work via mpileup. Here is the single file call I have been working on:

samtools view -b file.bam "chr2:5666012" | samtools mpileup -

The ideal output for a single file should be (with G consensus):

....AAAACCTT

However, I get huge outputs as also the spliced reads are employed in the pileup generation.

Any idea is welcome. Thanks! :-)

samtools bam mpileup • 975 views
ADD COMMENT
1
Entering edit mode

Hello,

to get the consensus sequence you first have to call variants (mpileup isn't enough) and then create the the consensus sequence with the help of your vcf file and the reference sequence. See here: Generating consensus sequence from bam file

fin swimmer

ADD REPLY
0
Entering edit mode
4.6 years ago
jkbonfield ★ 1.2k

I have some old perl one-liner kicking around to do this from the samtools mpileup output, but it doesn't look like it'll handle multiple chromosomes. Likely it was a one-off hack for some task. Anyway maybe it can be adapted. Be warned: it's hideous! :-)

 samtools mpileup in.bam | perl -ane 'BEGIN {$l=0} $F[4]=~s/[-+](\d+)((??{".{$1}"}))|\^.|\$//g;$F[4]=~s/\*/N/g;%C=();foreach(split("",uc($F[4]))) {$C{$_}++} @C=sort {$C{$b}<=>$C{$a}} keys %C;print "N" x ($F[1]-$l-1),$C[0];$l=$F[1]'
ADD COMMENT

Login before adding your answer.

Traffic: 2764 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6