Question: How to extract sequence message from SAM?
gravatar for Jeason Rad
3.7 years ago by
Jeason Rad30
Jeason Rad30 wrote:

Hi, all. I have a PE-alignment SAM file, how can I extract the pos/start/end/strand/sequence from each flagment? I used this command-line to extract pos/start/end/strand,

bedtools bamtobed -i reads.bam -bedpe | awk -v OFS="\t" '{if($9=="+"){print $1,$2,$6}else if($9=="-"){print $1,$2,$6}}' > fragments.bed

but how to put the sequence message into it?

sequencing sam bed • 825 views
ADD COMMENTlink modified 2.3 years ago by Biostar ♦♦ 20 • written 3.7 years ago by Jeason Rad30
cut -f 10 file.sam

can be used to extract sequence, but not suit for pair-end sequence.

chr16   85790994        85791072        +
chr8    17447157        17447198        -
chr1    39340268        39340412        +
chr2    227662155       227662242       +
chr12   93416078        93416128        +

This is the bed file from bamtobed. How can I extract the sequence between col2 & col3 from my BAM file? Any suggestions will be grateful!

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by Jeason Rad30
gravatar for Pierre Lindenbaum
3.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum133k wrote:

pipe your output of bamtobed into a loop with samtools faidx:

 echo -e "ref\t6\t22\t+\nref\t8\t18\t+" |  | while IFS=$'\t' read -a B; do echo -ne "${B[0]}\t${B[1]}\t${B[2]}\t${B[3]}\t" &&  samtools faidx reference.fa "${B[0]}:${B[1]}-${B[2]}" | grep -v ">" | tr -d "\n" && echo ; done

note: for minus strand, I haven't rev-complemented the sequence. Furthermore , I'm lazy , as the input is BED, there will be +1 shift for the first base of the DNA.

ADD COMMENTlink written 3.7 years ago by Pierre Lindenbaum133k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1651 users visited in the last hour