Question

SAM File target extraction

11.0 years ago

geneart$$ ▴ 50

Hi all,

Is there a way to extract the corresponding target sequence from a BAM or SAM alignment produced by the novoalign software? Or by bowtie?

Thanks

Geneart.

alignment • 3.5k views

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by geneart$$ ▴ 50

Duplicate of Extract Reads From A Bam File That Fall Within A Given Region

ADD REPLY • link 11.0 years ago by Ashutosh Pandey 12k

Ram · Answer 1 · 2014-07-28

11.0 years ago

Ming Tommy Tang ★ 4.7k

# Index the BAM file first (it must be coordinate-sorted)
samtools index test.bam
# Print the alignments that overlap the region
samtools view test.bam chr1:200000-500000

Or have a look at tabix (for a SAM file, which tabix expects to be sorted and bgzip-compressed): http://samtools.sourceforge.net/tabix.shtml

ADD COMMENT • link updated 5.8 years ago by Ram 45k • written 11.0 years ago by Ming Tommy Tang ★ 4.7k

Hi tangming2005

Thanks for the reply. I was looking into tabix earlier. The reason I posted this question is that I am not really interested in the coordinates of the reference region my sequence maps to; I would like to retrieve the corresponding sequence itself (the string of ATGCs) that my reads map onto. A SAM file does give the number of matches/mismatches between our reads and the reference sequence, but I was looking to extract the actual reference sequence region my read maps to. So it is slightly different. But I very much appreciate your reply :) Thanks again,

I guess I can still take the coordinates generated this way and extract the sequence from my genome file, perhaps?

Geneart.

ADD REPLY • link 11.0 years ago by geneart$$ ▴ 50

There are many sequences that will represent your region of interest, but if you want a single consensus sequence, you should read more about pileup2fq. The old pileup feature in samtools could create one for you. You can do the same with the new mpileup, but there is no pileup2fq-like feature.

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by Ashutosh Pandey 12k

Sure, you can get the coordinates and convert them to FASTA sequences. See one of my posts here
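
A minimal sketch of that route, in case it helps: the awk helper below (sam_to_regions is a made-up name) reads SAM records and prints one 1-based chr:start-end region per mapped read, using the CIGAR string to work out how much of the reference each read covers. The file names test.bam and genome.fa in the commented usage are placeholders, and the genome FASTA is assumed to be the same one the reads were aligned to.

```shell
# Hypothetical helper: SAM records on stdin -> 1-based "chr:start-end" regions on stdout
sam_to_regions() {
  awk -F'\t' '
    /^@/ { next }                     # skip header lines
    $3 == "*" || $6 == "*" { next }   # skip unmapped reads
    {
      cigar = $6; span = 0
      # Walk the CIGAR string one operation at a time
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n  = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) span += n   # only these operations consume reference bases
        cigar = substr(cigar, RLENGTH + 1)
      }
      if (span > 0) print $3 ":" $4 "-" ($4 + span - 1)
    }'
}

# Usage sketch (needs samtools; file names are placeholders):
#   samtools view test.bam | sam_to_regions | sort -u | xargs samtools faidx genome.fa
```

Soft clips, hard clips and insertions do not consume reference bases, so they are left out of the span. Reverse-strand reads are reported on the forward strand, as in the SAM record itself.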

ADD REPLY • link updated 5.8 years ago by Ram 45k • written 11.0 years ago by Ming Tommy Tang ★ 4.7k

He is not talking about extracting a sequence from the reference FASTA file. He has a BAM file, and it may have a lot of variants. He wants to build a FASTA sequence that represents the sequence with the variants in it. BTW, I just checked your post, and you have not mentioned samtools faidx as a solution.
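
For what it is worth, here is a minimal sketch of how such a variant-aware sequence could be built with current tools, assuming bcftools is installed and the BAM file is coordinate-sorted. The function name build_consensus and all file names are made up for illustration, and the calling options are plain defaults, not tuned for any particular data set.

```shell
# Hypothetical helper: reference FASTA + BAM -> consensus FASTA with called variants applied
build_consensus() {
  ref=$1; bam=$2; out=$3
  # Refuse to run if an input file is missing
  if [ ! -f "$ref" ] || [ ! -f "$bam" ]; then
    echo "build_consensus: missing reference or BAM file" >&2
    return 1
  fi
  # Call variants against the reference and write a compressed VCF
  bcftools mpileup -Ou -f "$ref" "$bam" | bcftools call -mv -Oz -o "$out.calls.vcf.gz" || return 1
  # Index the calls so that bcftools consensus can look them up
  bcftools index "$out.calls.vcf.gz" || return 1
  # Apply the called variants to the reference sequence
  bcftools consensus -f "$ref" "$out.calls.vcf.gz" > "$out"
}

# Usage sketch (file names are placeholders):
#   build_consensus genome.fa test.bam consensus.fa
```

Positions without a called variant keep the reference base, so regions with little or no coverage will look like the reference rather than being masked.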

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by Ashutosh Pandey 12k