Question: SAM File target extraction
6.2 years ago, geneart$$ (United States) wrote:

Hi all,

Is there a way to extract the corresponding target sequence from a BAM or SAM alignment produced by novoalign or bowtie?

Thanks

Geneart.

alignment • 1.9k views
modified 6.2 years ago by Ming Tang • written 6.2 years ago by geneart$$

Duplicate of Extract Reads From A Bam File That Fall Within A Given Region

written 6.2 years ago by Ashutosh Pandey
6.2 years ago, Ming Tang (Houston/MD Anderson Cancer Center) wrote:
# index the BAM file first (it must be coordinate-sorted)
samtools index test.bam
# then print the alignments overlapping a region
samtools view test.bam chr1:200000-500000

Or have a look at tabix (for SAM files): http://samtools.sourceforge.net/tabix.shtml

modified 12 months ago by RamRS • written 6.2 years ago by Ming Tang

Hi tangming2005,

Thanks for the reply. I was looking into tabix earlier. The reason I posted this question is that I am not really interested in the coordinates of the reference region my sequence maps to; I would like to retrieve the corresponding sequence itself (the string of ATGCs) that my reads map onto. SAM files do give the number of matches/mismatches of the reads to the reference sequence, but I was looking to extract the actual reference sequence region where my read maps. So it is slightly different. But I very much appreciate your reply :) Thanks again.

I guess I can still take the coordinates generated this way and extract the sequence from my genome file, perhaps?

Geneart.

written 6.2 years ago by geneart$$
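
The idea in the follow-up above (take the mapping coordinates and pull the bases out of the genome) can be sketched roughly as follows. This is only an illustration, not novoalign's or bowtie's own method: it assumes the reference is already loaded as a plain dict of contig name to sequence string, and the helper names `reference_span` and `target_sequence` are made up for this example. The reference span is derived from the SAM POS field plus the CIGAR operations that consume reference bases (M, D, N, =, X).

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) a read covers on the reference."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def target_sequence(sam_line, reference):
    """Return the reference bases that one SAM record aligns to.

    `reference` is a hypothetical dict of {contig name: sequence string}.
    Unmapped records (FLAG bit 0x4 set, or CIGAR "*") give None.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None
    start, end = reference_span(pos, cigar)
    # SAM positions are 1-based; Python slices are 0-based and end-exclusive.
    return reference[rname][start - 1:end]
```

With real files, the same start and end could instead be passed as a region such as chr1:3-11 to samtools faidx on an indexed genome FASTA, which avoids loading the whole genome into memory.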

Many sequences will represent your region of interest, but if you want a single consensus sequence, you should read more about pileup2fq. The old pileup feature in samtools could create one for you. You can do the same with the new mpileup, but there is no pileup2fq-like feature.

written 6.2 years ago by Ashutosh Pandey
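
To make the "single consensus sequence" idea concrete, here is a deliberately naive sketch. It is not what pileup2fq or mpileup do (those use base and mapping qualities and a proper genotype model); it is a plain majority vote per reference position. The function name `consensus` and the (pos, cigar, seq) input tuples are assumptions for the example. Insertions are ignored, and positions with no read base, including deletions, fall back to the reference.

```python
import re
from collections import Counter, defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def consensus(reads, reference, start, end):
    """Majority-vote consensus over the 1-based inclusive range [start, end].

    `reads` is an iterable of (pos, cigar, seq) tuples for a single contig and
    `reference` is that contig's sequence string.
    """
    counts = defaultdict(Counter)  # reference position -> base counts
    for pos, cigar, seq in reads:
        ref_i, read_i = pos, 0
        for n, op in CIGAR_RE.findall(cigar):
            n = int(n)
            if op in "M=X":      # aligned bases: consume read and reference
                for k in range(n):
                    counts[ref_i + k][seq[read_i + k]] += 1
                ref_i += n
                read_i += n
            elif op in "IS":     # insertion / soft clip: consume read only
                read_i += n
            elif op in "DN":     # deletion / skip: consume reference only
                ref_i += n
            # H and P consume neither read nor reference
    bases = []
    for p in range(start, end + 1):
        if counts[p]:
            bases.append(counts[p].most_common(1)[0][0])
        else:
            bases.append(reference[p - 1])  # no coverage: keep reference base
    return "".join(bases)
```

For example, three reads that mostly carry a T where the reference has a G at position 3 would yield a consensus with T at that position, while uncovered flanking positions stay as in the reference.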

Sure, you can get the coordinates and convert them to FASTA sequences. See one of my posts here

modified 12 months ago by RamRS • written 6.2 years ago by Ming Tang

He is not talking about extracting sequence from the reference FASTA file. He has a BAM file, and it may have a lot of variants. He wants to build a FASTA sequence that represents the sequence with the variants in it. BTW, I just checked your post and you have not mentioned samtools faidx as a solution.

modified 6.2 years ago • written 6.2 years ago by Ashutosh Pandey