Question

Extracting reads from given region

0

Entering edit mode

6.4 years ago

KVC_bioinfo ▴ 590

Hello all,

I am extracting the reads from bam file which are falling in particular region using the following command.

samtools view -h input.bam "Chr10:18000-45500" > output.bam

However, now I want to extract the reads in sam manner but my reference is human genome primary assembly where the name of fasta reads starts like:

NC_000001.10 Homo sapiens chromosome 1, GRCh37.p13 Primary Assembly

How do I now modify command to do the sam task?

Thank you in advance.

bam • 2.5k views

ADD COMMENT • link updated 6.4 years ago by Pierre Lindenbaum 161k • written 6.4 years ago by KVC_bioinfo ▴ 590

score 3 · Answer 1 · 2017-12-03

3

Entering edit mode

6.4 years ago

Pierre Lindenbaum 161k

the fasta should be indexed ignoring everything after the first whitespace. So your command should be something like:

samtools view -h input.bam "NC_000001.10:123-456" > output.sam

ADD COMMENT • link 6.4 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Thank you. However, I want to extract reads from Chr 10 with the given coordinates above.

ADD REPLY • link 6.4 years ago by KVC_bioinfo ▴ 590

1

Entering edit mode

try looking for some thing like: NC_000010 in header:) $ grep ">" <input.fasta> (if your fasta is gizpped, use zgrep instead of grep). You would get all the headers in reference file. Then look for an entry that starts with NC_000010. Use that to extract the information. Command would look like:

samtools view -h input.bam "NC_000010.xx:123-456" > output.sam

ADD REPLY • link 6.4 years ago by cpad0112 21k