Extract reads from bam based on start coordinate
1
1
Entering edit mode
7.3 years ago
fizoca ▴ 10

Hi,

I'm trying to extract reads based on their start coordinate in a bam file. I've tried using samtools view but that seems to give all reads covering that region, not originating there.

Apologies if this has been asked elsewhere but I couldn't find an answer via Google.

Many thanks.

alignment • 5.3k views
ADD COMMENT
0
Entering edit mode

For me the question is not clear , I will assume that you have list of coordinate that you want to extract in such case it will be like this

samtools view -b -L ROI.bed file.bam > ROI_file.bam

where region of interest ROI.bed will contains the start position you are interested in

otherwise give example to what you want to extract

ADD REPLY
0
Entering edit mode

Sorry I'll try to explain myself better. From my understanding if I run the command:

samtools view -h -q 10 in.bam  "chr1:1000-1500" > output.bam

output.bam will contain all reads that fall between 1000-1500. I want to extract reads that start at any coordinate within this region. So for example a read that started at 1500 would also be extracted, even if most of the read does not lie within this region.

Apologies if I'm not correctly understanding how samtools view works!

ADD REPLY
0
Entering edit mode

clear now thanks :)

ADD REPLY
0
Entering edit mode

If your ultimate goal is to get the number of 5' position of reads at each genomic position, then you can try bedtools genomecov -5.

EDIT : it is not a very clear question indeed.

ADD REPLY
3
Entering edit mode
7.3 years ago

Something like:

samtools view -h -q 10 in.bam chr1:1000-1500 | awk 'BEGIN{OFS="\t"}{if($1 ~ @) {print} else {if($4 >= 1000 || $4 <=1500) {print} else {}}}' | samtools view -Sbo output.bam -

One could restructure this to get around the default print action in awk, but this will also get the job done.

ADD COMMENT
0
Entering edit mode

Thanks very much for the reply. I'm getting a couple of syntax errors when I try this if you wouldn't mind taking a look? I'm not great with awk.

awk: cmd. line:1: BEGIN{OFS="\t"}{if($1 ~ @) {print} else {if($4 >= 99795689 || $4 <=99795700) {print} else {}}}
awk: cmd. line:1:                          ^ syntax error
awk: cmd. line:1: BEGIN{OFS="\t"}{if($1 ~ @) {print} else {if($4 >= 99795689 || $4 <=99795700) {print} else {}}}
awk: cmd. line:1:                                    ^ syntax error
ADD REPLY
0
Entering edit mode

$1 ~ @ should probably be something like $1 ~ /^"@"/.

ADD REPLY
0
Entering edit mode

Brilliant, thanks again for your help! All working now.

ADD REPLY

Login before adding your answer.

Traffic: 1892 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6