1
0
Entering edit mode
4.7 years ago
KVC_bioinfo ▴ 550

Hello All,

I have sam and bam file from alignment of reads (single-end) obtained from Oxford nanopore technology(human genome as a reference) I want to extract reads that aligned to given coordinates. This should only include reads that aligned and not the clipped reads. I have used following command which gives me reads with clipping:

samtools file.bam "chr1:region1-region2" > file_out.bam

Can someone help me here? Thank you in advance

bam sam • 2.8k views
2
Entering edit mode
4.7 years ago

using samjdk:

samtools view -bu  in.bam "contig:1-100" | \
java -jar dist/samjdk.jar -e 'return record.getReadUnmappedFlag() || record.getCigar()==null || !record.getCigar().isClipped();'

0
Entering edit mode

Thank you. I really appreciate it. But I am absolutely unfamiliar to java.

Is there any other way of doing it?

1
Entering edit mode

sure, you can try awk:

samtools view -h  in.bam "ref:1-100" |\
awk -F '\t' '/^@/ {print;next;} (!($6 ~ /[HS]/ )){print;} '  ADD REPLY 0 Entering edit mode Thanks! what does ref:1-100 mean here? ADD REPLY 1 Entering edit mode it's your "chr1:region1-region2" ADD REPLY 0 Entering edit mode I am using the command you suggested: samtools view -h sorted.bam "chr1:region1-region2" | awk -F '\t' '/^@/ {print;next;} (!($6 ~ /[HS]/ )){print;} '

However, this just gives me names such as:

@HD VN:1.3 SO:coordinate

@SQ SN:chr1 LN:248956422

@SQ SN:chr2 LN:242193529

@SQ SN:chr3 LN:198295559

@SQ SN:chr4 LN:190214555

@SQ SN:chr5 LN:181538259

@SQ SN:chr6 LN:170805979

I want to extract the reads that mapped in that region but I don't want the soft clipped reads. I assume the alignment bam file already excludes the hard clipped dreads.

2
Entering edit mode

-h includes the header. The rest of your reads should be after the header. The important bit in that command is !(\$6 ~ /[HS]/ , which is going to remove any reads where the CIGAR string indicates clipping.

0
Entering edit mode

Thank you very much. I am not getting any reads in resulting file. I am interested in obtaining reads that mapped to the reference where I do not want the clipped reads in it.

Suppose, original read length is 2000bp and only 50 map to the reference rest of the reads are clipped. Then I am interested in those 50 reads. I guess the above command removes entire 2000bp read.

0
Entering edit mode

Also, I am not sure why the header includes all chr when I have specified only chr1 in my command.

1
Entering edit mode

if your region is really "chr1:region1-region2" then, please, try to understand what you're doing...