Extract paired-end reads with one end falls within a given region from a bam file
2
0
Entering edit mode
6 months ago
Wang Cong ▴ 10

I have paired-end sequencing reads and two ends of a read may not come from the same region. I want to extract those that have at least one end mapping on a certain region of the genome.

I have found the following command. But it seems that this command does not extract the read whose one end falls within the region while the other one does not.

samtools view input.bam "Chr10:18000-45500" > output.bam

paired-end samtools • 619 views
1
Entering edit mode

I think bedtools pairtobed can do what you want.

0
Entering edit mode
6 months ago

using samjdk http://lindenb.github.io/jvarkit/SamJdk.html to get the read or it's mate in "RF01:67-68"

 java -jar dist/samjdk.jar -e 'final String contig="RF01"; final int start = 67; final int end = 68; if(!record.getReadUnmappedFlag() && record.getContig().equals(contig) && !(record.getStart() > end || record.getEnd() < start)) { return true; } if(record.getReadPairedFlag() && !record.getMateUnmappedFlag() && record.getMateReferenceName().equals(contig)) { final int mstart = record.getMateAlignmentStart(); final int mend = SAMUtils.getMateCigar(record)!=null?SAMUtils.getMateAlignmentEnd(record):mstart; return !(mstart > end || mend < start); } return false;'  in.bam


EDIT:

0
Entering edit mode
6 months ago
yhoogstrate ▴ 140

If it is about also extracting the corresponding mates, I think I have written an utility for that. Though it was designed for finding / subsetting fusion reads.

It works like this:

dr-disco bam-extract 'chr1:123-456' 'chr7:123-456' input.bam sorted.output.bam

That way it reports all reads and all their aligned segments and mates, that have at least one mate aligned in any of these regions and thus all fusion candidates. However, if you just use the same location twice, you will have the equivalent of what you need:

dr-disco bam-extract -o 'chr10:18000-45500' 'chr10:18000-45500' input.bam output.bam

https://github.com/yhoogstrate/dr-disco