Fetch pairs of discordantly aligned reads
2
0
Entering edit mode
4.6 years ago

I am trying to get discordant aligned reads from a bam file especially reads that are aligned to two different chromosomes/references. I tried samtools view to fetch reads aligned to one chromosome but this does not pull/include the second read which might have aligned to different chromosome. I am looking for a solution to achieve it. Thanks for your help and time in advance. Sorry if I missed any already posted solution.

alignment genome • 1.6k views
1
Entering edit mode
4.6 years ago

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() && !record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && !record.getReferenceName().equals(record.getMateReferenceName());' in.bam ADD COMMENT 0 Entering edit mode Thanks Pierre, Can I specify reference/chromosome of my first read, such that I can fetch one read aligned to a specific chromosome of the discordant pair and the other anywhere else? ADD REPLY 0 Entering edit mode sure do something like (not checked): java -jar dist/samjdk.jar -e 'if(!record.getReadPairedFlag() || record.getReadUnmappedFlag() || record.getMateUnmappedFlag()) return false; final String c1= record.getReferenceName(); final String c2=record.getMateReferenceName(); return (c1.equals("chr4") && c2.equals("chr3")) || (c1.equals("chr3") && c2.equals("chr4"));' in.bam

0
Entering edit mode

Thanks again, Will try it and leave a note on how did it go.

0
Entering edit mode
4.6 years ago
d-cameron ★ 2.6k

I am looking for a solution to achieve it.

What is the purpose of this analysis? Are you attempting to identify structural variations? If so, why not use an actual SV caller (such as manta or GRIDSS), instead of effectively creating a simple one yourself?

I am trying to get discordant aligned reads from a bam file especially reads that are aligned to two different chromosomes/references.

Some aligners consider reads concordantly aligned if they are on the same chromosome in the expected orentiation _regardless of the inferred fragment size_. That is, two reads aligning 100Mb apart can be considered concordant. Do you want to extract these reads?

My SV caller GRIDSS performs some pre-processing, which includes creating a BAM file that contains only reads indicating the presence of some sort of SV. The gridss.ExtractSVReads utility program has parameters to choose whether to extract discordant read pairs (based on the actual fragment size distribution), split reads, soft clipped reads, one end anchored reads (one read maps, the other doesn't), and unmapped read pairs. All of these can provide support for a putative structural variation.

For inter-chromsomal rearrangements, the gridss.ComputeSamTags utility program can be useful as it ensures the optional read mate tags are populated, something which many aligners don't bother doing. By default, it also populates the R2 and Q2 tags although, depending on what you're doing, these might not be useful to you.

If this is sounding like it could be too much effort, I recommend running GRIDSS - it also performs assembly of the aforementioned reads and can give you both variant calls, supporting assembly contigs, as well as a breakdown of the supporting reads (e.g. 4 discordant read pairs, 2 split reads, 1 assembly contig, 1 one end anchored read pair, 4 soft-clipped reads).