How can I use samtools to identify discordantly paired reads or is it even possible? I can use awk to simply output reads which are mapped on different chromosomes, and also reads where the distance between the pairs is significantly larger than expected but is there a way to do this via samtools?
There are different options:
- You can use samtools view with the argument '-F 1294'. This discards every read that has any of the following flags set: read mapped in proper pair, read unmapped, mate unmapped, not primary alignment, read is PCR or optical duplicate.
- As you've mentioned, you can use awk to extract mates mapped on different chromosomes. Convert your BAM to SAM first, and then run:
awk '($3!=$7 && $7!="=")'
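The awk approach can also be stretched to cover the distance case from the question. This is only a sketch: the three SAM records, the function names sam_records and find_discordant, and the 1000 bp cutoff are made up for illustration, and the right cutoff depends on your library's insert size. It relies on the standard SAM columns (3 = RNAME, 7 = RNEXT, 9 = TLEN).

```shell
# Three made-up SAM records (tab-separated, 11 mandatory fields).
sam_records() {
    printf 'r1\t97\tchr1\t100\t60\t50M\tchr2\t5000\t0\t*\t*\n'
    printf 'r2\t99\tchr1\t200\t60\t50M\t=\t450\t300\t*\t*\n'
    printf 'r3\t97\tchr1\t300\t60\t50M\t=\t90300\t90050\t*\t*\n'
}

MAX_INSERT=1000   # assumed cutoff in bp, adjust to your library

# Hypothetical helper: reads SAM on stdin and reports mates that map to
# different chromosomes or whose |TLEN| exceeds MAX_INSERT.
find_discordant() {
    awk -F'\t' -v max="$MAX_INSERT" '
        /^@/ { next }                          # skip header lines
        $7 != "=" && $7 != "*" && $3 != $7 {   # mate on another chromosome
            print $1 "\tdiff_chr"; next
        }
        $7 == "=" {                            # same chromosome: check distance
            t = ($9 < 0) ? -$9 : $9
            if (t > max) print $1 "\tfar_apart"
        }
    '
}

sam_records | find_discordant
```

With a real file you would feed the function from samtools instead, for example: samtools view in.bam | find_discordant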
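I would vote for the first approach, because the proper-pair flag is set by the aligner and typically also takes the distance between the mates into account, so it covers the distance issue as well.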
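To see where 1294 comes from, the mask can be rebuilt from the individual SAM flag bits. This is a small sketch: the variable names and the is_excluded helper are hypothetical and not part of samtools, and the helper only mimics the "drop the read if any of these bits is set" behaviour of -F.

```shell
# SAM flag bits that make up the -F mask.
PROPER_PAIR=2      # 0x2   read mapped in proper pair
READ_UNMAPPED=4    # 0x4   read unmapped
MATE_UNMAPPED=8    # 0x8   mate unmapped
SECONDARY=256      # 0x100 not primary alignment
DUPLICATE=1024     # 0x400 read is PCR or optical duplicate

MASK=$(( PROPER_PAIR | READ_UNMAPPED | MATE_UNMAPPED | SECONDARY | DUPLICATE ))
echo "$MASK"       # prints 1294

# Hypothetical helper: succeeds if a read with this FLAG would be dropped
# by a filter that excludes reads with any bit of MASK set.
is_excluded() {
    [ $(( $1 & MASK )) -ne 0 ]
}

is_excluded 99 && echo "flag 99 (proper pair) is dropped"
is_excluded 97 || echo "flag 97 (paired, not proper pair) is kept"
```

The actual filtering would then be something like: samtools view -F 1294 in.bam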
Another option is samjdk: http://lindenb.github.io/jvarkit/SamJdk.html
$ java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() && !record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && !record.getReferenceName().equals(record.getMateReferenceName());' in.bam