I am new to paired-end read analysis and would appreciate feedback on a few issues in the analysis of paired-end alignment data:
In general, how can the two reads of a pair be mapped to different chromosomes? When a FASTQ file is processed in paired-end mode, doesn't the aligner map a read pair only if the two reads point toward each other on opposite strands (one aligned to the forward strand and the other to the reverse strand) at an expected distance from one another? Or is the aligner just looking for the best fit for each read, which could place the two mates on different chromosomes, even though a read pair should in theory represent a genomic fragment from a single chromosome?
Following the alignment, how can I keep only the uniquely mapped reads whose mates are on the same chromosome? I know that samtools flagstat reports the number of pairs with mates mapped to different chromosomes, but how can I extract only the pairs that are mapped to the same chromosome, while making sure that they are uniquely mapped? Are these just the reads flagged as 'properly paired' (and if so, how can I keep only them)?
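
To make the filter being asked about concrete, here is a rough Python sketch that works on plain SAM text lines. It is only an illustration under stated assumptions, not a recommended pipeline: the flag bit values and column positions come from the SAM specification, but the function names (`passes`, `filter_pairs`) are made up for this example, and "uniquely mapped" is approximated by a MAPQ cutoff (`MIN_MAPQ = 20` is an arbitrary choice, and what MAPQ means depends on the aligner). Note that it does not require the 'properly paired' flag, since that flag also depends on orientation and insert size as judged by the aligner, which is a stricter condition than "same chromosome".

```python
from collections import defaultdict

# SAM FLAG bits, as defined in the SAM specification
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

# Assumption: "unique" is approximated by a MAPQ cutoff; the right value
# depends on the aligner and is not something the SAM format defines.
MIN_MAPQ = 20


def passes(fields, min_mapq=MIN_MAPQ):
    """Return True if one SAM record is a primary, mapped, paired alignment
    with MAPQ >= min_mapq whose mate lies on the same reference sequence."""
    flag = int(fields[1])
    rname, mapq, rnext = fields[2], int(fields[4]), fields[6]
    if not flag & PAIRED:
        return False
    if flag & (UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    # RNEXT is "=" when the mate is on the same reference as this read
    same_chrom = rnext == "=" or rnext == rname
    return same_chrom and mapq >= min_mapq


def filter_pairs(sam_lines, min_mapq=MIN_MAPQ):
    """Keep header lines plus only those pairs in which BOTH mates pass."""
    header = []
    by_name = defaultdict(list)
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            header.append(line)
            continue
        fields = line.split("\t")
        if passes(fields, min_mapq):
            by_name[fields[0]].append(line)
    # A pair survives only if exactly two records with that read name passed
    kept = [rec for recs in by_name.values() if len(recs) == 2 for rec in recs]
    return header + kept
```

The sketch holds all passing records in memory, so it is only meant for small test files. The same conditions can presumably be expressed with the flag (`-f`, `-F`) and MAPQ (`-q`) filters of `samtools view`, although those act on single records rather than on whole pairs.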