In NGS, after alignment, SAM and BAM files contain both paired and unpaired reads. I wonder when it is better to keep the unpaired (orphaned) reads for downstream analysis and when they should be removed?
E.g. in RNA-seq, exome-seq, variant calling, structural variation, methylation, ChIP-seq... whatever comes to mind is highly appreciated.
If your answer is supported by a paper or by experience, that would be great.
Thanks in advance
Thanks. What if I can show that it hurts? I'd be happy to be corrected if I am wrong. For example, orphan reads change the DP value for SNPs and INDELs in VCF files; please have a look at my other post, "A: Why GATK and bcftools SNP calling different? ". When we then filter the SNPs on DP value, a number of SNPs are eliminated. I hope that is clear.
From the first part of your answer, it sounds as if there is a way to get rid of orphan reads before or during alignment. Is there?
If you have evidence that in a particular use-case including them produces lower quality results then certainly leave them out. That two different tools happen to treat them differently is neither a surprise nor a problem. This just means that the DP threshold should be different if you use samtools versus GATK.
BTW, we were using different definitions of orphan reads. I assumed you meant reads whose mates were removed during the trimming process. You meant what are commonly referred to as "singletons" (i.e., paired-end reads whose mates don't align). The former can be either excluded from or included in the alignment. The latter would need to be filtered out afterward if they cause a problem (this is unusual, though).
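To make the distinction concrete, here is a minimal sketch of how the two kinds of reads could be told apart after alignment from the SAM FLAG field alone. It assumes that reads orphaned by trimming were aligned as single-end input (so the "paired" bit 0x1 is not set), while singletons come from paired-end input and carry the "mate unmapped" bit 0x8. The bit values are from the SAM specification; the function names and category labels are hypothetical, made up for illustration.

```python
# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1         # read was sequenced/aligned as part of a pair
UNMAPPED = 0x4       # the read itself is unmapped
MATE_UNMAPPED = 0x8  # the mate of this read is unmapped


def classify_flag(flag: int) -> str:
    """Assign a SAM FLAG value to one of four illustrative categories."""
    if flag & UNMAPPED:
        return "unmapped"
    if not flag & PAIRED:
        # Aligned as single-end input, e.g. a read orphaned by trimming.
        return "unpaired_input"
    if flag & MATE_UNMAPPED:
        # Paired-end read that aligned while its mate did not.
        return "singleton"
    return "both_mates_mapped"


def classify_sam_line(line: str) -> tuple[str, str]:
    """Return (read name, category) for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], classify_flag(int(fields[1]))


if __name__ == "__main__":
    # Hypothetical alignment record: FLAG 73 = 0x1 + 0x8 + 0x40.
    record = "read1\t73\tchr1\t100\t42\t50M\t=\t100\t0\t*\t*"
    print(classify_sam_line(record))
```

If singletons do turn out to cause a problem, the same bit test is what a post-alignment filter would rely on: keep mapped reads, drop those with 0x8 set.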
OK. Thanks. So reads that lose their mates through trimming are probably better kept, right? And reads whose mates do not align are better removed?
By the way, how can one separate singletons from orphaned reads? Perhaps orphaned reads can be separated during trimming (trimmomatic), and any single reads found after alignment are then singletons. Is that correct?
Devon Ryan Hi, sorry to reply to such an old post, but I have a similar question. The kneaddata output has unmatched_1.fq and unmatched_2.fq, which contain reads whose mates were lost but which themselves passed both the trimmomatic and bowtie2 steps. In this case, at what step would reads without mates become an issue in downstream processing? Thanks in advance.
You have to align them separately, which is a minor annoyance.