I think, I am missing something about how mapping tools work. I am mapping reads using BWA.
My genome file (reference file in fasta) -
>chr1 AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGC CAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAAT TTTATCTTTAGGCGGTATGCACTTTTAACAAAAAANNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN GCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAAC CAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCNNNN >chrMm GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT GTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAANAATTTCCACCMy reads are in fasta format -
>Read#BetweenN's TTTTAACAAAAAAGCCCATCCTACCCAGC >VIK CAACCAAACCCCAAAGACACCCCCCACAG >VIK#Inverse GACACCCCCCACAGAAACCCCAAACCAAC >VIK#COMPLIMENTARY GTTGGTTTGGGGTTTCTGTGGGGGGTGTC >VIK#Inverse_of_COMPLIMENTARY CTGTGGGGGGTGTCTTTGGGGTTTGGTTG
Commands for BWA -
bwa index -a is genome bwa aln genome reads > out.sai bwa samse genome out.sai reads > out.sam
The output from BWA is (sam file) -
@SQ SN:chr1 LN:300 @SQ SN:chrMm LN:300 Read#BetweenN's 4 * 0 0 * * 0 0 TTTTAACAAAAAAGCCCATCCTACCCAGC * VIK 0 chr1 251 37 29M * 0 0 CAACCAAACCCCAAAGACACCCCCCACAG * XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:29 VIK#Inverse 4 * 0 0 * * 0 0 GACACCCCCCACAGAAACCCCAAACCAAC * VIK#COMPLIMENTARY 4 * 0 0 * * 0 0 GTTGGTTTGGGGTTTCTGTGGGGGGTGTC * VIK#Inverse_of_COMPLIMENTARY 16 chr1 251 37 29M * 0 0 CAACCAAACCCCAAAGACACCCCCCACAG * XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:29
So my first read did not mapped because half of it lies on left side of N's and half on right side of N's on chr1 (so no problem with this read). Second read mapped to forward strand (good).Fifth read (inverse of complimentary of second read) mapped to same position but on reverse strand (very good).
Question- Third read(VIK#Inverse) is inverse of second read. So if you will read the reference genome from right hand side at last line of chr1, it is the same sequence but it did not get mapped. So, is there any option with which I can map my read like this? If not, then why it is not a good idea to map reads like this? (same goes for the fourth read but according to reverse strand)