Hi all,
I am relatively new to bioinformatics and to working with genomic data. I generated ddRAD data following the Peterson et al. 2012 protocol and used STACKS to call SNPs (default parameters for -m, -M, and -n). I filtered the sequences to retain 'high-quality' reads. My average read depth per locus is 29X and the average amount of missing data per locus is 7%. Looking at my FASTA file, I notice that most sequences contain a string of ~10 'N's at around the same location. Here is a subset of the sequences:
Locus1: AATTCTTGCAGTGAGAAGTACTGGTGAGTTTCATCCTCATGTTCTGTTCCTATAGTACTGTGGTATACTTATTTTGGATCTTCTTATATGATTAATGGATCNNNNNNNNNNTATTACATACTATTCTGATATCTGTCTGTCGAAATTACAGGCATTTCTGATGAGAAGGTGCATAATCATGAACTCAACCTTAAGGAAGCCGCTTCCACCGG
Locus2: AATTCCTTCGAGTATTGAAGGGATGGCATCTTCATCACCATCATAGAATTTCTGAATCCTCTTCTTCAGCTCTATACGAAAAGCAAAAATCAGTGCAATGNNNNNNNNNNAGTCGAGATTCAACTCCCCAAACAGCCTCATAAGAGAACTAGTAATAAACATTACTTACTGAACTACACTATCGGAAGATTTGTGGCACCGCATAAACCGGACCGTATTTTTCAAAATGGACTTTGACTTATACCGAACCG
Locus3: AATTCAACAACAATTAACGTAGTACTCCATCAAGGTTCCAATCAAAACATCTCTTCTATGTCACTCCATAATAATATAACCTCACTATGCATATCCACTANNNNNNNNNNTCCACCTTGTTATTGCTTGATCCGATTCGGGTCATGAAGAACGGGTTCGGGTCGATATTGATGGCCACTTCTGGTGCTACTTTCGCGGAATGCGGGCCCG
Locus4: AATTCACCTCATTCTTGCGGGTGTTGGGGAGCTATCCTATGGATATGACCCCGTGGTGTCCTTCTAGAGGAGACTAGTAATTAATTAATTTAAAAGTAAANNNNNNNNNNTGAAACTGTGTTTCTTTATTCTTGATTTCACGTTCTTTACCTAAAATACCTACCTACCTGAACTTCTTTCCTCGTGCGGGATTCGAATTGGTGGTGGCCGGAACCAGTACCATAGTCGGTCTGATTCAGTCAAAGTTTCAAGATGTACGCGAACCCGATTTAACCTCCCG
Locus5: AATTCTGATAATCCATTTGTTTGCTCATCTAGTTCTTATGATACAATTATCTGCATCTTTTTCCTTTATACCCGCCGACTTGTTTTCTGCACTAGTAGTGNNNNNNNNNNTAGGTTGGCCATACCCGCACATGCCATAACGACAACGAACATGCCTTGGACATCGATTGAAGCATTTCCCACAATGTCTTCGGTTTCGATTGATGTTCCGGGGGGGCCCTATGAAAATTTGGAAATACGATGCTTTGATAGAGAAAAGGATCCG
Locus6: AATTCAACATAGTTCATGAATCGGGTTATCTATTTTTTACCTGCATGTACCTGGCCAGAACTAAAAAGTCGGTTTCTTGAACCAACTCCAATATCTCTCCTNNNNNNNNNNGGGTCGAGAAGAGGGACTCGGACTACGGATGTCAGGGCATTGAACCTCTTTGCAAGATTAAGGAAGTCCTCATTCATTTGCTCTCGTGGCCATCTAGCCGGTTCGACCCATTTAGTATTCTTGGTCCGACCCGACCCGAATAGGTCGGACCCGATTGAGAAAAACCG
Locus7: AATTCACTGAATGTGTCACGGACTAATAGCATCTATCCAATGATTAGAGGAAATTATTTTAGTTTTTTGGGCAGTGGAAAACTAAAAAATATGGTTTAAANNNNNNNNNNGAAAAGGGGAAAATCTAAATACGAACTTAACGAAAACCCAATACTCTGACAAGGATACCCAATAACCACACATTGTAAAGCAAAAACATGAAGTCAACCGG
Locus8: AATTCTGACAGGAATCATAATGGGATGTGCGCTTTTATTCATGACTCCGTTATTTGAGTATATACCATTGGTATGCTGCAGGCATTTATTTCTTTGATATNNNNNNNNNNAAATTACTTTCCCCCATAATTACGTGTGAGGTGATGCTTACCTTCTAAGACCATAAATAAAATCATCCAAAACCCCTCAACCTGGGATTGGCTGTGGCCGGTCCGGTTCGGTTTTCATTTAAAAATCTGTTCTGTAAATTTCTGTCCGGTCCG
Locus9: AATTCAATTATTGATAGGTTCCAGTAAACTGTATTATTAGTAAGCTAACAGAAGCAGTTGGCGTCAAGATCCCATGAAATAAGTTAGAAGAATCATCATTNNNNNNNNNNTTAGTTCTAGCTACGAATGAATGGAAAAGAGCAGATGGATCAAGAAAATTAAACATTTCCTGGGAAATCCCCATCTGTTAATAGGAGGAGGAGAAGGTTACCGGAGGGAGCTGATTATTCATCTCCTTCTCACCTTCCGTGAATAGCCG
Locus10: AATTCAGAAAAGGAGAGGGACAAATGCTGAAATCCAAACCTCAAGTCCCACAAAAGTGATTGACCATTACACTGGAGATGGCTCTCCCAAGATGACGTCTNNNNNNNNNNAATGGATATGTTAAACATAGACCAAATAACTATAACCTCACAAAGAAATGTGTACATTATGTAGATCTTTCATGAACAAAAAGCAAAATAATACAGCCCGGAGACCGAAGCTCCG
Locus11: AATTCTTTTCACCACCCACAAACCATACCTTGATTTGTTGATTCAACTTGCAGGTATCTATGTTGGAGGAAGCAAGATTGTTCATTTCAGACCTGACCCANNNNNNNNNNGATTCGAGTACGGGGTCCGACCTTCACTCTTCCTAGCCAAAGTCAGAGGCGGCACATGCACCACCGCACCCTCTGACCCGCCCGAAACAGTCATCGACCGGTGGTATTTCGTTATAGTTTCCG
Locus12: AATTCATGATCGGTTCCTTTTTAAGTCACTTCTTATTCACATCATGTACAAAATGAGACCAGACCGATTGATCCGAGTGCCCATACAAAAGACGATTAATNNNNNNNNNNATACCAACTACTAACCTGTAAGATTGGTTCCATTGGGGATGCTCACCGTAGAATTGAGAAACCATGAGCAAACTTTCCGACGTCGGATTGTCGACAACCGG
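In case it helps anyone check the same thing, the position and length of each run of 'N's can be located with a few lines of Python. This is only a minimal sketch: the function name `find_n_runs`, the `min_len` parameter, and the toy sequence are made up for illustration and are not part of STACKS or of my actual pipeline.

```python
import re

def find_n_runs(seq, min_len=2):
    """Return (start, length) for every run of at least min_len 'N's.

    start is 0-based; the sequence is upper-cased first so 'n' also counts.
    """
    pattern = re.compile(r"N{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]

# Toy sequence for illustration only (not one of the loci listed above)
toy = "ACGT" * 3 + "N" * 10 + "TTGA"
for start, length in find_n_runs(toy):
    print(f"N run of length {length} starting at position {start}")
```

Running this over every sequence in the FASTA file would show whether the runs are always the same length and how much their start position varies between loci.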
Does anyone have an idea why this might be happening? Thank you for any input!