I have mapped 250M Illumina paired-end reads against a 5,000 bp gene using BWA and created a consensus sequence of that gene using samtools mpileup. The average depth of coverage is low (5X). When I look at the .bam file using samtools tview, this is what I see:
341 351 361 371 381 391
What do the 'N's stand for here? Are they my reference gene, or is the sequence below them my reference gene?
Also, when samtools mpileup generates the consensus sequence, I noticed after looking carefully at the alignment in the .bam file that some regions covered by only 1 read appear in my consensus sequence, while other regions that are also covered by only 1 read do not. Why is that? Does it depend on the base quality or mapping quality of the reads? Is there a way to produce a consensus sequence even where there is only 1 read, and even if that read is of low quality? Maybe there is a way to set less stringent parameters for the consensus. Any advice is appreciated, and please ask if anything I wrote was not clear enough.
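To make the second question concrete, here is a toy Python sketch of the kind of filtering I suspect might be behind it. This is not samtools' actual algorithm: the function `toy_consensus`, its `min_depth` and `min_base_qual` parameters, their values, and the example pileup are all hypothetical and only illustrate how two positions with 1 read each could end up treated differently.

```python
from collections import Counter


def toy_consensus(columns, min_depth=1, min_base_qual=13):
    """Call a majority-vote consensus from toy pileup columns.

    columns: list of pileup columns, one per reference position.
             Each column is a list of (base, phred_quality) tuples.
    min_depth / min_base_qual: hypothetical thresholds, not samtools options.
    """
    consensus = []
    for column in columns:
        # Keep only the bases whose quality passes the threshold.
        kept = [base for base, qual in column if qual >= min_base_qual]
        if len(kept) < min_depth:
            # Too few usable bases at this position: emit an unknown base.
            consensus.append("N")
        else:
            # Otherwise take the most frequent remaining base.
            consensus.append(Counter(kept).most_common(1)[0][0])
    return "".join(consensus)


# Made-up pileup: positions 1 and 2 each have exactly one read,
# but the base at position 2 has a low quality score.
columns = [
    [("A", 30)],
    [("C", 5)],
    [("G", 30), ("G", 20), ("T", 40)],
    [],
]

strict = toy_consensus(columns)
relaxed = toy_consensus(columns, min_base_qual=0)
print(strict, relaxed)
```

In this sketch the single-read position with a low-quality base drops out under the stricter threshold but is kept once the threshold is relaxed, which is the behaviour I am asking whether I can control.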