Samtools mpileup - Sudden decrease of coverage at homopolymer regions
1
0
Entering edit mode
2.5 years ago
Louis Kok ▴ 20

When generating output of samtools mpileup, I notice there is sudden decrease of coverage at the homopolymer region and went up again at position 250 to 261. I have check with IGV and confirmed that there is no gap /insertion happening around the region. My command is as below:

samtools mpileup -BQ 0 -d 1000000 -a -f ref.fasta foo.bam > foo.mpileup


The output was parsed and the counts were generated as below. The issue to investigate is around position 250 to 261.

**Ref   Pos Base    A_Count C_Count G_Count T_Count Total_Count**
Sequence    248 C   11  7821    2   0   7834
Sequence    249 T   83  3   4   7112    7202
Sequence    250 C   38  3452    16  10  3516
Sequence    251 A   1354    0   0   1   1355
Sequence    252 A   1283    0   1   0   1284
Sequence    253 A   1239    0   0   0   1239
Sequence    254 A   1154    1   0   0   1155
Sequence    255 A   1085    0   1   0   1086
Sequence    256 A   1054    6   14  1   1075
Sequence    257 A   1131    0   3   1   1135
Sequence    258 A   1209    0   0   2   1211
Sequence    259 A   1351    5   0   0   1356
Sequence    260 A   1526    0   1   0   1527
Sequence    261 A   1567    0   2   0   1569
Sequence    262 G   3   33  7399    6   7441
Sequence    263 A   7955    2   2   1   7960
Sequence    264 A   8043    0   4   0   8047
Sequence    265 A   8014    0   13  0   8027


I was wondering what filtering options of mpileup could have resulted in less count for homopolymer region. Thanks.

samtools mapping mpileup • 639 views
0
Entering edit mode
2.5 years ago

It is likely not an issue with SAMtools and is instead reflective of the alignment process - it is more difficult to correctly align reads around homopolymers. If reads are incorrectly aligned, false-positive variant calls can then ensue. Homopolymers are also an issue for the sequencing process itself, i.e., before any in silico analyses begin.

You could use, e.g., BEDTools to output all reads (irrespective of SAM flags) that correspond to these positions just to get corroborating evidence from the BAM and region.

Note that you should be using bcftools mpileup. samtools mpileup has been deprecated.