Question: Samtools mpileup - Sudden decrease of coverage at homopolymer regions
0
gravatar for Louis Kok
5 months ago by
Louis Kok10
Singapore
Louis Kok10 wrote:

When generating output of samtools mpileup, I notice there is sudden decrease of coverage at the homopolymer region and went up again at position 250 to 261. I have check with IGV and confirmed that there is no gap /insertion happening around the region. My command is as below:

samtools mpileup -BQ 0 -d 1000000 -a -f ref.fasta foo.bam > foo.mpileup

The output was parsed and the counts were generated as below. The issue to investigate is around position 250 to 261.

**Ref   Pos Base    A_Count C_Count G_Count T_Count Total_Count**
Sequence    248 C   11  7821    2   0   7834
Sequence    249 T   83  3   4   7112    7202
Sequence    250 C   38  3452    16  10  3516
Sequence    251 A   1354    0   0   1   1355
Sequence    252 A   1283    0   1   0   1284
Sequence    253 A   1239    0   0   0   1239
Sequence    254 A   1154    1   0   0   1155
Sequence    255 A   1085    0   1   0   1086
Sequence    256 A   1054    6   14  1   1075
Sequence    257 A   1131    0   3   1   1135
Sequence    258 A   1209    0   0   2   1211
Sequence    259 A   1351    5   0   0   1356
Sequence    260 A   1526    0   1   0   1527
Sequence    261 A   1567    0   2   0   1569
Sequence    262 G   3   33  7399    6   7441
Sequence    263 A   7955    2   2   1   7960
Sequence    264 A   8043    0   4   0   8047
Sequence    265 A   8014    0   13  0   8027

I was wondering what filtering options of mpileup could have resulted in less count for homopolymer region. Thanks.

mpileup mapping samtools • 217 views
ADD COMMENTlink modified 5 months ago by Kevin Blighe49k • written 5 months ago by Louis Kok10
0
gravatar for Kevin Blighe
5 months ago by
Kevin Blighe49k
Kevin Blighe49k wrote:

It is likely not an issue with SAMtools and is instead reflective of the alignment process - it is more difficult to correctly align reads around homopolymers. If reads are incorrectly aligned, false-positive variant calls can then ensue. Homopolymers are also an issue for the sequencing process itself, i.e., before any in silico analyses begin.

You could use, e.g., BEDTools to output all reads (irrespective of SAM flags) that correspond to these positions just to get corroborating evidence from the BAM and region.

Note that you should be using bcftools mpileup. samtools mpileup has been deprecated.

ADD COMMENTlink modified 5 months ago • written 5 months ago by Kevin Blighe49k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1121 users visited in the last hour