Samtools mpileup output: missing fields?
6.6 years ago
cjaln • 0

Hi, I ran the following code in samtools version 1.3.1 and I am confused by the output, leaving me a little worried that I have missing data. So here's hoping someone can prove me wrong or right!

----------
samtools mpileup -q 250 -Q 20 -f dmel-all-chromosome-r6.01.fasta -b Sample1Sample2.txt | gzip > Sample1Sample2.mpileup.gz
----------

I set -q 250 because the RNA aligner I used, STAR, assigns a mapping quality of 255 to uniquely mapped reads by default, so this threshold keeps only unique alignments. Sample1Sample2.txt lists two BAM files, one per sample. What confuses me about the output is the presence of < and > symbols. Here's an example line from my mpileup:

2R      14442944        G       48      >><>>><<<<<<<<<><<<><<<<><<<<<>><>>><><<>>>>>><.        FFBFFFF<FFFFFFFFFFFFFFFFFFFFFFFBBFFFFFFFFFF<FFFF        75      <<<><<><<<<<<<<<<<<<<<<<<<<<<<<<>>><<<<<<<<<>>>><<>>>>>>>>>><>><>><>><><>><  FFF<FFF<FFBBBBBBBFBBBFBFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFF<FBFFFFFFFFFFFF

Why do < and > appear in the fifth column alongside the base calls (e.g. ".") rather than in the sixth column, as depicted in this article? I thought they might symbolise spliced reads, since I am working with RNA-seq, but I just don't know.

Thanks in advance, Chris

(Edit: sorry, the text editor is making the line of pileup look funny, ignore the br= and p= stuff)


Visualize the alignments with IGV and check for an insertion/deletion at 2R:14442944.

6.6 years ago

The article you're looking at is an ancient web page describing the old pileup format. That format is similar to, but simpler than, the mpileup format that replaced it when the samtools mpileup command arrived quite some years ago.

Instead you should look at the (admittedly terse) description of mpileup format in the mpileup section of the samtools man page. This notes that one of the additions in mpileup format is that < and > indicate reference skips.

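So this particular position is covered by an N CIGAR operator in practically all of your reads. In other words, it falls inside a region (typically an intron) that your spliced RNA-seq alignments skip over, which matches your guess about spliced reads.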
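
If you want to check how many reads at a position are reference skips rather than real base calls, a small script can tally the symbols in the read-bases column. The following is only a rough Python sketch, not anything shipped with samtools: the function names `strip_indels` and `summarize_bases` are made up for illustration, and it assumes the mpileup conventions described in the man page (`^` plus one mapping-quality character at a read start, `$` at a read end, `+n`/`-n` followed by n bases for indels, `*` or `#` for deleted bases).

```python
import re
from collections import Counter

# '^' marks a read start and is followed by ONE character encoding mapping
# quality. That character can itself be '<' or '>', so it has to be removed
# before counting reference skips.
_READ_START = re.compile(r"\^.")
# Indel annotation: sign, length, then that many inserted/deleted bases.
_INDEL = re.compile(r"[+-](\d+)")


def strip_indels(bases: str) -> str:
    """Drop '+<n><seq>' and '-<n><seq>' annotations from a read-bases string."""
    out = []
    i = 0
    while i < len(bases):
        m = _INDEL.match(bases, i)
        if m:
            # Skip the sign, the digits, and the n bases that follow them.
            i = m.end() + int(m.group(1))
        else:
            out.append(bases[i])
            i += 1
    return "".join(out)


def summarize_bases(bases: str) -> Counter:
    """Tally skips, matches, mismatches and deletions in one read-bases column."""
    cleaned = _READ_START.sub("", bases).replace("$", "")
    cleaned = strip_indels(cleaned)
    counts = Counter()
    for ch in cleaned:
        if ch in "<>":
            counts["skip"] += 1        # reference skip (CIGAR N)
        elif ch in ".,":
            counts["match"] += 1       # base matches the reference
        elif ch in "*#":
            counts["deletion"] += 1    # deleted reference base (CIGAR D)
        else:
            counts["mismatch"] += 1    # any other base call
    return counts


if __name__ == "__main__":
    # Hypothetical short example, not taken from real data.
    print(summarize_bases("^<>><.$,+2AGa"))
```

Running `summarize_bases` on the fifth column of the line in the question should report almost everything as `skip` with a single `match`, which is what you would expect for a position inside an intron.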


Thanks a lot John :)
