Question: Unusual samtools mpileup output?
0
gravatar for quachtina96
5.5 years ago by
quachtina9640
United States
quachtina9640 wrote:

Hi,

I am using samtools mpileup to generate a pileup file, and my output doesn't match the example pileup files I've seen (e.g. the example on  the pileup format article on Wikipedia). Do you agree that there seems to be something wrong with my file? If so, any suggestions on how to fix it?

The command I used:

samtools mpileup -B -f chrRCRS.fa input-sorted.bam > input.pileup

First part of the pileup file.

chrRCRS    1    N    22    ^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]G^]g^]g^]g^]g^]g^]g^]g    IIBFFFIIIIIFFFIBIIIFII
chrRCRS    2    N    22    AAAAAAAAAAAAAAAaaaaaaa    FIBFFFIIIIIFFFIBIIIFII
chrRCRS    3    N    22    TTTTTTTTTTTTTTTttttttt    IIFFFFIIIIIFFFIFIIIFII
chrRCRS    4    N    22    CCCCCCCCCCCCCCCccccccc    IIFBFFIIIIIFFFIFFIIFII
chrRCRS    5    N    23    AAAAAAAAAAAAAAAaaaaaaa^]A    IIFFFIIIIIIFFFIBIIIFFIB
chrRCRS    6    N    24    CCCCCCCCCCCCCCCcccccccC^]c    IIFBFIIIIIIFBFIFIIIFIIBB
chrRCRS    7    N    25    AAAAAAAAAAAAAAAaaaaaaaAa^]a    FIF7FIIIIIIFFFIFIIIFIIBFF
chrRCRS    8    N    25    GGGGGGGGGGGGGGGgggggggGgg    FIF0FIIIFIIIFFIFIIIFIIFBB
chrRCRS    9    N    25    GGGGGGGGGGGGGGGgggggggGgg    FIFBFIIIIIIIFFIFIIIFIIFBB
chrRCRS    10    N    26    TTTTTTTTTTTTTTTtttttttTtt^]T    FFF0FFFFFFFIBFFFIIIFIIFBBB
chrRCRS    11    N    26    CCCCCCCCCCCCCCCcccccccCccC    FIFBIIFFIIFIFIIFIIIFIIFBBB
chrRCRS    12    N    27    TTTTTTTTTTTTTTTtttttttTttT^]T    IIF<IIIIIIIIFIIFFIIFIIFBFBB
chrRCRS    13    N    27    AAAAAAAAAAAAAAAaaaaaaaAaaAA    FIIBIIIIFIIIIIIBIIFFIIFBBBB
chrRCRS    14    N    27    TTTTTTTTTTTTTTTtttttttTttTT    IIIBIIIIIIIIFII7FIFBIIFB<FB
chrRCRS    15    N    27    CCCCCCCCCCCCCCCcccccccCccCC    IIIBIIIIIIIIFIIFFFFBFFFB<FF
chrRCRS    16    N    27    AAAAAAAAAAAAAAAaaaaaaaAaaAA    IIIBIIIIIIIIFII<BBBBBFF<7FF
chrRCRS    17    N    27    CCCCCCCCCCCCCCCcccccccCccCC    IIIBIIIIIIIIFIIFIIIFBIF7FFF
chrRCRS    18    N    28    CCCCCCCCCCCCCCCcccccccCccCC^]C    IIIBIIIIIIIIFIIFIIIF0IIFFBFB

 

 

 

ADD COMMENTlink modified 5.4 years ago by Biostar ♦♦ 20 • written 5.5 years ago by quachtina9640

I am not sure if it is a problem or unknown bases in the reference genome. The reference nucleotide at all positions (third column) that you have listed above is "N". As a result, bases at those positions from aligned reads show mismatch. For example, for position 2, aligned reads have "A" nucleotide. "A" represents mismatch on the forward strand and "a" represents mismatch on the reverse strand.  If you keep scrolling down in the mpileup file, you may find non N's in the reference genome and lots of "." and ","  for those positions provided that the non-reference sample has no variants for those positions. In case, all you see in the third column is "N"s then there may be some compatibility issue between fasta file and bam file. Try:

1) First checking the size of the .fai file. Do you have properly indexed fasta file along with the fasta file in the same folder. What is the size of the .fai file? 2) Check the chromosome names in the sam header and the reference fasta. Are they same?

ADD REPLYlink modified 5.5 years ago • written 5.5 years ago by Ashutosh Pandey12k

1) The fai file is one kb and it only contains

chrRCRS    16569    9    70    71

2) If I'm interpreting this correctly, the chromosome name is the same in the header (See below)

@HD    VN:1.4    GO:none    SO:coordinate
@SQ    SN:chrRCRS    LN:16569
@RG    ID:ID18_Father    PL:ILLUMINA    PU:L00    LB:ID18_Father    SM:ID18_Father_L00
@PG    ID:bwa    PN:bwa    VN:0.7.10-r789    CL:bwa mem chrRCRS.fa ID18_Father_exome_mtExtract_1.fq ID18_Father_exome_mtExtract_2.fq
@PG    ID:GATK IndelRealigner    CL:knownAlleles=[(RodBinding name=knownAlleles source=/gpfs/home/quacht/MToolBox/data/MITOMAP_HMTDB_known_indels_RCRS.vcf)] targetIntervals=/gpfs/home/quacht/MToolBox/data/intervals_file_RCRS.list LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null
@PG    ID:MarkDuplicates    PN:MarkDuplicates    VN:1.92(1464)    CL:net.sf.picard.sam.MarkDuplicates INPUT=[ID18_Father.rg.ra.bam] OUTPUT=ID18_Father.rg.ra.marked.bam METRICS_FILE=ID18_Father-metrics.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true TMP_DIR=[/gpfs/home/quacht/test_myMtoolbox_ID18Father/tmp]    PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false

 

ADD REPLYlink written 5.5 years ago by quachtina9640
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2779 users visited in the last hour
_