samtools sort remapped bam by readname/query name
1
0
Entering edit mode
8.8 years ago
quachtina96 ▴ 40

Hi,

I extracted mitochondrial reads from a bam file and remapped them to a reference genome through the following python function (remap2mt). my goal is to take this bam file (remapped) and find the depth of coverage for that bam file (samtools depth -r chrM remapped.bam) . This requires the bam file to be sorted, which is why I include that at the bottom of this function, however it returns the following error that prevents any depths from being outputted. Can anyone tell me why the error is occurring? I took a look a the header of the remapped bam and got

@SQ     SN:gi|251831106|ref|NC_012920.1|        LN:16569
@PG     ID:bwa  PN:bwa  VN:0.7.10-r789  CL:bwa mem rCRS.fasta ID18_Proband_exome_L001_001_mtExtract_1.fq ID18_Proband_exome_L001_001_mtExtract_2.fq

THE ERROR: (occurs when I call samtools depth on the sorted remapped bam)

[bam_pileup_core] the input is not sorted (reads out of order)
[bam_plp_destroy] memory leak: 1. Continue anyway.

THE FUNCTION:

def remap2mt(bam,path2currentDirectory):
    #remap to the mtGenome (fasta file)
    newFastq1 = bam[:-9] + "_1.fq" #strings holding names of the fastq files
    newFastq2 = bam[:-9] + "_2.fq" 
    command = "samtools sort -n " + bam + " " +  bam[:-9]+ ".qsort" #bam must be sorted before conversion to fq
    print command
    job = shlex.split(command)
    stdout, stderr = sp.Popen(job,stdout=sp.PIPE).communicate() #execute sort command and send output to stdout

    command = "bedtools bamtofastq -i " + bam[:-9]+ ".qsort.bam" + " -fq " + newFastq1 + " -fq2 " + newFastq2  #convert from bam to fastq
    errFile = open('bamtofastqError', 'w+')
    stdout,stderr = sp.Popen(shlex.split(command),stdout=sp.PIPE, stderr=errFile).communicate()
    errFile.close()

    #remap the mtDNA extract to the reference genome
    command = "bwa mem rCRS.fasta " + newFastq1 +" " + newFastq2 
    remappedSam =str(bam[:-9] + "remap.sam")
    args = shlex.split(command)
    newPath = path2currentDirectory + "/" + remappedSam
    outfileHandle = open(newPath, 'w+')
    remapJob = sp.Popen(args, stdout = outfileHandle)
    stdout, stderr = remapJob.communicate()

    remappedBam = str(bam[:-9] + "remap.bam")
    command = "samtools view -bT rCRS.fasta " + remappedSam + " -o " + remappedBam
    args = shlex.split(command)
    job = sp.Popen(args, stdout = sp.PIPE, stderr = sp.PIPE)
    stdout, stderr = job.communicate()
    print "Remap executed"

   command = "samtools sort -n " + remappedBam + " " +  remappedBam[:-4]+ ".qsort" #resorting
    print command
    job = shlex.split(command)
    stdout, stderr = sp.Popen(job,stdout=sp.PIPE).communicate() #execute sort command and send output to stdout

    return remappedBam
sort bam samtools • 3.6k views
ADD COMMENT
2
Entering edit mode
8.8 years ago

You sorted by read name (samtools sort -n) while mpileup expect the bam to be sorted on coordinate.

ADD COMMENT
0
Entering edit mode

Hi Pierre!

I am using samtools depth, not mpileup. Does this change your answer?

ADD REPLY
0
Entering edit mode

It's the same thing: both invoke the mpileup command https://github.com/samtools/samtools/blob/develop/bam2depth.c#L181

ADD REPLY

Login before adding your answer.

Traffic: 1604 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6