Question: samtools sort remapped bam by readname/query name
0
gravatar for quachtina96
4.1 years ago by
quachtina9640
United States
quachtina9640 wrote:

Hi,

I extracted mitochondrial reads from a bam file and remapped them to a reference genome through the following python function (remap2mt). my goal is to take this bam file (remapped) and find the depth of coverage for that bam file (samtools depth -r chrM remapped.bam) . This requires the bam file to be sorted, which is why I include that at the bottom of this function, however it returns the following error that prevents any depths from being outputted. Can anyone tell me why the error is occuring? I took a look a the header of the remapped bam and got

@SQ     SN:gi|251831106|ref|NC_012920.1|        LN:16569
@PG     ID:bwa  PN:bwa  VN:0.7.10-r789  CL:bwa mem rCRS.fasta ID18_Proband_exome_L001_001_mtExtract_1.fq ID18_Proband_exome_L001_001_mtExtract_2.fq

THE ERROR: (occurs when I call samtools depth on the  sorted remapped bam)

[bam_pileup_core] the input is not sorted (reads out of order)
[bam_plp_destroy] memory leak: 1. Continue anyway.

THE FUNCTION:

def remap2mt(bam,path2currentDirectory):
    #remap to the mtGenome (fasta file)
    newFastq1 = bam[:-9] + "_1.fq" #strings holding names of the fastq files
    newFastq2 = bam[:-9] + "_2.fq" 
    command = "samtools sort -n " + bam + " " +  bam[:-9]+ ".qsort" #bam must be sorted before conversion to fq
    print command
    job = shlex.split(command)
    stdout, stderr = sp.Popen(job,stdout=sp.PIPE).communicate() #execute sort command and send output to stdout
    
    command = "bedtools bamtofastq -i " + bam[:-9]+ ".qsort.bam" + " -fq " + newFastq1 + " -fq2 " + newFastq2  #convert from bam to fastq
    errFile = open('bamtofastqError', 'w+')
    stdout,stderr = sp.Popen(shlex.split(command),stdout=sp.PIPE, stderr=errFile).communicate()
    errFile.close()

    #remap the mtDNA extract to the reference genome
    command = "bwa mem rCRS.fasta " + newFastq1 +" " + newFastq2 
    remappedSam =str(bam[:-9] + "remap.sam")
    args = shlex.split(command)
    newPath = path2currentDirectory + "/" + remappedSam
    outfileHandle = open(newPath, 'w+')
    remapJob = sp.Popen(args, stdout = outfileHandle)
    stdout, stderr = remapJob.communicate()
    
    remappedBam = str(bam[:-9] + "remap.bam")
    command = "samtools view -bT rCRS.fasta " + remappedSam + " -o " + remappedBam
    args = shlex.split(command)
    job = sp.Popen(args, stdout = sp.PIPE, stderr = sp.PIPE)
    stdout, stderr = job.communicate()
    print "Remap executed"

    command = "samtools sort -n " + remappedBam + " " +  remappedBam[:-4]+ ".qsort" #resorting
    print command
    job = shlex.split(command)
    stdout, stderr = sp.Popen(job,stdout=sp.PIPE).communicate() #execute sort command and send output to stdout

    return remappedBam
sort samtools bam • 2.0k views
ADD COMMENTlink modified 4.1 years ago by Pierre Lindenbaum122k • written 4.1 years ago by quachtina9640
1
gravatar for Pierre Lindenbaum
4.1 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum122k wrote:

you sorted by "read name"  (samtools sort -n ) while mpileup expect the bam to be sorted on coordinate.

ADD COMMENTlink written 4.1 years ago by Pierre Lindenbaum122k

Hi Pierre!

I am using samtools depth, not mpileup. Does this change your answer?

ADD REPLYlink written 4.1 years ago by quachtina9640

it's the  same thing: both invoke the mpileup command https://github.com/samtools/samtools/blob/develop/bam2depth.c#L181

ADD REPLYlink written 4.1 years ago by Pierre Lindenbaum122k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1723 users visited in the last hour