Question: samtools sort truncated
5 months ago by
backmoons0 wrote:

Dear all,

I ran into a problem while analyzing RNA-seq data with a STAR/HTSeq-count pipeline. After STAR finished and produced a BAM file, I used "samtools sort" to sort it, but I got an error:

**[E::bam_read1] CIGAR and query sequence lengths differ for GWNJ-0842:379:GW1810081505:6:2106:29579:24884
samtools sort: truncated file. Aborting**

I have 6 samples. Five of them worked well with "samtools sort"; only this one gives the error. I tried different parameters but still got the error. Here are my commands:

  1. genome generator

    STAR --runThreadN 16 --runMode genomeGenerate --genomeDir /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/STAR3/star_index/ --genomeFastaFiles /home/xzm0017/Catfish/Channel_genome_transctipts_index/Channel_genome/0016606251ChannelCatfish_genome.fna --sjdbGTFfile /home/xzm0017/Catfish/Channel_genome_transctipts_index/Channel_genome/GCF_001660625.1_IpCoco_1.2_yulin_genomic.gtf --sjdbOverhang 149

  2. map

    STAR --runThreadN 16 --genomeDir /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/STAR3/star_index/ --readFilesIn /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/Chan7-1_R1_left_paired_trimmed.fq /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/Chan7-1_R2_right_paired_trimmed.fq --outFileNamePrefix /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/STAR2/chan71_2_mapped --limitOutSJcollapsed 5000000 --limitIObufferSize 300000000 --outSAMtype BAM Unsorted --limitBAMsortRAM 87162435271 --sjdbOverhang 149

  3. samtools sort

    samtools sort -n -T /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/STAR2/tmp/ -o Catfish/NS1809045_resequencing/clean_data/STAR2/chan71_2_aftersort /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/STAR2/chan71_2_mappedAligned.out.bam

This has been bothering me for several days. If any of you could give me some suggestions to fix it, I would really appreciate it. Thanks in advance.

rna-seq • 369 views
modified 5 months ago by finswimmer11k • written 5 months ago by backmoons0

If you want the file sorted, why not tell STAR to sort it?

written 5 months ago by swbarnes25.9k

Hi, thanks for your kind reply. Do you mean the "--outSAMtype BAM Unsorted" parameter in STAR? Yes, maybe I can change it (I will try it later). I just tried running without this parameter, so the output was a SAM file, and then used "samtools view" to convert the SAM file to BAM: same error. What I am afraid of is that something went wrong during mapping, so the problem may still be there in the next step if I don't figure out what it is.

written 5 months ago by backmoons0
5 months ago by
New York
Vitis2.2k wrote:

Can you identify the offending read(s) and compare their sequence length with the length given by the CIGAR string? Also, please see this:

CIGAR and query sequence are of different length when trying to convert from sam to bam?

You may use Picard's ValidateSamFile tool to find the problematic reads. I think they may have come from a compatibility issue between your mapper and samtools.

modified 5 months ago • written 5 months ago by Vitis2.2k

Sorry, I am very new to this. What do you mean by offending read(s)? :(

written 5 months ago by backmoons0

The error message says that a read has a sequence length that is inconsistent with its CIGAR string. If a read is 101bp long and all of it has been aligned to the genome, it will have the CIGAR string 101M, which means 101 aligned bases (matches or mismatches). You can imagine that different reads have different CIGAR strings describing how each one was mapped. In every case, the read length has to agree with the number of read bases described by the CIGAR. Basically, the error message tells you that at least one read breaks this rule. You may need to identify those reads, see how they were mapped by the mapper (in your case STAR), and work out why the mapper produced the inconsistency.
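
To make the consistency rule concrete, here is a minimal Python sketch of the check, assuming you have the alignments as SAM text (for example the SAM output you got from STAR). Per the SAM specification, only the M, I, S, = and X operations consume read bases, so their lengths should add up to the length of the SEQ field. The function names and the two example records are made up for illustration; this is not how samtools implements its check.

```python
import re

# CIGAR operations that consume bases of the read (query), per the SAM spec.
QUERY_CONSUMING_OPS = set("MIS=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Return the number of read bases described by a CIGAR string."""
    return sum(
        int(length)
        for length, op in CIGAR_TOKEN.findall(cigar)
        if op in QUERY_CONSUMING_OPS
    )


def find_inconsistent_reads(sam_lines):
    """Yield (name, cigar, cigar_length, seq_length) for records that disagree."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, cigar, seq = fields[0], fields[5], fields[9]
        if cigar == "*" or seq == "*":  # nothing to compare
            continue
        cigar_length = cigar_query_length(cigar)
        if cigar_length != len(seq):
            yield name, cigar, cigar_length, len(seq)


# Hypothetical records: the second one claims 6 read bases but stores only 5.
example = [
    "@HD\tVN:1.4",
    "read_ok\t0\tchr1\t100\t255\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read_bad\t0\tchr1\t200\t255\t3M2N3M\t*\t0\t0\tACGTA\tIIIII",
]
for record in find_inconsistent_reads(example):
    print(record)
```

To scan a real file you could pass an open file handle instead of the list, e.g. `find_inconsistent_reads(open("Aligned.out.sam"))`, where the file name is only a placeholder for your own SAM output.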

modified 5 months ago • written 5 months ago by Vitis2.2k