samtools sort truncated
5.3 years ago
backmoons ▴ 10

Dear all,

I ran into a problem while analyzing RNA-seq data with a STAR / HTSeq-count pipeline. After STAR finished and produced a BAM file, I ran "samtools sort" on it and got this error:

**[E::bam_read1] CIGAR and query sequence lengths differ for GWNJ-0842:379:GW1810081505:6:2106:29579:24884
samtools sort: truncated file. Aborting**

I have 6 samples. Five of them sorted fine with "samtools sort"; only this one fails. I tried different parameters but still get the error. Here are my commands:

  1. genome generator

    STAR --runThreadN 16 --runMode genomeGenerate --genomeDir /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/STAR3/star_index/ --genomeFastaFiles /home/xzm0017/Catfish/Channel_genome_transctipts_index/Channel_genome/0016606251ChannelCatfish_genome.fna --sjdbGTFfile /home/xzm0017/Catfish/Channel_genome_transctipts_index/Channel_genome/GCF_001660625.1_IpCoco_1.2_yulin_genomic.gtf --sjdbOverhang 149

  2. map

    STAR --runThreadN 16 --genomeDir /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/STAR3/star_index/ --readFilesIn /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/Chan7-1_R1_left_paired_trimmed.fq /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/Chan7-1_R2_right_paired_trimmed.fq --outFileNamePrefix /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/STAR2/chan71_2_mapped --limitOutSJcollapsed 5000000 --limitIObufferSize 300000000 --outSAMtype BAM Unsorted --limitBAMsortRAM 87162435271 --sjdbOverhang 149

  3. samtools sort

    samtools sort -n -T /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/STAR2/tmp/ -o Catfish/NS1809045_resequencing/clean_data/STAR2/chan71_2_aftersort /home/xzm0017/Catfish/NS1809045_resequencing/clean_data/STAR2/chan71_2_mappedAligned.out.bam

This has been bothering me for several days. If any of you could suggest a fix, I would really appreciate it. Thanks in advance.

RNA-Seq • 2.9k views

If you want the file sorted, why not tell STAR to sort it?


Hi, thanks for your kind reply. Do you mean the "--outSAMtype BAM Unsorted" parameter in STAR? Yes, maybe I can change it (I will try later). But I just tried running without this parameter, so the output was a SAM file, and then used "samtools view" to convert the SAM file to BAM: same error. What I am afraid of is that something went wrong during mapping, so the problem may still be there in the next step if I don't figure out the cause.

5.3 years ago
Vitis ★ 2.5k

Can you identify the offending read(s) with sequence length and CIGAR length? Also, please see this:

CIGAR and query sequence are of different length when trying to convert from sam to bam?

You may use Picard's ValidateSamFile tool to find the problematic reads. I think they may come from a compatibility issue between your mapper and samtools.


Sorry, I am very new to this. What do you mean by offending read(s)? :(


The error message says that a read's sequence length is inconsistent with its CIGAR string. For example, if a read is 101 bp long and all of it has been aligned to the genome, it will have the CIGAR string 101M, which means 101 aligned bases (matches or mismatches). You can imagine that different reads have different CIGAR strings, reflecting how each one was mapped. The read length has to be consistent with the length implied by the CIGAR, and the error message tells you that at least one read violates this. You may need to identify those reads, see how the mapper (in your case STAR) aligned them, and work out why it generated such an inconsistency.
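
To make the consistency check above concrete, here is a minimal sketch that scans SAM text and prints every read whose CIGAR-implied length differs from its sequence length. It assumes the standard SAM column layout (CIGAR in column 6, sequence in column 10) and that only the M, I, S, = and X operations consume read bases; the function name check_cigar is made up for this example and is not part of samtools or STAR.

```shell
# check_cigar: hypothetical helper, reads SAM text from stdin or from files.
# Prints: read name, length implied by CIGAR, actual sequence length.
check_cigar() {
    awk -F'\t' -v OFS='\t' '
    # Sum the CIGAR operations that consume bases of the read: M, I, S, =, X.
    function query_len(cigar,    n, op, total) {
        total = 0
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            n  = substr(cigar, 1, RLENGTH - 1) + 0
            op = substr(cigar, RLENGTH, 1)
            if (op ~ /[MIS=X]/) total += n
            cigar = substr(cigar, RLENGTH + 1)
        }
        return total
    }
    /^@/ { next }                           # skip header lines
    NF < 11 { next }                        # skip malformed or partial lines
    $6 == "*" || $10 == "*" { next }        # no CIGAR or no stored sequence
    {
        q = query_len($6)
        if (q != length($10)) print $1, q, length($10)
    }' "$@"
}
```

If STAR was run with SAM output, something like `check_cigar chan71_2_mappedAligned.out.sam` (file name assumed from the prefix used above) should list the offending reads. For a BAM file, `samtools view -h file.bam | check_cigar` is the equivalent, although samtools may itself stop at the first bad record, so checking the SAM output directly is probably more informative.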
