Question: ERROR SAMTOOLS GENERATING TRUNCATED FILE: SEQ and QUAL are of different length
meilinfer wrote:

I would like to convert a bowtie2-generated .sam file to .bam with samtools (v1.9), but the conversion stops with a "truncated file" error because SEQ and QUAL are of different lengths. On the line named in the error, SEQ and QUAL are indeed of different lengths, but I don't know how to fix this. Why am I getting this error?

$ samtools view -b hiPSC1.sam > hiPSC1.bam

[E::sam_parse1] SEQ and QUAL are of different length

[W::sam_read1] Parse error at line 8272508

[main_samview] truncated file

Here is what my .sam file looks like:

@SQ SN:chrY LN:57227415

SRR2097503.1.3  83  chrM    6842    1   51M =   6745    -148    TATCCCCACCGGCGTCAAAGTATTTAGCTGACTCGCCACACTCCACGGAAG >>?8/:AA6A@?0DB9DB<?B@A@<DDC?)1EEC=CA?A+D?DDD@;D??? AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:51 YS:i:0  YT:Z:CP

And here is the line that triggers the error:

$ sed -n 8272508p hiPSC1.sam

SRR2097503.1.4850276    99  chrM    14650   34  51M =   14804   205 CAACAGAAACAAAGCATACATCATTATTCTCGCACGGACTACAACCACGAC CCCFFFFFHHHHHIJJJJJJJJJJJJJJJJJJJJJIGGDJIJJIGJHHHHHFFFFFCCC AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:51 YS:i:0  YT:Z:C
modified 8 weeks ago by Pierre Lindenbaum • written 8 weeks ago by meilinfer

Please go back and check your original data file to see if the problem exists there as well.

modified 8 weeks ago • written 8 weeks ago by genomax
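
One way to act on this suggestion is to scan the input FASTQ files for records whose sequence and quality lines differ in length. The following is only a minimal sketch: check_fastq_lengths is a made-up helper name, and it assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequences).

```shell
# Hypothetical helper: report FASTQ records whose sequence and quality
# strings differ in length. Assumes plain 4-line records (no line wrapping).
check_fastq_lengths() {
    awk '
        NR % 4 == 1 { name = $0 }              # header line (@read_id)
        NR % 4 == 2 { seqlen = length($0) }    # sequence line
        NR % 4 == 0 && length($0) != seqlen {  # quality line with a different length
            printf "record %d (%s): SEQ=%d QUAL=%d\n", NR / 4, name, seqlen, length($0)
            bad++
        }
        END { exit (bad > 0) }                 # exit status 1 if any record was reported
    ' "$1"
}
```

For the files in this thread that would be check_fastq_lengths hiPSC1_1P.fastq, and the same for hiPSC1_2P.fastq. No output and an exit status of 0 would suggest that the damage was introduced after the FASTQ stage.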

The SAM file is corrupted. This may have been caused by a problem during alignment, for example running out of memory at some point. I would realign everything from scratch. If only this one read is affected, you can also remove it and see whether that fixes the conversion. I would still realign:

bowtie2 (...option) | samtools view -b -o out.bam

This produces a BAM file directly, without an intermediate SAM file. The | between the two commands is called a Unix pipe.

written 8 weeks ago by ATpoint
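
To judge whether it really is only this one read, it can help to count the affected alignment lines before deciding. What follows is a rough sketch, assuming a tab-delimited SAM file with SEQ in column 10 and QUAL in column 11; the function name find_seq_qual_mismatches is invented for illustration. Lines where either field is * are skipped, because the SAM format uses * to mean "not stored".

```shell
# Hypothetical helper: list SAM lines whose SEQ (column 10) and QUAL
# (column 11) differ in length, then print a summary line.
find_seq_qual_mismatches() {
    awk -F '\t' '
        /^@/ { next }                          # skip header lines
        NF < 11 {                              # fewer than the 11 mandatory fields
            printf "line %d: only %d fields\n", NR, NF
            bad++
            next
        }
        $10 == "*" || $11 == "*" { next }      # "*" means not stored, nothing to compare
        length($10) != length($11) {
            printf "line %d: %s SEQ=%d QUAL=%d\n", NR, $1, length($10), length($11)
            bad++
        }
        END { printf "%d problem line(s) in %d lines\n", bad, NR }
    ' "$1"
}
```

Run as find_seq_qual_mismatches hiPSC1.sam. A single hit would fit the "only this one read" case; many hits scattered through the file would be a stronger argument for realigning.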

Thanks, ATpoint. I aligned with bowtie2 using a prebuilt index that I downloaded from the Bowtie web page. I'll try realigning and piping the output into samtools.

bowtie2 --very-sensitive --no-discordant --no-mixed -X 2000 -x grch38_1kgmaj -1 hiPSC1_1P.fastq -2 hiPSC1_2P.fastq -p 4 > hiPSC1.sam
modified 8 weeks ago by genomax • written 8 weeks ago by meilinfer

You could use a pipe as suggested by @ATpoint, or use the -S option to name the SAM output file instead of the shell redirect you used.

bowtie2 --very-sensitive --no-discordant --no-mixed -X 2000 -x grch38_1kgmaj -1 hiPSC1_1P.fastq -2 hiPSC1_2P.fastq -p 4 -S hiPSC1.sam
written 8 weeks ago by genomax

I did not use -S! Thanks, I'll give it a go and align again.

written 8 weeks ago by meilinfer
ATpoint (Germany) wrote:

Your command does not contain a pipe. That would be:

bowtie2 --very-sensitive --no-discordant --no-mixed -X 2000 -x grch38_1kgmaj -1 hiPSC1_1P.fastq -2 hiPSC1_2P.fastq -p 4 | samtools view -b -o hiPSC1.bam
written 8 weeks ago by ATpoint

It worked!! I checked the file with:

samtools quickcheck -v

and it was okay, and I was even able to sort the file. Thanks!

modified 8 weeks ago • written 8 weeks ago by meilinfer

Cool, glad to hear.

written 8 weeks ago by ATpoint
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

It's a problem with your SAM, not with samtools. It looks like a problem in your upstream process and you should not trust this data.

You can always filter out the bad lines with awk:

awk -F '\t' '$0 ~ /^@/ || length($10)==length($11)' input.sam
modified 8 weeks ago • written 8 weeks ago by Pierre Lindenbaum
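
A record with a real SEQ but QUAL given as * would also fail the length($10)==length($11) test in the awk one-liner, even though the SAM format allows it. A slightly longer variant can keep those records and set the rejected lines aside for inspection instead of discarding them silently. This is a sketch only: split_sam_by_length and its two-argument interface are hypothetical, and it assumes a tab-delimited SAM file.

```shell
# Hypothetical helper: write records that pass the length check to stdout
# and records that fail it to the file named by the second argument.
split_sam_by_length() {
    awk -F '\t' -v rejects="$2" '
        /^@/ { print; next }                   # keep header lines
        NF >= 11 && ($10 == "*" || $11 == "*" || length($10) == length($11)) {
            print                              # SEQ/QUAL consistent, or not stored
            next
        }
        { print > rejects }                    # everything else is set aside
    ' "$1"
}
```

A possible call is split_sam_by_length input.sam rejected.sam > filtered.sam. With paired-end data, removing one record can leave its mate without a partner, so the rejects file is worth a look before the filtered output is used downstream.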