Unable to get to BAM file from SAM using samtools
1
0
Entering edit mode
11 days ago

Greetings to all.

I have some samples from mosquito´s Aedes aegypit exome sequencing (Coding portion of DNA). I intend to look at SNPs in DNA's coding portion to associate it with a resistant phenotype. For this, I used gsnap to map due to being SNP tolerant.

My inquiry: I´m having trouble getting from SAM to BAM. And this is the error I got:

I used samtools with this code:

samtools view -b -o HLR1_subsample.bam HLR1_subsample.sam

I´ve repeated the mapping, I´ve redo the instalation of samtools, checked the version of samtools, checked the version of gsnap. My version of samtools is samtools 1.19.2, GSNAP version 2024-11-20 called with args: gsnap.avx512.

It keeps getting the error "positional data is too large for BAM format" as in the picture

Second time posting a question. Not quite sure what it is truly needed to understand my question.

Samtools BAM SAM • 765 views
ADD COMMENT
0
Entering edit mode

Can you share the output of samtools view -H for a file where the bam conversion worked and one where the command failed?

ADD REPLY
0
Entering edit mode

Sure thing:

Failed Failed

Worked Worked

Is this enough?

ADD REPLY
0
Entering edit mode

Please copy and paste text and then format (after selecting) as code using the 101010 button in edit window. Posting screenshots does not provide clear information about content.

Do you have large datasets as in number of reads. Searches seem to indicate that other issue may be too many reads aligning to specific positions in your BAM file.

ADD REPLY
0
Entering edit mode

Thank you for your suggestion. I paste the code as stated. I do not have a large data sets. Both of this data set have a size of 300000 reads.

#HPR1 -> worked

     (base) grupobcei@Artemisa:~/grupobcei/ewas/ewas_Acacias/03._ewas_mapped_subsample$ samtools view -H HPR1_subsample.sam

@HD     VN:1.0  SO:unsorted
@PG     ID:GSNAP        PN:gsnap        VN:2024-11-20   CL:gsnap.avx512 -d Aaegl5_index -D /mnt/disc2/grupobcei/ewas/index/Aaegl5_index/ --max-mismatches=0.10 -n 10 -Q --nofails -B 4 -t 30 --allow-pe-name-mismatch --format=sam HPR1_R1_subsample.fastq HPR1_R2_subsample.fastq
@PG     ID:samtools     PN:samtools     PP:GSNAP        VN:1.19.2       CL:samtools view -H HPR1_subsample.sam
@SQ     SN:AaegL5_1     LN:310827022
@SQ     SN:AaegL5_2     LN:474425716
@SQ     SN:AaegL5_3     LN:409777670
@SQ     SN:AaegL5_MT    LN:16790
@SQ     SN:NIGP01000004 LN:30735

#HLR1 -> Failed

     (base) grupobcei@Artemisa:~/grupobcei/ewas/ewas_Acacias/03._ewas_mapped_subsample$ samtools view -H HLR1_subsample.sam
@HD     VN:1.0  SO:unsorted
@PG     ID:GSNAP        PN:gsnap        VN:2024-11-20   CL:gsnap.avx512 -d Aaegl5_index -D /mnt/disc2/grupobcei/ewas/index/Aaegl5_index/ --max-mismatches=0.10 -n 10 -Q --nofails -B 4 -t 30 --allow-pe-name-mismatch --format=sam HLR1_R1_subsample.fastq HLR1_R2_subsample.fastq
@PG     ID:samtools     PN:samtools     PP:GSNAP        VN:1.19.2       CL:samtools view -H HLR1_subsample.sam
@SQ     SN:AaegL5_1     LN:310827022
@SQ     SN:AaegL5_2     LN:474425716
@SQ     SN:AaegL5_3     LN:409777670
@SQ     SN:AaegL5_MT    LN:16790
@SQ     SN:NIGP01000004 LN:30735
ADD REPLY
0
Entering edit mode

Do the files that fail always fail or work if resubmitted again?

Why are you using this option? If your paired-end reads are out of sync your alignments are going to be incorrect.

--allow-pe-name-mismatch

ADD REPLY
0
Entering edit mode

Yes. They fail or work when resubmitted consistently.

We allowed certain level of mismatch with this option. However, I´ve re run the alignments withouth that option and they kept failing to change from sam to bam.

    (base) grupobcei@Artemisa:~/grupobcei/ewas/ewas_Acacias/02.1_ewas_trimmed_subsample$ samtools view -H HLR1_wo_mismatch.sam                               samtools: /home/administrador/programas/miniconda3/bin/../lib/libtinfow.so.6: no version information available (required by samtools)
samtools: /home/administrador/programas/miniconda3/bin/../lib/libncursesw.so.6: no version information available (required by samtools)
samtools: /home/administrador/programas/miniconda3/bin/../lib/libncursesw.so.6: no version information available (required by samtools)
@HD     VN:1.0  SO:unsorted
@PG     ID:GSNAP        PN:gsnap        VN:2024-11-20   CL:gsnap.avx512 -d Aaegl5_index -D /mnt/disc2/grupobcei/ewas/index/Aaegl5_index/ --max-mismatches=0.10 -n 10 -Q --nofails -B 4 -t 30 --format=sam HLR1_R1_subsample.fastq HLR1_R2_subsample.fastq
@PG     ID:samtools     PN:samtools     PP:GSNAP        VN:1.19.2       CL:samtools view -H HLR1_wo_mismatch.sam
@SQ     SN:AaegL5_1     LN:310827022
@SQ     SN:AaegL5_2     LN:474425716
@SQ     SN:AaegL5_3     LN:409777670
@SQ     SN:AaegL5_MT    LN:16790
@SQ     SN:NIGP01000004 LN:30735
@SQ     SN:NIGP01000005 LN:39846
@SQ     SN:NIGP01000006 LN:33725


    (base) grupobcei@Artemisa:~/grupobcei/ewas/ewas_Acacias/02.1_ewas_trimmed_subsample$ samtools view -b -o HLR1_wo_mismatch.bam HLR1_wo_mismatch.sam
samtools: /home/administrador/programas/miniconda3/bin/../lib/libtinfow.so.6: no version information available (required by samtools)
samtools: /home/administrador/programas/miniconda3/bin/../lib/libncursesw.so.6: no version information available (required by samtools)
samtools: /home/administrador/programas/miniconda3/bin/../lib/libncursesw.so.6: no version information available (required by samtools)
[E::bam_write1] Positional data is too large for BAM format
samtools view: writing to "HLR1_wo_mismatch.bam" failed
ADD REPLY
0
Entering edit mode

However, I´ve re run the alignments withouth that option and they kept failing to change from sam to bam.

Did you scan/trim your paired end data files independently? If you did then the reads are likely out of sync. If that is the case you need to go back to the original files and scan/trim the paired end files together to make sure that does not happen.

Results you have from out of sync paired-end data are going to have discordant alignments for part of the data.

ADD REPLY
0
Entering edit mode

I´ve had run cutadapt

cutadapt -a AGATCGGAAGAGC -A AGATCGGAAGAGC -g GCTCTTCCGATCT -G GCTCTTCCGATCT -a AGATGTGTATAAGAGACAG -A AGATGTGTATAAGAGACAG -g CTGTCTCTTATACACATCT -G CTGTCTCTTATACACATCT -q 30,30 --minimum-length 80 -o HLR1_R1.fastq -p HLR1_R2.fastq HLR1_R1_001.fastq.gz HLR1_R2_001.fastq.gz

Using this code, so they shouldn´t be out of sync.

ADD REPLY
0
Entering edit mode

-o HLR1_R1.fastq -p HLR1_R2.fastq HLR1_R1_001.fastq.gz HLR1_R2_001.fastq.gz

This does not appear to be correct. Are you able to provide multiple input files at the same to to cutadapt (will have to look that up). In any case you have 3 files which is not balanced.

Edit: You can't directly provide multiple files of pairs as input to cutadapt. -o also specifies output. So It looks like if you ran the command line exactly as above then the results are not correct.</s>

ADD REPLY
0
Entering edit mode

I´ve actually done this with other samples and with some of them the actual SAM to BAM step can be done. Additionally, I looked for it in the manual, and in the option paired adapters cutadapt manual - pair adapters it is correct to use -o and -p to produce an output of paired reads.

So I´m still clueless about this problem.

ADD REPLY
0
Entering edit mode

That is a confusing way to provide input options (am not a cutadapt user) but you are right. I will scratch my comment above.

At this point nothing else that comes to mind then. If you are willing you could instead try fastp or bbduk.sh from BBMap suite as an alternate option on one of the sample file pair that is failing and see if it helps.

ADD REPLY
0
Entering edit mode

I´m asking for help to other professors at my University. Will reply if I find some way around it. Thank you for your help.

ADD REPLY
1
Entering edit mode
11 days ago
ATpoint 86k

Please get used to googling the main error message: https://github.com/samtools/samtools/issues/1669

ADD COMMENT
0
Entering edit mode

Ok. I understand the error. However, some of my samples did pass from SAM to BAM, how is this possible? I mean, all the chromosome have the same size in the reference genome, how can it be different when I sequenced them? I think I´m missing the point here.

Thanks

ADD REPLY
0
Entering edit mode

I mean, all the chromosome have the same size in the reference genome,

The chromosomes are not the same size (see https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002204515.2/ ) . If you have them as same size you did something incorrect. They are not over 2 Gbp each so not sure why you are getting that error.

aedes

ADD REPLY
0
Entering edit mode

I think OP means that they align different samples to always the same reference and some pass while others don't, hence the conflict is why sometimes the reference appears too large and sometimes it is not. I have no answer unfortuinately.

ADD REPLY
0
Entering edit mode

Exactly. Sorry if it is not clear enough.

ADD REPLY

Login before adding your answer.

Traffic: 2377 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6