Samtools view error when converting sam to bam file using sc-ATAC-seq data from Cell Ranger pipeline
22 months ago

Hi! I am working on a single-cell ATAC sequencing project and am having trouble using samtools to split a BAM file into wild-type and knockout reads. The data came from 10X sequencing and were analyzed with the Cell Ranger pipeline. One of the Cell Ranger analyses was t-SNE, which groups cells into clusters by similarity. Because several cell types went into the ATAC pipeline, similar cell types clustered together regardless of whether they were wild-type or knockout, and one BAM file was produced per cluster. I would like to split each of these BAM files so I can look at the variation within a cluster. I used samtools to convert the BAM file to SAM and then split it into knockout and wild-type files based on a tag in the file. Now, when I try to convert the split SAM files back to BAM, I keep getting this error:

samtools view -bS marrow_Cluster1_KO.sam > marrow_Cluster1_KO.bam


[W::sam_read1] Parse error at line 1
[main_samview] truncated file.

When I look through the entire marrow_Cluster1_KO.sam file, it looks the way it should. The head and tail of the file look like this:

head -10 marrow_Cluster1_KO.sam
1112:@RG    ID:A2.07,P2.24,A1.03,P1.03  SM:Barcode00086
1113:@RG    ID:A2.08,P2.24,A1.10,P1.14  SM:Barcode00152
1114:@RG    ID:A2.08,P2.16,A1.03,P1.08  SM:Barcode00191
1115:@RG    ID:A2.08,P2.15,A1.09,P1.06  SM:Barcode00199
1116:@RG    ID:A2.08,P2.09,A1.03,P1.24  SM:Barcode00248

tail -10 marrow_Cluster1_KO.sam
678439:NB551608:11:HVFMVBGX7:4:23502:4860:495   83  chr9    56881073    42  47M =   56881041    -79 AATCGCTTCCTTCGCGCTTCCGGGTTCCGCCTCGCTCAGAAACGGAC EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA MD:Z:47 XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:0  YS:i:0  YT:Z:CP RG:Z:A2.07,P2.15,A1.10,P1.06    PG:Z:MarkDuplicates-6D71E14F
678440:NB551608:11:HVFMVBGX7:1:13105:7658:1885  99  chr9    56881081    42  47M =   56881248    214 CCTTCGCGCTTCCGGGTTCCGCCTCGCTCAGAAACGGACCGACAGAT


What can I do to fix this error?

22 months ago

What are the 1112:, 1113:, …, 678440: at the start of the lines of the file you've shown us?

They appear to be some kind of line numbers, and if they're really in the file then of course samtools will throw parse errors.

If they're really in the file, you should figure out what in your pipeline has erroneously put them there. Alternatively, you can remove them with, e.g., sed:

sed 's/^[0-9]*://' marrow_Cluster1_KO.sam > marrow_Cluster1_KO.fixed.sam


and see whether samtools likes the resulting file better.
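To check what that substitution does before running it on the real file, it can be tried on a tiny stand-in. The sketch below is only illustrative: the two lines are shortened, made-up records modelled on the output in the question, and the file names demo_prefixed.sam and demo_fixed.sam are hypothetical.

```shell
#!/bin/sh
# Sketch: strip leading "NNNN:" prefixes from a small stand-in SAM file.
# The file names and contents here are made up for illustration.

# Two prefixed lines: one header line and one shortened alignment record.
printf '1112:@RG\tID:A2.07,P2.24,A1.03,P1.03\tSM:Barcode00086\n' >  demo_prefixed.sam
printf '4401:NB551608:11:HVFMVBGX7:1:23302:16011:1070\t99\tchr1\t6671\n' >> demo_prefixed.sam

# Remove only a run of digits followed by a colon at the very start of each line.
sed 's/^[0-9]*://' demo_prefixed.sam > demo_fixed.sam

# Count lines that still start with digits and a colon.
# grep -c exits non-zero when the count is 0, hence the "|| true".
remaining=$(grep -c '^[0-9][0-9]*:' demo_fixed.sam || true)
echo "lines still prefixed: $remaining"

# Show the first line of the cleaned file.
head -n 1 demo_fixed.sam
```

Because the pattern is anchored with ^, the colons inside the read name (NB551608:11:...) are left alone; only the numeric prefix at the start of the line is removed.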


Hi John,

Thank you so much for the reply! I am new to samtools, so I am not sure what the numbers before the @RG are, but I tried what you suggested and got this error:

[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 191
[main_samview] truncated file.

What do you recommend I do to proceed?


Well, what does line 191 look like?


1300:@RG ID:A2.08,P2.14,A1.04,P1.05 SM:Barcode05838

4099:@PG ID:bowtie2-E6859AC-EAA2107 PN:bowtie2 VN:2.2.5 CL:"/mnt/users/sai/miniconda2/bin/bowtie2-align-s --wrapper basic-0 -X2000 -p 18 --rg-id kidney_marrow_KO_gata2B -x /mnt/users/sai/Script/genomes/bowtie2/GRCz10/GRCz10 -1 /mnt/AlignedData/181118-jeff-zebrafish//fastqs/kidney_marrow_KO_gata2B.Sub.0009.All.1.R1.trim.fastq -2 /mnt/AlignedData/181118-jeff-zebrafish//fastqs/kidney_marrow_KO_gata2B.Sub.0009.All.1.R2.trim.fastq"

4401:NB551608:11:HVFMVBGX7:1:23302:16011:1070 99 chr1 6671 42 47M = 6693 69 CATCAGAGTTTAGCGTTTGCCACCGACGCGAGGAGCGCTGACCTTCA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE MD:Z:47 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP RG:Z:A2.07,P2.22,A1.03,P1.20 PG:Z:MarkDuplicates-62387EC5

4402:NB551608:11:HVFMVBGX7:1:23302:16011:1070 147 chr1 6693 42 47M = 6671 -69 CCGACGCGAGGAGCGCTGACCTTCATGGGCTTGGCAATCTTCTGTTT EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA MD:Z:47 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP RG:Z:A2.07,P2.22,A1.03,P1.20 PG:Z:MarkDuplicates-62387EC5

Line 191 begins with 4401. I think part of my problem is also that the file needs a SAM @HD header, which it does not have. I do not know how to generate one, because these files are not being mapped to anything that would create a header. Do you know how to add a header to a BAM file?

Thanks, Meera
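
One plausible reading of the "missing SAM header" message is that the split file no longer contains the @SQ reference-sequence lines, so samtools cannot resolve a name like chr1 when it reaches the first alignment record. That could happen if the split kept only the lines that matched the wild-type or knockout read groups. The sketch below uses awk on a tiny made-up SAM file to keep every header line and only the records whose RG:Z: tag is in a list of wanted read-group IDs. The file names, the chr1 length, the read names, and the read-group IDs are all assumptions for illustration, not values from the real data.

```shell
#!/bin/sh
# Sketch: split a SAM file by read group while keeping the full header.
# All file names and contents below are hypothetical stand-ins.

# Tiny stand-in SAM: @HD and @SQ lines, two @RG lines, and two records.
{
  printf '@HD\tVN:1.0\tSO:coordinate\n'
  printf '@SQ\tSN:chr1\tLN:1000000\n'
  printf '@RG\tID:KO.1\tSM:Barcode00001\n'
  printf '@RG\tID:WT.1\tSM:Barcode00002\n'
  printf 'read1\t0\tchr1\t6671\t42\t4M\t*\t0\t0\tACGT\tEEEE\tRG:Z:KO.1\n'
  printf 'read2\t0\tchr1\t7000\t42\t4M\t*\t0\t0\tACGT\tEEEE\tRG:Z:WT.1\n'
} > demo_cluster.sam

# One wanted read-group ID per line (here, the made-up knockout group).
printf 'KO.1\n' > demo_ko_rg.txt

# First file: load the wanted IDs. Second file: print every header line,
# plus any record with an RG:Z: tag whose value is in the wanted set.
awk -F '\t' '
  NR == FNR { want[$1] = 1; next }
  /^@/      { print; next }
  {
    for (i = 12; i <= NF; i++)
      if (substr($i, 1, 5) == "RG:Z:" && (substr($i, 6) in want)) { print; next }
  }
' demo_ko_rg.txt demo_cluster.sam > demo_cluster_KO.sam

cat demo_cluster_KO.sam
```

The output keeps all four header lines, including the @RG line for the other group, and only the read1 record. For the real data, the header could be taken from the original cluster BAM with samtools view -H rather than written by hand. samtools view also has an -R option that takes a file of read-group IDs and may avoid the round trip through SAM, though that has not been tried on this data.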

22 months ago

Try

samtools view -bSh marrow_Cluster1_KO.sam > marrow_Cluster1_KO.bam


so it understands the first lines are header.


Hi! Thank you so much for the reply! I tried what you suggested but got the same error:

[W::sam_read1] Parse error at line 1
[main_samview] truncated file.

Do you have any other recommendations?