Question: samtools view error when converting SAM to BAM with sc-ATAC-seq data from the Cell Ranger pipeline
meerapprasad wrote (29 days ago):

Hi! I am working on a single-cell ATAC sequencing project and am having an issue using samtools to split a BAM file into wild-type and knockout reads. The data came from 10x Genomics sequencing and were analyzed with the Cell Ranger pipeline. One of the analyses Cell Ranger ran was t-SNE, which groups cells into clusters by similarity. Because different cell types were used in the ATAC pipeline, similar cell types grouped together regardless of whether they were wild-type or knockout, and a BAM file was produced for each cluster. I would like to split these BAM files to look at the variation within each cluster. I used samtools to convert each BAM file to SAM, then split the SAM file into knockout and wild-type files based on a tag in each line. Now, when I try to convert the split SAM files back to BAM, I keep getting this error:
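For the splitting step, one way to avoid losing the header is to keep every line that starts with `@` alongside the selected reads. A minimal sketch, assuming the knockout/wild-type distinction is carried in each read's `RG:Z:` tag and that the wanted read-group IDs are listed one per line in a plain-text file; the function name `split_by_rg` and the file names below are hypothetical:

```shell
# Hypothetical helper: copy every header line (@HD, @SQ, @RG, @PG, ...) and
# every read whose RG:Z: tag value is listed in ID_FILE (one ID per line).
# Reads SAM text on stdin and writes SAM text to stdout.
# Usage: split_by_rg ID_FILE < input.sam > output.sam
split_by_rg() {
    awk -v list="$1" '
        BEGIN {
            FS = "\t"
            # Load the wanted read-group IDs into a lookup table.
            while ((getline id < list) > 0) keep[id] = 1
            close(list)
        }
        /^@/ { print; next }    # header lines are always kept
        {
            # Optional TAG:TYPE:VALUE fields start at column 12.
            for (i = 12; i <= NF; i++) {
                if (substr($i, 1, 5) == "RG:Z:") {
                    rg = substr($i, 6)
                    if (rg in keep) print
                    break       # a read carries at most one RG tag
                }
            }
        }
    '
}
```

For example (file names hypothetical): `samtools view -h marrow_Cluster1.bam | split_by_rg ko_rg_ids.txt | samtools view -b -o marrow_Cluster1_KO.bam -`. samtools view also has a `-R FILE` option that keeps only alignments in the read groups listed in FILE, which may do the same job without awk.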

samtools view -bS marrow_Cluster1_KO.sam > marrow_Cluster1_KO.bam

[W::sam_read1] Parse error at line 1
[main_samview] truncated file.

When I look through the entire marrow_Cluster1_KO.sam file, it looks as it should. The head and tail of the file look like this:

head -10 marrow_Cluster1_KO.sam
1112:@RG    ID:A2.07,P2.24,A1.03,P1.03  SM:Barcode00086
1113:@RG    ID:A2.08,P2.24,A1.10,P1.14  SM:Barcode00152
1114:@RG    ID:A2.08,P2.16,A1.03,P1.08  SM:Barcode00191
1115:@RG    ID:A2.08,P2.15,A1.09,P1.06  SM:Barcode00199
1116:@RG    ID:A2.08,P2.09,A1.03,P1.24  SM:Barcode00248


tail -10 marrow_Cluster1_KO.sam
678439:NB551608:11:HVFMVBGX7:4:23502:4860:495   83  chr9    56881073    42  47M =   56881041    -79 AATCGCTTCCTTCGCGCTTCCGGGTTCCGCCTCGCTCAGAAACGGAC EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA MD:Z:47 XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:0  YS:i:0  YT:Z:CP RG:Z:A2.07,P2.15,A1.10,P1.06    PG:Z:MarkDuplicates-6D71E14F
678440:NB551608:11:HVFMVBGX7:1:13105:7658:1885  99  chr9    56881081    42  47M =   56881248    214 CCTTCGCGCTTCCGGGTTCCGCCTCGCTCAGAAACGGACCGACAGAT

What can I do to fix this error?

John Marshall (Glasgow, Scotland) wrote (29 days ago):

What are the 1112:, 1113:, …, 678440: at the start of the lines of the file you've shown us?

They appear to be some kind of line numbers, and if they're really in the file then of course samtools will throw parse errors.

If they're really in the file, you should figure out what in your pipeline has erroneously put them there. Or you can remove them with e.g. sed,

sed 's/^[0-9]*://' marrow_Cluster1_KO.sam > marrow_Cluster1_KO.fixed.sam

and see whether samtools likes the resulting file better.
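To check whether the prefixes are really in the file (before running sed) and that none remain afterwards, it is enough to count lines that start with digits followed by a colon. That is the shape of `grep -n` output, so a `grep -n` somewhere in the splitting step may be worth looking for, though that is only a guess. A minimal sketch (function name hypothetical); it assumes read names never start with digits and a colon, which holds for the NB551608:... names shown here:

```shell
# Hypothetical check: print how many lines start with a "NNN:" prefix
# (one or more digits, then a colon), the shape that `grep -n` adds.
# Valid SAM lines start with "@" (header) or a read name, so for the
# reads shown in this thread the count should be 0 once the file is clean.
count_numbered_lines() {
    # grep -c still prints 0 when nothing matches, but exits with status 1;
    # "|| true" keeps scripts running under `set -e`.
    grep -c '^[0-9][0-9]*:' "$1" || true
}
```

For example, `count_numbered_lines marrow_Cluster1_KO.sam` should report a nonzero count if the prefixes are really there, and `count_numbered_lines marrow_Cluster1_KO.fixed.sam` should report 0.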


Hi John,

Thank you so much for the reply! I am new to samtools, so I am not sure what the numbers before the @RG are, but I tried what you suggested and got this error:

[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 191
[main_samview] truncated file.

What do you recommend I do to proceed?

(Reply by meerapprasad, 29 days ago)

Well, what does line 191 look like?

(Reply by swbarnes2, 29 days ago)

1300:@RG ID:A2.08,P2.14,A1.04,P1.05 SM:Barcode05838

4099:@PG ID:bowtie2-E6859AC-EAA2107 PN:bowtie2 VN:2.2.5 CL:"/mnt/users/sai/miniconda2/bin/bowtie2-align-s --wrapper basic-0 -X2000 -p 18 --rg-id kidney_marrow_KO_gata2B -x /mnt/users/sai/Script/genomes/bowtie2/GRCz10/GRCz10 -1 /mnt/AlignedData/181118-jeff-zebrafish//fastqs/kidney_marrow_KO_gata2B.Sub.0009.All.1.R1.trim.fastq -2 /mnt/AlignedData/181118-jeff-zebrafish//fastqs/kidney_marrow_KO_gata2B.Sub.0009.All.1.R2.trim.fastq"

4401:NB551608:11:HVFMVBGX7:1:23302:16011:1070 99 chr1 6671 42 47M = 6693 69 CATCAGAGTTTAGCGTTTGCCACCGACGCGAGGAGCGCTGACCTTCA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE MD:Z:47 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP RG:Z:A2.07,P2.22,A1.03,P1.20 PG:Z:MarkDuplicates-62387EC5

4402:NB551608:11:HVFMVBGX7:1:23302:16011:1070 147 chr1 6693 42 47M = 6671 -69 CCGACGCGAGGAGCGCTGACCTTCATGGGCTTGGCAATCTTCTGTTT EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA MD:Z:47 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP RG:Z:A2.07,P2.22,A1.03,P1.20 PG:Z:MarkDuplicates-62387EC5

Line 191 is the one beginning with 4401. I think another issue is that I need a SAM @HD header, which I do not have, and I do not know how to generate one because the files are not being mapped to anything that would create a header. Do you know how to add a header to a BAM file?

Thanks, Meera

(Reply by meerapprasad, 29 days ago)
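On the header question: in SAM the @HD line is optional, but samtools needs @SQ lines declaring every reference name (chr1, chr9 and so on) that the reads use. The "missing SAM header" error at line 191, the first alignment line, is consistent with the split file containing @RG and @PG lines but no @SQ lines. Nothing has to be re-mapped to get them back: assuming the original cluster BAM still carries its header, `samtools view -H` prints that header, and it can be placed in front of the prefix-stripped split reads. A minimal sketch; the function name `add_header` is hypothetical:

```shell
# Hypothetical helper: write a complete SAM stream to stdout, taking the
# header lines from HEADER_FILE (e.g. saved with `samtools view -H`) and the
# alignment lines from BODY_FILE. Any partial header in BODY_FILE is dropped
# so that @RG/@PG lines are not duplicated.
# Usage: add_header HEADER_FILE BODY_FILE > complete.sam
add_header() {
    grep '^@' "$1"       # header lines, in their original order
    grep -v '^@' "$2"    # alignment lines only
    return 0             # grep exits 1 when it selects no lines
}
```

For example (marrow_Cluster1.bam stands in for the original cluster BAM; the name is hypothetical):

    samtools view -H marrow_Cluster1.bam > header.sam
    add_header header.sam marrow_Cluster1_KO.fixed.sam | samtools view -b -o marrow_Cluster1_KO.bam -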
swbarnes2 (United States) wrote (29 days ago):

Try

samtools view -bSh marrow_Cluster1_KO.sam > marrow_Cluster1_KO.bam

so that it understands the first lines are the header.


Hi! Thank you so much for the reply! I tried what you had suggested but got the same error:

[W::sam_read1] Parse error at line 1
[main_samview] truncated file.

Do you have any other recommendations?

(Reply by meerapprasad, 29 days ago)
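A note on `-h`: in `samtools view` it controls whether the header is included in the output, so it cannot supply header lines that are missing from the input; the line-1 parse error presumably still came from the numeric prefixes. Before converting, a text-level check can list any reference names that reads use but no @SQ line declares. A minimal awk sketch (function name hypothetical), assuming the header lines come before the alignments, as the SAM format requires:

```shell
# Hypothetical checker: print, once each, every reference name (column 3)
# used by an alignment line but not declared by an "@SQ ... SN:<name>" line.
# Empty output means every reference used is declared.
missing_sq() {
    awk '
        BEGIN { FS = "\t" }
        /^@SQ/ {
            # Record the SN: value of each @SQ line.
            for (i = 2; i <= NF; i++)
                if (substr($i, 1, 3) == "SN:") declared[substr($i, 4)] = 1
            next
        }
        /^@/ { next }    # other header lines
        {
            ref = $3
            if (ref != "*" && !(ref in declared) && !(ref in reported)) {
                reported[ref] = 1
                print ref
            }
        }
    ' "$1"
}
```

For example, if `missing_sq marrow_Cluster1_KO.fixed.sam` prints names such as chr1, those references need @SQ lines before samtools will accept the file.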