Hi all,
This may be a bit niche, but I was hoping someone might have some guidance, since I can't find any resources (will delete if not appropriate).
I've been trying to do scATACseq analysis using files generated from the Biorad/ddseq standard pipeline. After processing the files per their standard workflow, I've only been successful in loading the output files for monocle3/cicero into a cds object with a count matrix, peak bed file, and barcodes.
I've been hoping to use ArchR, SnapATAC, Signac/Seurat etc., but these newer tools generally require the fragments.tsv.gz file output by 10x cellranger-atac. In particular, I'm unsure how I can accurately quantify the duplicateCount column (indicated as PCR duplicates) or whether there are reasonable workarounds.
I tried using sinto to generate this file with no luck. The bam file is formatted as follows (combination of 4 samples from 2 conditions):
NB551136:29:HHKJ2BGX9:2:12310:8194:16161 99 hg19_chr10 60196 0 42M = 60328 172 CAAGGGATTGTCTTGGATTTTTCTGTTTCTCCCTCAATATCC EEEEEEEEEAEEEEEAEE<EAEAEEEEEEEEAAEAEEEEEEE NM:i:0 MD:Z:42 MC:Z:40M AS:i:42 XS:i:42 XA:Z:hg19_chr18,+14610,42M,0;hg19_chr2,+132562285,42M,1;hg19_chr14,-19357520,42M,1;hg19_chr22,+16469821,42M,1;hg19_chr1,-227726332,34M8S,0; XB:Z:TTGTAAGCGACACTTCAAGAC
NB551136:29:HHKJ2BGX9:3:13408:13804:3527 163 hg19_chr10 60294 0 40M = 60445 194 GTGTACAAAAGCCCCAAAGCATAATTTGTGCAGTTGAGCG AAAAAEEEEEEEEEEEEAEEEEEEEEEEEE<EEEAEAEEE NM:i:0 MD:Z:40 MC:Z:43M AS:i:40 XS:i:40 XA:Z:hg19_chr18,+14708,40M,0; XB:Z:TAGTGTTGAATCAACTAGAGC
Barcodes are formatted as a key/value relationship:
TCGACAGGAATATGATAGGCA alignments.possorted.tagged_BC00001_N01
GTCCTTCCGAGTGGCCTCCTT alignments.possorted.tagged_BC00002_N02
CCAGTCATATGTGCCCTATGT alignments.possorted.tagged_BC00003_N02
Unfortunately, I'm not sure how to proceed. Barcodes are stored in the XB:Z: tag (specifying this tag for sinto has unfortunately not yielded results), and generating fragments fails with the various barcode arguments I've tried. I'm not sure whether it's viable to re-format the bam, or whether I should re-process from the fastq files (unfortunately cellranger-atac does not handle Biorad ddseq fastqs).
Fastq format is paired end, 4 lanes per sample, and 4 samples (2 conditions) that were merged into that bam file.
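To make the duplicateCount question concrete, here is a minimal sketch of what a fragments file encodes: one row per unique (chromosome, start, end, barcode), plus the number of read pairs that collapsed onto that row. It is an illustration only, not what sinto or cellranger-atac actually run. The function names `parse_fragment` and `count_fragments` are made up for this example, it reads plain tab-separated SAM text, it uses the XB value directly as the cell barcode (no translation through the key/value table), and it applies no Tn5 offset correction. Note that both records pasted above have MAPQ 0, so any tool that filters on mapping quality would drop them; the sketch defaults to no MAPQ filter for that reason.

```python
from collections import Counter


def parse_fragment(sam_line, barcode_tag="XB", min_mapq=0):
    """Return (chrom, start, end, barcode) for one SAM record, or None.

    Only the leftmost mate of a properly paired read (TLEN > 0) is used,
    so each read pair contributes exactly one fragment.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # not a complete alignment record
    flag = int(fields[1])
    chrom = fields[2]
    pos = int(fields[3])    # 1-based leftmost position
    mapq = int(fields[4])
    tlen = int(fields[8])   # signed template length
    # Require "properly paired" (0x2); reject unmapped (0x4),
    # secondary (0x100) and supplementary (0x800) records.
    if not flag & 0x2 or flag & 0x904:
        return None
    if tlen <= 0 or mapq < min_mapq:
        return None
    prefix = barcode_tag + ":Z:"
    barcode = next(
        (f[len(prefix):] for f in fields[11:] if f.startswith(prefix)), None
    )
    if barcode is None:
        return None  # no barcode tag on this record
    start = pos - 1  # convert to 0-based, BED-style start
    return (chrom, start, start + tlen, barcode)


def count_fragments(sam_lines, barcode_tag="XB", min_mapq=0):
    """Collapse identical fragments and report how often each was seen."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fragment = parse_fragment(line, barcode_tag, min_mapq)
        if fragment is not None:
            counts[fragment] += 1
    # Rows are (chrom, start, end, barcode, count), sorted by tuple order.
    return [frag + (n,) for frag, n in sorted(counts.items())]


if __name__ == "__main__":
    import sys

    # Example: samtools view -h alignments.bam | python this_script.py
    for row in count_fragments(sys.stdin):
        print("\t".join(str(x) for x in row))
```

On a real file the rows would still need a coordinate sort that matches the reference order, followed by bgzip compression and tabix indexing, before the downstream tools accept them.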
Thanks Jared. I had attempted to do this without success. Might be better to ask them if they have recommendations for this particular use case.
I see you did so here and got a useful response to get things working. Linking here so other users may find it as well.
Thanks Jared, was just about to do this so everyone could see.