Bio-Rad/ddSEQ scATAC-seq fragments.tsv.gz integration
3.5 years ago

Hi all,

This may be a bit niche, but I was hoping someone might have some guidance, since I can't find any resources (I will delete this if it's not appropriate).

I've been trying to run a scATAC-seq analysis on files generated by the standard Bio-Rad ddSEQ pipeline. After processing the data with their standard workflow, the only thing I've managed is loading the output files (count matrix, peak BED file and barcodes) into a monocle3/cicero cds object.

I've been hoping to use ArchR, SnapATAC, Signac/Seurat, etc., but these newer tools generally require the fragments.tsv.gz file produced by 10x Genomics cellranger-atac. In particular, I'm unsure how to accurately compute the duplicate-count column (the fifth column, which records how many read pairs, i.e. PCR duplicates, support each fragment), or whether there are reasonable workarounds.
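For reference, each line of a 10x-style fragments file holds chrom, start, end, barcode and a read-support count. Below is a minimal Python sketch of how that count can be derived, assuming one (chrom, start, end, barcode) tuple has already been extracted per read pair: identical tuples are treated as PCR duplicates and collapsed, and the number of collapsed pairs becomes the fifth column. The function name collapse_fragments and the simple lexicographic chromosome sort are my own illustrative choices, not part of any tool's API.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (chrom, start, end, barcode) for one read pair; 0-based, half-open coordinates
Fragment = Tuple[str, int, int, str]


def collapse_fragments(fragments: Iterable[Fragment]) -> List[str]:
    """Collapse identical fragments and report how many read pairs support each.

    Read pairs sharing chrom, start, end and barcode are treated as PCR
    duplicates; their count becomes the fifth column of fragments.tsv.
    """
    counts = Counter(fragments)
    lines = []
    # Sorted by chrom, then start, end, barcode. Lexicographic chrom order is a
    # simplification; tabix only needs each chrom grouped and starts ascending.
    for (chrom, start, end, barcode), n in sorted(counts.items()):
        lines.append(f"{chrom}\t{start}\t{end}\t{barcode}\t{n}")
    return lines
```

The resulting lines can be written to fragments.tsv and then compressed and indexed with bgzip -c fragments.tsv > fragments.tsv.gz followed by tabix -p bed fragments.tsv.gz, which is the form ArchR and Signac expect.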

I tried using sinto to generate this file, with no luck. The BAM file (a merge of 4 samples from 2 conditions) looks like this:

NB551136:29:HHKJ2BGX9:2:12310:8194:16161    99  hg19_chr10  60196   0   42M =   60328   172 CAAGGGATTGTCTTGGATTTTTCTGTTTCTCCCTCAATATCC  EEEEEEEEEAEEEEEAEE<EAEAEEEEEEEEAAEAEEEEEEE  NM:i:0  MD:Z:42 MC:Z:40M    AS:i:42 XS:i:42 XA:Z:hg19_chr18,+14610,42M,0;hg19_chr2,+132562285,42M,1;hg19_chr14,-19357520,42M,1;hg19_chr22,+16469821,42M,1;hg19_chr1,-227726332,34M8S,0; XB:Z:TTGTAAGCGACACTTCAAGAC
NB551136:29:HHKJ2BGX9:3:13408:13804:3527    163 hg19_chr10  60294   0   40M =   60445   194 GTGTACAAAAGCCCCAAAGCATAATTTGTGCAGTTGAGCG    AAAAAEEEEEEEEEEEEAEEEEEEEEEEEE<EEEAEAEEE    NM:i:0  MD:Z:40 MC:Z:43M    AS:i:40 XS:i:40 XA:Z:hg19_chr18,+14708,40M,0;   XB:Z:TAGTGTTGAATCAACTAGAGC
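One way around sinto would be to build fragments directly from SAM records like the ones above. Here is a sketch, assuming tab-separated SAM text (e.g. from samtools view) and that the XB tag holds the cell barcode. Only the leftmost mate of each proper pair (TLEN > 0) is used, so every pair yields exactly one fragment. The +4/-5 shift mirrors the usual Tn5 adjustment applied to 10x-style fragments, but treat the exact offsets and the MAPQ cutoff of 30 as assumptions. The function name sam_line_to_fragment is hypothetical. Note that both example records have MAPQ 0 (they carry XA alternative hits), so any MAPQ filter would discard them; that may be worth checking when a tool silently produces empty output.

```python
from typing import Optional, Tuple

# (chrom, start, end, barcode), 0-based half-open coordinates
Fragment = Tuple[str, int, int, str]

PROPER_PAIR = 0x2
# unmapped, secondary, QC-fail, supplementary
SKIP_FLAGS = 0x4 | 0x100 | 0x200 | 0x800


def sam_line_to_fragment(line: str, barcode_tag: str = "XB", min_mapq: int = 30,
                         plus_shift: int = 4, minus_shift: int = -5) -> Optional[Fragment]:
    """Return one fragment per proper pair, or None if the record is skipped."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    chrom = fields[2]
    pos = int(fields[3])  # SAM POS is 1-based
    mapq = int(fields[4])
    tlen = int(fields[8])
    # Keep only the leftmost mate (TLEN > 0) so each pair is counted once.
    if not flag & PROPER_PAIR or flag & SKIP_FLAGS or tlen <= 0 or mapq < min_mapq:
        return None
    barcode = None
    for tag in fields[11:]:
        name, _type, value = tag.split(":", 2)
        if name == barcode_tag:
            barcode = value
            break
    if barcode is None:
        return None
    start = pos - 1     # convert to 0-based
    end = start + tlen  # half-open end of the fragment
    return chrom, start + plus_shift, end + minus_shift, barcode
```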

The barcodes are stored as key/value pairs (sequence, then name):

TCGACAGGAATATGATAGGCA   alignments.possorted.tagged_BC00001_N01
GTCCTTCCGAGTGGCCTCCTT   alignments.possorted.tagged_BC00002_N02
CCAGTCATATGTGCCCTATGT   alignments.possorted.tagged_BC00003_N02
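If the sequences in this list are the same strings stored in the XB tag (an assumption), the list can double as a whitelist and as a lookup table for readable cell names. Here is a small sketch; the helper names are hypothetical, and stripping the alignments.possorted.tagged_ prefix is purely cosmetic.

```python
from typing import Dict, Iterable, Iterator, Tuple

Fragment = Tuple[str, int, int, str]


def load_barcode_map(lines: Iterable[str],
                     prefix: str = "alignments.possorted.tagged_") -> Dict[str, str]:
    """Parse 'SEQUENCE<whitespace>NAME' lines into {sequence: short name}."""
    mapping: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        seq, name = parts
        # Dropping the file-name prefix is cosmetic only.
        if name.startswith(prefix):
            name = name[len(prefix):]
        mapping[seq] = name
    return mapping


def rename_barcodes(fragments: Iterable[Fragment],
                    mapping: Dict[str, str]) -> Iterator[Fragment]:
    """Keep only fragments whose barcode is in the map, replacing it with the name."""
    for chrom, start, end, seq in fragments:
        name = mapping.get(seq)
        if name is not None:
            yield chrom, start, end, name
```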

Unfortunately, I'm not sure how to proceed. The barcodes are stored in the XB:Z: tag, but specifying that tag for sinto has not worked, and generating fragments fails with every barcode argument I've tried. I'm not sure whether it's viable to reformat the BAM, or whether I should reprocess from the FASTQ files (unfortunately, cellranger-atac does not handle Bio-Rad ddSEQ FASTQs).
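On reformatting the BAM: one option is to copy the XB value into a CB tag, since CB is the conventional cell-barcode tag and, as far as I know, sinto's default. Here is a sketch that rewrites SAM text, assuming the barcode is always a Z-type XB tag. The name copy_tag is hypothetical, and records that already carry a CB tag are left alone.

```python
from typing import Iterable, Iterator


def copy_tag(lines: Iterable[str], src: str = "XB", dst: str = "CB") -> Iterator[str]:
    """Yield SAM lines with a dst:Z tag copied from the src tag when present."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        tags = fields[11:]
        # Value of the first src tag, e.g. "XB:Z:ACGT" -> "ACGT"
        value = next((t.split(":", 2)[2] for t in tags if t.startswith(src + ":")), None)
        already_tagged = any(t.startswith(dst + ":") for t in tags)
        if value is not None and not already_tagged:
            fields.append(f"{dst}:Z:{value}")
        yield "\t".join(fields) + "\n"
```

For example, with the function saved in a file copy_tag.py (name assumed): samtools view -h merged.bam | python -c "import sys, copy_tag; sys.stdout.writelines(copy_tag.copy_tag(sys.stdin))" | samtools view -b -o merged.cb.bam - and then samtools index merged.cb.bam before pointing sinto at it. Pure Python will be slow on a large BAM, but the original file stays untouched.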

The FASTQs are paired-end, with 4 lanes per sample; the 4 samples (2 conditions) were merged into that BAM file.

Tags: scATACseq, ddseq
3.5 years ago

ArchR can use BAM inputs; see the inputFiles argument of the createArrowFiles() function. If you still run into issues, the devs may be able to provide some guidance if you open an issue on their GitHub.

I also highly recommend it for your analysis; it is very well made, though the devs are still working on documenting every feature.


Thanks Jared. I had tried this without success. It might be better to ask them whether they have recommendations for this particular use case.


I see you did so here and got a useful response that got things working. Linking it here so other users can find it as well.


Thanks Jared, was just about to do this so everyone could see.
