Question: Bio-Rad ddSEQ scATAC-seq fragments.tsv.gz integration
Ready2Rapture wrote (4 weeks ago):

Hi all,

This may be a bit niche, but I was hoping someone might have some guidance, since I can't find any resources on it (I will delete this if it's not appropriate).

I've been trying to run a scATAC-seq analysis on files generated by the standard Bio-Rad ddSEQ pipeline. After processing the files with their standard workflow, the only thing I've managed is loading the output files into Monocle 3/Cicero as a CDS object built from the count matrix, the peak BED file, and the barcodes.

I'd like to use ArchR, SnapATAC, Signac/Seurat, etc., but these newer tools generally expect the fragments.tsv.gz file produced by 10x cellranger-atac. In particular, I'm unsure how to accurately quantify the duplicateCount column (described as the number of PCR duplicates), or whether there are reasonable workarounds.
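As far as I understand the 10x format, each line of fragments.tsv.gz is one unique fragment (chrom, start, end, barcode), and the fifth column counts how many read pairs map to exactly that fragment with that barcode. If so, the duplicate count falls out of collapsing identical fragments. A minimal sketch, assuming you already have one (chrom, start, end, barcode) tuple per read pair with 0-based coordinates and the Tn5 shift applied; `collapse_fragments` is a hypothetical helper, not part of any tool:

```python
from collections import Counter


def collapse_fragments(read_pairs):
    """Collapse per-read-pair fragments into fragments.tsv-style rows.

    read_pairs: iterable of (chrom, start, end, barcode) tuples, one per
    read pair, with 0-based half-open, Tn5-adjusted coordinates.
    Returns (chrom, start, end, barcode, count) rows sorted by chrom, then
    start (ties broken by end and barcode), where count is how many read
    pairs share that exact fragment, i.e. the PCR-duplicate count.
    """
    counts = Counter(read_pairs)
    return sorted(
        (chrom, start, end, barcode, n)
        for (chrom, start, end, barcode), n in counts.items()
    )


if __name__ == "__main__":
    # Toy input: the second tuple is a made-up duplicate of the first.
    pairs = [
        ("chr10", 60199, 60362, "TTGTAAGCGACACTTCAAGAC"),
        ("chr10", 60199, 60362, "TTGTAAGCGACACTTCAAGAC"),
        ("chr10", 60297, 60482, "TAGTGTTGAATCAACTAGAGC"),
    ]
    for row in collapse_fragments(pairs):
        print("\t".join(map(str, row)))
```

Chromosomes sort lexicographically here (chr10 before chr2), which should still be acceptable for tabix as long as each chromosome's rows are contiguous and sorted by start. The written file would then be compressed and indexed with `bgzip fragments.tsv` and `tabix -p bed fragments.tsv.gz`.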

I tried using sinto to generate this file, with no luck. The BAM file looks like this (it combines 4 samples from 2 conditions):

NB551136:29:HHKJ2BGX9:2:12310:8194:16161    99  hg19_chr10  60196   0   42M =   60328   172 CAAGGGATTGTCTTGGATTTTTCTGTTTCTCCCTCAATATCC  EEEEEEEEEAEEEEEAEE<EAEAEEEEEEEEAAEAEEEEEEE  NM:i:0  MD:Z:42 MC:Z:40M    AS:i:42 XS:i:42 XA:Z:hg19_chr18,+14610,42M,0;hg19_chr2,+132562285,42M,1;hg19_chr14,-19357520,42M,1;hg19_chr22,+16469821,42M,1;hg19_chr1,-227726332,34M8S,0; XB:Z:TTGTAAGCGACACTTCAAGAC
NB551136:29:HHKJ2BGX9:3:13408:13804:3527    163 hg19_chr10  60294   0   40M =   60445   194 GTGTACAAAAGCCCCAAAGCATAATTTGTGCAGTTGAGCG    AAAAAEEEEEEEEEEEEAEEEEEEEEEEEE<EEEAEAEEE    NM:i:0  MD:Z:40 MC:Z:43M    AS:i:40 XS:i:40 XA:Z:hg19_chr18,+14708,40M,0;   XB:Z:TAGTGTTGAATCAACTAGAGC

The barcodes are formatted as key/value pairs:

TCGACAGGAATATGATAGGCA   alignments.possorted.tagged_BC00001_N01
GTCCTTCCGAGTGGCCTCCTT   alignments.possorted.tagged_BC00002_N02
CCAGTCATATGTGCCCTATGT   alignments.possorted.tagged_BC00003_N02
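The barcode file looks like a two-column, whitespace-separated table from barcode sequence to cell name, so a dictionary is a natural way to hold it, for example to keep only fragments from called cells (the role a cell whitelist plays). A small sketch, assuming whitespace delimiters; `load_barcode_map` is a hypothetical name, and I haven't confirmed what the BC/N suffixes in the cell names encode:

```python
def load_barcode_map(lines):
    """Parse 'SEQUENCE<whitespace>NAME' lines into a {sequence: name} dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        seq, name = line.split(None, 1)  # split on the first run of whitespace
        mapping[seq] = name
    return mapping


if __name__ == "__main__":
    example = [
        "TCGACAGGAATATGATAGGCA   alignments.possorted.tagged_BC00001_N01",
        "GTCCTTCCGAGTGGCCTCCTT   alignments.possorted.tagged_BC00002_N02",
        "CCAGTCATATGTGCCCTATGT   alignments.possorted.tagged_BC00003_N02",
    ]
    barcodes = load_barcode_map(example)
    print(barcodes["GTCCTTCCGAGTGGCCTCCTT"])
    # Membership test used to drop fragments from non-cell barcodes;
    # False for this three-line excerpt.
    print("TTGTAAGCGACACTTCAAGAC" in barcodes)
```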

I'm not sure how to proceed. The barcodes are stored in the XB:Z: tag, but specifying that tag for sinto has not yielded results, and fragment generation fails with the various barcode arguments I've tried. I'm not sure whether it's viable to reformat the BAM or whether I should reprocess from the FASTQ files (unfortunately, cellranger-atac does not handle Bio-Rad ddSEQ FASTQs).
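If reformatting turns out to be the way forward, one option is to derive fragments directly from `samtools view` text output, reading the barcode from the XB tag. The sketch below uses only the leftmost mate of each proper pair (positive TLEN), converts SAM's 1-based POS to a 0-based interval, and applies the +4/-5 Tn5 shift that I believe 10x uses. Note that the contigs are named `hg19_chr10` rather than `chr10`; tools that keep only contigs starting with `chr` might silently discard these reads, so the sketch strips the prefix (an assumption, not confirmed in this thread). Both records shown have MAPQ 0, so `min_mapq` defaults to 0 here; you may want a stricter threshold. The function and parameter names are hypothetical:

```python
def sam_line_to_fragment(line, barcode_tag="XB", strip_prefix="hg19_", min_mapq=0):
    """Convert one SAM text record into (chrom, start, end, barcode), or None.

    Only the leftmost mate of a properly paired, primary, mapped pair
    (positive TLEN) is used, so each pair yields one fragment. POS is
    1-based, so the 0-based half-open fragment is [POS-1, POS-1+TLEN),
    then shifted +4/-5 for the Tn5 insertion offset.
    """
    fields = line.rstrip("\n").split("\t")
    flag, chrom = int(fields[1]), fields[2]
    pos, mapq, tlen = int(fields[3]), int(fields[4]), int(fields[8])
    if not flag & 0x2 or flag & (0x4 | 0x100 | 0x800):
        return None  # not a proper pair, unmapped, secondary or supplementary
    if tlen <= 0 or mapq < min_mapq:
        return None  # rightmost mate (non-positive TLEN) or low MAPQ
    # Optional fields look like TAG:TYPE:VALUE, e.g. XB:Z:TTGTAAGCGACACTTCAAGAC
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    barcode = tags.get(barcode_tag)
    if barcode is None:
        return None
    if strip_prefix and chrom.startswith(strip_prefix):
        chrom = chrom[len(strip_prefix):]
    start = pos - 1 + 4
    end = pos - 1 + tlen - 5
    return chrom, start, end, barcode


if __name__ == "__main__":
    # First record from the post, tab-separated (some optional tags omitted).
    record = "\t".join([
        "NB551136:29:HHKJ2BGX9:2:12310:8194:16161", "99", "hg19_chr10", "60196",
        "0", "42M", "=", "60328", "172",
        "CAAGGGATTGTCTTGGATTTTTCTGTTTCTCCCTCAATATCC",
        "EEEEEEEEEAEEEEEAEE<EAEAEEEEEEEEAAEAEEEEEEE",
        "NM:i:0", "MD:Z:42", "XB:Z:TTGTAAGCGACACTTCAAGAC",
    ])
    print(sam_line_to_fragment(record))
    # ('chr10', 60199, 60362, 'TTGTAAGCGACACTTCAAGAC')
```

Counting identical output tuples across the whole BAM would then give the duplicate count for each fragment.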

The FASTQs are paired-end, with 4 lanes per sample, and the 4 samples (2 conditions) were merged into that BAM file.

Tags: ddseq, scatacseq. Written 4 weeks ago by Ready2Rapture (modified 4 weeks ago by jared.andrews07).
jared.andrews07 (Memphis, TN) wrote (4 weeks ago):

ArchR can take BAM files as input; see the inputFiles argument of the createArrowFiles() function. If you still run into issues, the developers may be able to offer guidance if you open an issue on their GitHub.

I also highly recommend it for your analysis. It is very well made, though the developers are still working on documenting every feature.

Written 4 weeks ago by jared.andrews07 (modified 4 weeks ago).

Thanks, Jared. I had already tried this without success. It might be better to ask them whether they have recommendations for this particular use case.

Reply written 4 weeks ago by Ready2Rapture.

I see you did so here and got a useful response that got things working. Linking it here so other users can find it as well.

Reply written 29 days ago by jared.andrews07.

Thanks, Jared. I was just about to do this so everyone could see.

Reply written 22 days ago by Ready2Rapture.