Trying to concat a list of 12 AnnData objects but am getting duplicates
17 months ago
Arta • 0

I'm prepping our experimental data for trajectory analysis by partially following this guide here, and I have a list of 12 AnnData objects that I read in from loom files. Six of them come from one sequencing run and the other six come from another. I followed the guide's recommendation to generate spliced/unspliced count matrices using velocyto, which is how I got the loom files.

Anyway, that's all background information. I'm trying to merge all of these AnnData objects into one.

>>> loom_data
[AnnData object with n_obs × n_vars = 5000 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 5773 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 6807 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 5613 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 6052 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 3500 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 10510 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 9356 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 3246 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 1132 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 13595 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 9541 × 36601
    obs: 'comp.ident'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced']

I'm having trouble understanding how to do this. All of the barcodes, which are loom_data[i].obs.index, are unique and contain a suffix for the sample they correspond to. Ultimately, I want to bring these layers into another AnnData object using scVelo.
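The claim that no barcode is shared between samples can be checked before merging. The following is a minimal sketch, not part of the original workflow: `shared_barcodes` is a hypothetical helper, and the barcode lists are made-up stand-ins for `list(loom_data[i].obs.index)`.

```python
from collections import Counter

def shared_barcodes(barcode_lists):
    """Return barcodes that occur in more than one sample (hypothetical helper)."""
    # set() ensures a barcode repeated within one sample is counted once for it
    counts = Counter(bc for barcodes in barcode_lists for bc in set(barcodes))
    return sorted(bc for bc, n in counts.items() if n > 1)

# Made-up barcodes with per-sample suffixes, as described above
with_suffix = [["AAAC_s1", "AAAG_s1"], ["AAAC_s2", "AAAG_s2"]]
without_suffix = [["AAAC", "AAAG"], ["AAAC", "TTTG"]]

print(shared_barcodes(with_suffix))
print(shared_barcodes(without_suffix))
```

An empty result for the real data would support the assumption that the observations can be stacked without collisions.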

The issue arises when I call sc.concat. It's the genes that overlap; none of the barcodes are supposed to match across the 12 list elements. So I want to select the vars axis, which I think is 1, and I want the union of all of the elements in the other axis:

test = sc.concat(loom_data, axis = 1, join = 'outer')

But when I call the line above, I get a concatenation of all of the genes with the names made unique, even though I want to consolidate their counts:

>>> test
AnnData object with n_obs × n_vars = 80125 × 439212
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
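The shape above is consistent with the var axes being stacked end to end rather than merged. A quick arithmetic check, using only the numbers from the loom_data printout (the variable names are illustrative):

```python
# Cell counts copied from the loom_data printout
n_obs_per_sample = [5000, 5773, 6807, 5613, 6052, 3500,
                    10510, 9356, 3246, 1132, 13595, 9541]
n_vars_per_sample = 36601  # every object reports the same number of genes

# axis=1 stacks the var axes side by side, so the gene counts add up
n_vars_axis1 = n_vars_per_sample * len(n_obs_per_sample)

# axis=0 stacks the obs axes instead, so the cell counts add up
n_obs_axis0 = sum(n_obs_per_sample)

print(n_vars_axis1)  # 439212
print(n_obs_axis0)   # 80125
```

With axis = 1 and join = 'outer', the 80125 observations are the union of the (all distinct) barcodes, while each sample contributes its own block of 36601 gene columns.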

I want the genes that aren't present in all samples to just have 0 counts. How would this be possible?

scanpy python anndata • 1.4k views
17 months ago
Arta • 0

I solved this by making the var names unique first and then concatenating along the obs axis (axis = 0) instead of the vars axis.

# for each element in loom_data, make var names unique
for ldata in loom_data:
    ldata.var_names_make_unique()

test = sc.concat(loom_data, axis = 0, join = 'outer')
>>> test
AnnData object with n_obs × n_vars = 80125 × 36601
    obs: 'comp.ident'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'

~36k genes is about what I expected.
