10X Genomics, Single Cell Multiome ATAC-seq and RNA-seq data
3.2 years ago
yzhao140 • 0

Hi all! I am working on a project that uses this 10x Genomics dataset: https://support.10xgenomics.com/single-cell-multiome-atac-gex/datasets/1.0.0/human_brain_3k. The filtered feature-barcode matrix can be downloaded from https://cf.10xgenomics.com/samples/cell-arc/1.0.0/human_brain_3k/human_brain_3k_filtered_feature_bc_matrix.tar.gz. When I analyzed the single-cell ATAC-seq part of the data, I found that the maximum count can be as high as 400. Why is it so large? The code I used is listed below.

import codecs
import csv
import gzip
import os

import scipy.io

path = r"F:\zym\filtered_feature_bc_matrix"

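# mmread returns a sparse COO matrix (features x barcodes); convert it to CSR
# so that rows can be sliced without building a dense array in memory.
mat = scipy.io.mmread(os.path.join(path, "matrix.mtx.gz"))
mat = mat.tocsr()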

# Read features.tsv.gz once: column 0 = feature ID, column 1 = name, column 2 = feature type.
features_path = os.path.join(path, "features.tsv.gz")
with gzip.open(features_path) as f:
    rows = list(csv.reader(codecs.iterdecode(f, 'utf-8'), delimiter="\t"))
feature_ids = [row[0] for row in rows]
gene_names = [row[1] for row in rows]
feature_types = [row[2] for row in rows]

barcodes_path = os.path.join(path, "barcodes.tsv.gz")
barcodes = [row[0] for row in csv.reader(codecs.iterdecode(gzip.open(barcodes_path), 'utf-8'), delimiter="\t")]

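# Number of 'Gene Expression' rows (36601 for this dataset)
n_genes = feature_types.count('Gene Expression')

# The split assumes all 'Gene Expression' rows come before the remaining (ATAC) rows
X = mat[:n_genes]   # RNA-seq counts
Y = mat[n_genes:]   # ATAC-seq counts

# Maximum ATAC-seq count
Y.max()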
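
One way to look into a value like this is to check whether it is an isolated outlier or part of a broader pattern. The following is only a minimal sketch on a small made-up matrix, not the real dataset. The helper names `split_by_feature_type` and `summarize_counts` are hypothetical, and the sketch assumes the third column of features.tsv.gz labels the ATAC rows as "Peaks". It selects rows by label rather than by position, then reports the maximum, the fraction of nonzero entries above 2, and the rows with the largest values.

```python
import numpy as np
from scipy.sparse import csr_matrix


def split_by_feature_type(mat, feature_types):
    """Split a features-x-barcodes matrix into RNA and ATAC row blocks by label."""
    types = np.asarray(feature_types)
    rna = mat[np.flatnonzero(types == "Gene Expression")]
    atac = mat[np.flatnonzero(types == "Peaks")]  # assumed label for ATAC rows
    return rna, atac


def summarize_counts(block, top=3):
    """Summarize the nonzero entries of a sparse count block."""
    block = csr_matrix(block)
    data = block.data  # stored (nonzero) entries only
    # Per-row maximum; rows are indexed relative to the block, not the full matrix
    row_max = block.max(axis=1).toarray().ravel()
    top_rows = np.argsort(row_max)[::-1][:top]
    return {
        "max": int(data.max()) if data.size else 0,
        "frac_nonzero_gt2": float((data > 2).mean()) if data.size else 0.0,
        "top_rows": top_rows.tolist(),
    }


# Toy data (made up for illustration): 2 gene rows, 2 peak rows, 3 barcodes
toy = csr_matrix(np.array([
    [5, 0, 1],
    [0, 2, 0],
    [1, 0, 0],
    [0, 400, 2],
]))
toy_types = ["Gene Expression", "Gene Expression", "Peaks", "Peaks"]

rna, atac = split_by_feature_type(toy, toy_types)
summary = summarize_counts(atac)
print(summary)
```

On the real data, the same calls could be applied to `mat.tocsr()` and `feature_types` from the code above. The indices in `top_rows` are relative to the ATAC block, so they can be mapped back to peak IDs to see which peaks and barcodes carry the largest counts.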
ATAC-seq • 1.2k views