Question: scATAC-seq analysis, data preprocessing
9 days ago, chipolino (30) wrote:

Hi,

During scATAC-seq data preprocessing, does it make sense to filter the data matrix so that it contains only the most variable peaks (the same way we do for scRNA-seq) before any further dimensionality reduction or clustering analysis?

Thanks

modified 8 days ago by geek_y (9.6k) • written 9 days ago by chipolino (30)

To better define cell types, it makes sense.

written 9 days ago by geek_y (9.6k)

That depends on the type of analysis you're referring to. PCA, for example, will always focus on the most variable regions. I haven't looked at scATAC-seq data myself but given that it's basically binary, I'm not sure how well the typical variance measures even hold up.

written 8 days ago by Friederike (4.1k)
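
To illustrate the concern about variance on binary data: if each peak is coded as 0/1 per cell, its (population) variance is p(1 - p), where p is the fraction of cells in which the peak is open. Ranking peaks by variance is therefore the same as ranking them by how close p is to 0.5. The sketch below is only an illustration on a made-up toy matrix; the function names `peak_variances` and `top_variable_peaks` are hypothetical and do not come from any scATAC-seq package.

```python
def peak_variances(matrix):
    """Return (peak_index, open_fraction, variance) for each peak.

    `matrix` is a cells x peaks list of lists containing only 0 and 1.
    """
    n_cells = len(matrix)
    stats = []
    for j in range(len(matrix[0])):
        # Fraction of cells in which peak j is open.
        p = sum(row[j] for row in matrix) / n_cells
        # For 0/1 data the population variance depends on p alone.
        stats.append((j, p, p * (1 - p)))
    return stats


def top_variable_peaks(matrix, n_top):
    """Indices of the n_top peaks with the largest variance."""
    ranked = sorted(peak_variances(matrix), key=lambda s: s[2], reverse=True)
    return [j for j, _, _ in ranked[:n_top]]


# Toy data (assumed for illustration): 4 cells x 4 peaks.
toy_binary = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]

print(top_variable_peaks(toy_binary, 2))  # [2, 1]
```

Peak 0 is open in every cell and peak 3 in none, so both have zero variance; peak 2 (open in half of the cells) ranks first. A variance filter on binary data thus acts mostly as a filter on accessibility frequency.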

Can I run sparse PCA on the scATAC-seq matrix, see which peaks contribute to, say, the first component, and then choose those as the most informative (variable) ones?

written 8 days ago by chipolino (30)
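
A rough sketch of the idea in the question above, ranking peaks by their weight on the first component. Note that this uses plain PCA computed by power iteration, not sparse PCA; sparse PCA would add a sparsity penalty so that many loadings become exactly zero. The toy matrix and the function names `first_pc_loadings` and `peaks_ranked_by_loading` are assumptions made for illustration only.

```python
import math
import random


def first_pc_loadings(matrix, n_iter=200, seed=0):
    """Approximate the first principal axis of a cells x peaks matrix."""
    n_cells, n_peaks = len(matrix), len(matrix[0])
    # Center each peak so the component reflects variance, not mean openness.
    means = [sum(row[j] for row in matrix) / n_cells for j in range(n_peaks)]
    centered = [[row[j] - means[j] for j in range(n_peaks)] for row in matrix]

    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n_peaks)]
    for _ in range(n_iter):
        # Power iteration on X^T X without forming it: scores = X v, v = X^T scores.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(centered[i][j] * scores[i] for i in range(n_cells))
             for j in range(n_peaks)]
        norm = math.sqrt(sum(w * w for w in v))
        if norm == 0:
            return v  # no variance left to explain
        v = [w / norm for w in v]
    return v


def peaks_ranked_by_loading(loadings):
    """Peak indices sorted by absolute loading, largest first."""
    return sorted(range(len(loadings)), key=lambda j: abs(loadings[j]), reverse=True)


# Toy data (assumed): peaks 0 and 1 separate two groups of cells,
# peak 2 is open everywhere, peak 3 is open in a single cell.
toy_cells = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]

loadings = first_pc_loadings(toy_cells)
ranking = peaks_ranked_by_loading(loadings)
print(ranking)
```

On this toy matrix the two group-defining peaks get the largest absolute loadings and the constant peak gets a loading of zero. Whether the first component captures biology rather than, for example, per-cell read depth has to be checked on real data.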

Well, I'm not sure how "peaks" would be defined in scATAC-seq as there's a maximum of 2 reads per open region per cell. Maybe you want to collapse the information from multiple cells at the same region? What exactly is the question you're trying to address?

written 8 days ago by Friederike (4.1k)

Usually, dimensionality reduction is done on the top variable features (often the top 500). So you can take the top variable peaks, run PCA on them, and see how the t-SNE clusters look. If you want to overcome the sparsity of the data, you could use a KNN approach to merge data from the n most similar cells. Before doing that, I would check t-SNE on the top 500 variable peaks.

I did not know that the data is binary; this paper seems to have a nice method for processing such data.

modified 7 days ago • written 8 days ago by geek_y (9.6k)
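
The KNN merging suggested above could look roughly like the sketch below: each cell's binary profile is summed with the profiles of its k most similar cells, which turns sparse 0/1 values into small counts. Using Jaccard similarity to define "similar" is an assumption made here, not something stated in the thread, and the names `jaccard` and `knn_merge` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two binary accessibility profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def knn_merge(matrix, k):
    """Sum each cell's profile with those of its k most similar cells."""
    merged = []
    for i, cell in enumerate(matrix):
        # Similarity to every other cell; ties are broken by cell index.
        others = [(jaccard(cell, other), j)
                  for j, other in enumerate(matrix) if j != i]
        others.sort(key=lambda t: (-t[0], t[1]))
        pooled = list(cell)
        for _, j in others[:k]:
            pooled = [p + x for p, x in zip(pooled, matrix[j])]
        merged.append(pooled)
    return merged


# Toy data (assumed): two pairs of similar cells, 4 peaks.
toy_sparse = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

merged = knn_merge(toy_sparse, k=1)
print(merged)  # [[2, 1, 0, 0], [2, 1, 0, 0], [0, 0, 1, 2], [0, 0, 1, 2]]
```

The pairwise comparison is quadratic in the number of cells, so this only shows the idea; real data sets would need an approximate nearest-neighbour search, usually in a reduced-dimension space.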

Thanks! But how do you find the most variable peaks if the data is binary?

written 7 days ago by chipolino (30)

Sorry, I was not aware that it's binary. I updated my answer and moved it to a comment, as it no longer qualifies as an answer.

modified 7 days ago • written 7 days ago by geek_y (9.6k)

Asked on BioC in the first place and then cross-posted here as suggested there.

modified 7 days ago • written 7 days ago by ATpoint (16k)