Question: How to remove X & Y chromosome genes from RNAseq data
22 days ago, archit22684 wrote:

Hello everyone, I am new to this field. I used STAR and featureCounts followed by DESeq2 for my RNA-seq analysis. I want to remove X and Y chromosome genes from my bulk RNA-seq data, but I don't know how to proceed. Should I remove the genes from the counts after featureCounts, or within DESeq2? I have made a list of X and Y chromosome gene IDs (Gene_id) using BioMart. Can I remove those gene IDs from my count data? Would this be the correct way of doing it? Any help is appreciated. Thanks

modified 13 days ago • written 22 days ago by archit22684
22 days ago, Barry Digby (National University of Ireland, Galway) wrote:

Yep you're pretty much there.

Assuming counts is your counts matrix containing all samples, something to the effect of:

subset <- counts[which(rownames(counts) != biomart_list$Gene),]

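edit: != compares the two vectors element by element, so this only works when they have the same length and order (e.g. a biomaRt data frame derived from the count rownames). Otherwise use %in%, as suggested in the comments.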

modified 18 days ago • written 22 days ago by Barry Digby

Hi Barry, I am getting a warning "Warning message: In rownames(countdata) != X_Y$Geneid : longer object length is not a multiple of shorter object length".

written 19 days ago by archit22684

Try !(rownames(counts) %in% X_Y$Geneid). I don't see how != would work comparing 2 vectors of incompatible sizes.
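
To illustrate the difference between the two operators, here is a minimal sketch on toy data. The gene names are made up for the example and are not taken from the thread. It shows that != pairs the vectors up position by position (recycling the shorter one, which is what triggers the warning), whereas %in% tests each count ID for membership in the whole list.

```r
# Toy data: hypothetical gene IDs, not the real ones from this thread.
count_ids <- c("geneA", "geneB", "geneC", "geneD", "geneE")
xy_ids    <- c("geneD", "geneB")

# != compares position by position and recycles the shorter vector:
# geneA vs geneD, geneB vs geneB, geneC vs geneD, geneD vs geneB, geneE vs geneD.
# With lengths 5 and 2, R also warns that the longer length is not a
# multiple of the shorter one (suppressed here to keep the output clean).
elementwise <- suppressWarnings(count_ids != xy_ids)
print(elementwise)

# %in% asks, for every count ID, whether it occurs anywhere in xy_ids.
membership <- !(count_ids %in% xy_ids)
print(membership)
```

In this toy case the element-wise comparison keeps geneD even though it is on the removal list, while the %in% version drops both geneB and geneD.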

written 19 days ago by _r_am

My bad, I'm used to biomaRt data frames derived from the count rownames. Feel free to change your comment to an answer and I'll remove mine.

modified 19 days ago • written 19 days ago by Barry Digby

Don't worry about that. If it works somewhere, that is knowledge worth sharing. Maybe just add a note stating where != works and where it doesn't.

written 18 days ago by _r_am

Hi Barry, sorry for the delay in replying. I tried !(rownames(countdata) %in% X_Y$Geneid). I am not getting any error, but it is also not removing the gene IDs.
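
One possible explanation, which the thread does not confirm, is that the expression was run on its own. By itself it only returns a TRUE/FALSE vector; the rows are dropped only when that vector is used to subset the table and the result is assigned. A minimal sketch with made-up counts and gene names (countdata_noXY is a hypothetical name chosen for the example):

```r
# Toy stand-ins for the objects in the thread; values and IDs are made up.
countdata <- data.frame(
  sample1 = c(12, 7, 0, 30),
  sample2 = c(15, 9, 2, 28),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)
X_Y <- data.frame(Geneid = c("geneB", "geneD"))

# This line alone changes nothing: it just builds a logical vector.
keep <- !(rownames(countdata) %in% X_Y$Geneid)

rows_before <- nrow(countdata)

# Subset the rows with the logical vector and store the result.
# drop = FALSE keeps the table shape even if only one column remains.
countdata_noXY <- countdata[keep, , drop = FALSE]

rows_after <- nrow(countdata_noXY)
print(c(before = rows_before, after = rows_after))
```

The original countdata is left untouched, so the filtered object is the one to pass on to the next step of the analysis.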

modified 13 days ago by _r_am • written 13 days ago by archit22684

What is the output of:

head(rownames(countdata))
head(X_Y$Geneid)
written 13 days ago by _r_am
> head(rownames(countdata))
[1] "ENSG00000223972" "ENSG00000227232" "ENSG00000278267" "ENSG00000243485" "ENSG00000284332"
[6] "ENSG00000237613"


> head(X_Y$Geneid)
[1] "ENSG00000228572" "ENSG00000182378" "ENSG00000178605" "ENSG00000226179" "ENSG00000167393"
[6] "ENSG00000281849"
modified 13 days ago • written 13 days ago by archit22684

Well they do look similar and should ideally overlap. Can you get some help from someone in your team/institution? This needs more involved help but should not take more than a few minutes to figure out.
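
A few quick checks can narrow this down. The sketch below uses made-up IDs and assumes two common causes of a silent mismatch, stray whitespace and a trailing ".version" suffix on one side; neither was diagnosed in this thread, so treat it as a checklist rather than the answer. clean_ids is a hypothetical helper written for the example.

```r
# Toy stand-ins for the thread's objects; the IDs are invented.
countdata <- data.frame(
  s1 = c(10, 0, 5),
  s2 = c(3, 8, 1),
  row.names = c("ENSG_TOY_1.5", "ENSG_TOY_2.1", "ENSG_TOY_3.2")
)
X_Y <- data.frame(Geneid = c("ENSG_TOY_2 ", "ENSG_TOY_9"),
                  stringsAsFactors = FALSE)

# 1. How many IDs match as-is? Zero means the filter has nothing to remove.
n_raw <- sum(rownames(countdata) %in% X_Y$Geneid)

# 2. Normalise both sides: coerce to character, trim whitespace,
#    and drop a trailing ".<digits>" version suffix if present.
clean_ids <- function(ids) sub("\\.[0-9]+$", "", trimws(as.character(ids)))
count_ids <- clean_ids(rownames(countdata))
xy_ids    <- clean_ids(X_Y$Geneid)
n_clean   <- sum(count_ids %in% xy_ids)

# 3. Filter on the cleaned IDs and compare row counts before and after.
filtered <- countdata[!(count_ids %in% xy_ids), , drop = FALSE]
print(c(raw_matches = n_raw, clean_matches = n_clean,
        rows_before = nrow(countdata), rows_after = nrow(filtered)))
```

If the number of matches is already non-zero on the raw IDs, the lists do overlap and the problem is more likely in how the filter result is being used.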

written 12 days ago by _r_am