How to remove X & Y chromosome genes from RNAseq data
3.3 years ago

Hello everyone, I am new to this field. I have used STAR and featureCounts followed by DESeq2 for my RNA-seq analysis. I want to remove X and Y chromosome genes from my bulk RNA-seq data, but I don't know how to proceed. Do I need to remove the genes from the counts after featureCounts, or within DESeq2? I have made a list of Gene_ids for the X and Y chromosomes using BioMart. Can I remove those Gene_ids from my count data? Would this be the correct way of doing it? Any help is appreciated. Thanks

RNA-Seq STAR featurecounts Deseq2 • 1.6k views
3.3 years ago
Barry Digby ★ 1.3k

Yep, you're pretty much there.

Assuming counts is your counts matrix containing all samples, something to the effect of:

subset <- counts[which(rownames(counts) != biomart_list$Gene),]

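edit: != compares two vectors element by element, so it needs vectors of equal length. For a gene list whose length differs from the number of rows in counts, use %in% instead (see the comments below).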


Hi Barry, I am getting a warning "Warning message: In rownames(countdata) != X_Y$Geneid : longer object length is not a multiple of shorter object length".
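
For context, this warning comes from R recycling the shorter vector during an element-wise comparison. A minimal sketch with made-up gene IDs (not the poster's data) shows why != does not behave like a membership test here:

```r
# Toy gene IDs, made up purely for illustration
count_ids <- c("g1", "g2", "g3", "g4", "g5")
xy_ids    <- c("g2", "g5")

# != compares position by position and recycles the shorter vector:
# g1 vs g2, g2 vs g5, g3 vs g2, g4 vs g5, g5 vs g2
# Capture the warning text instead of letting it print
msg <- tryCatch(count_ids != xy_ids,
                warning = function(w) conditionMessage(w))
print(msg)

# The comparison itself still returns a result, just not a useful one
elementwise <- suppressWarnings(count_ids != xy_ids)
print(elementwise)  # every position differs, so no row would be dropped
```

In this toy case g2 and g5 are both in the X/Y list, yet neither is flagged, because each is compared against whichever ID happens to line up with it.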


Try !(rownames(counts) %in% X_Y$Geneid). I don't see how != would work comparing 2 vectors of incompatible sizes.
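
A minimal sketch of how that expression might be used to filter a count matrix. The matrix, sample names, and gene IDs are toy values, and counts_filtered is just an illustrative name:

```r
# Toy count matrix: 5 genes x 2 samples (values are made up)
counts <- matrix(1:10, nrow = 5,
                 dimnames = list(c("g1", "g2", "g3", "g4", "g5"),
                                 c("s1", "s2")))

# Hypothetical list of X/Y gene IDs, e.g. exported from BioMart
xy_ids <- c("g2", "g5")

# %in% tests membership, so the two vectors may differ in length and order
keep <- !(rownames(counts) %in% xy_ids)

# The logical vector must be used to subset, and the result assigned;
# drop = FALSE keeps a matrix even if only one row remains
counts_filtered <- counts[keep, , drop = FALSE]

print(rownames(counts_filtered))
```

The filtered matrix, rather than the original one, would then be passed on to the downstream DESeq2 steps.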


My bad, I'm used to biomaRt data frames derived from the count rownames. Feel free to change yours to an answer and I'll remove mine.


Don't worry about that. If it works somewhere, that is knowledge worth sharing. Maybe just add a note stating where != works and where it doesn't.


Hi Barry, sorry for the delay in replying. I tried !(rownames(countdata) %in% X_Y$Geneid). I am not getting any error, but it is not removing the gene IDs either.


What is the output of:

head(rownames(countdata))
head(X_Y$Geneid)
> head(rownames(countdata))
[1] "ENSG00000223972" "ENSG00000227232" "ENSG00000278267" "ENSG00000243485" "ENSG00000284332"
[6] "ENSG00000237613"


> head(X_Y$Geneid)
[1] "ENSG00000228572" "ENSG00000182378" "ENSG00000178605" "ENSG00000226179" "ENSG00000167393"
[6] "ENSG00000281849"

Well, they do look similar and should ideally overlap. Can you get some help from someone in your team/institution? This needs more involved help but should not take more than a few minutes to figure out.
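
The head() outputs above only show the first six IDs of each vector, so on their own they cannot tell whether the two sets overlap. A hedged sketch of checks that might narrow this down, using toy stand-ins for countdata and X_Y (the IDs and values are made up). One possible explanation, not confirmed anywhere in this thread, is that the logical expression was evaluated but never used to subset and reassign the matrix:

```r
# Toy stand-ins for the thread's objects; IDs and counts are made up
countdata <- matrix(1:8, nrow = 4,
                    dimnames = list(c("ENSG01", "ENSG02", "ENSG03", "ENSG04"),
                                    c("s1", "s2")))
X_Y <- data.frame(Geneid = c("ENSG02", "ENSG04", "ENSG99"),
                  stringsAsFactors = FALSE)

# How many count rows appear in the X/Y list?
# A result of 0 would point to an ID format mismatch between the two sources
n_overlap <- sum(rownames(countdata) %in% X_Y$Geneid)
print(n_overlap)

# The logical test by itself changes nothing; subset and assign the result
countdata_noXY <- countdata[!(rownames(countdata) %in% X_Y$Geneid), ,
                            drop = FALSE]

# The row count should drop by exactly the number of overlapping IDs
print(nrow(countdata) - nrow(countdata_noXY))
```

If n_overlap is positive and the row count drops by the same amount, the filter worked and the filtered object is the one to carry forward.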
