Question

How to remove X & Y chromosome genes from RNAseq data

3.3 years ago

archit22684 • 0

Hello everyone, I am new to this field. I used STAR and featureCounts followed by DESeq2 for my RNA-seq analysis. I want to remove the X and Y chromosome genes from my bulk RNA-seq data, but I don't know how to proceed. Should I remove the genes from the counts after featureCounts, or within DESeq2? I have made a list of the gene IDs on chromosomes X and Y using BioMart. Can I remove these gene IDs from my count data? Would this be the correct way of doing it? Any help is appreciated. Thanks

RNA-Seq STAR featurecounts Deseq2 • 1.6k views

ADD COMMENT • link 3.3 years ago by archit22684 • 0

Ram · Answer 1 · 2021-01-04

3.3 years ago

Barry Digby ★ 1.3k

Yep you're pretty much there.

Assuming counts is your counts matrix containing all samples, something to the effect of:

subset <- counts[which(rownames(counts) != biomart_list$Gene),]

Edit: != compares the two vectors element by element, so it only works when they have the same length.

ADD COMMENT • link 3.3 years ago by Barry Digby ★ 1.3k

Hi Barry, I am getting a warning "Warning message: In rownames(countdata) != X_Y$Geneid : longer object length is not a multiple of shorter object length".

ADD REPLY • link 3.3 years ago by archit22684 • 0

Try !(rownames(counts) %in% X_Y$Geneid). I don't see how != would work comparing 2 vectors of incompatible sizes.
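
As a minimal sketch of that suggestion on toy data: the gene names, counts, and the `autosomal` variable below are made up for illustration, and `X_Y` stands in for a table of X/Y gene IDs such as one exported from BioMart. The point it tries to show is that the logical vector has to be used as a row index and the result assigned to a variable.

```r
# Toy count matrix; gene IDs and counts are made up for illustration.
countdata <- matrix(
  c(10, 0, 5, 7,
    12, 1, 3, 9),
  nrow = 4,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("sample1", "sample2"))
)

# Hypothetical table of X/Y gene IDs (e.g. exported from BioMart).
X_Y <- data.frame(Geneid = c("geneB", "geneD", "geneZ"),
                  stringsAsFactors = FALSE)

# TRUE for rows whose gene ID is NOT in the X/Y list.
keep <- !(rownames(countdata) %in% X_Y$Geneid)

# Use the logical vector to index the rows and assign the result;
# evaluating the expression on its own does not modify countdata.
autosomal <- countdata[keep, , drop = FALSE]

rownames(autosomal)
```

Unlike `!=`, `%in%` does not require the two vectors to have the same length or order, and IDs in the list that are absent from the matrix (here `geneZ`) are simply ignored.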

ADD REPLY • link 3.3 years ago by Ram 43k

My bad, I'm used to biomaRt data frames derived from the count rownames. Feel free to change yours to an answer and I'll remove mine.

ADD REPLY • link 3.3 years ago by Barry Digby ★ 1.3k

Don't worry about that. If it works somewhere, that is knowledge worth sharing. Maybe just add a note stating where != works and where it doesn't.

ADD REPLY • link 3.3 years ago by Ram 43k

Hi Barry, sorry for the delay in replying. I tried !(rownames(countdata) %in% X_Y$Geneid). I am not getting any error, but it is also not removing any of the gene IDs.

ADD REPLY • link updated 3.3 years ago by Ram 43k • written 3.3 years ago by archit22684 • 0

What is the output of:

head(rownames(countdata))
head(X_Y$Geneid)

ADD REPLY • link 3.3 years ago by Ram 43k

> head(rownames(countdata))
[1] "ENSG00000223972" "ENSG00000227232" "ENSG00000278267" "ENSG00000243485" "ENSG00000284332"
[6] "ENSG00000237613"


> head(X_Y$Geneid)
[1] "ENSG00000228572" "ENSG00000182378" "ENSG00000178605" "ENSG00000226179" "ENSG00000167393"
[6] "ENSG00000281849"

ADD REPLY • link 3.3 years ago by archit22684 • 0

Well, they do look similar and should ideally overlap. Can you get some help from someone in your team/institution? This needs more hands-on help, but it should not take more than a few minutes to figure out.

ADD REPLY • link 3.3 years ago by Ram 43k
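
A few quick checks that might help narrow this down, sketched on made-up vectors: `count_ids` and `xy_ids` are hypothetical stand-ins for rownames(countdata) and X_Y$Geneid, and `clean_ids` is an illustrative helper, not part of any package. The head() output above shows plain, unversioned IDs on both sides, so version suffixes, stray whitespace, or factor columns may well not be the cause here; counting the matches directly is the more general test.

```r
# Hypothetical inputs standing in for rownames(countdata) and X_Y$Geneid.
count_ids <- c("ENSG00000223972.5", "ENSG00000182378.14", "ENSG00000227232")
xy_ids    <- factor(c(" ENSG00000182378", "ENSG00000178605"))

# Illustrative helper: plain character, no surrounding whitespace,
# no Ensembl version suffix (".5", ".14", ...).
clean_ids <- function(x) sub("\\.[0-9]+$", "", trimws(as.character(x)))

# Number of IDs that match before and after cleaning.
n_raw   <- sum(count_ids %in% xy_ids)
n_clean <- sum(clean_ids(count_ids) %in% clean_ids(xy_ids))

n_raw
n_clean
```

If the number of matches on the real data is greater than zero, the IDs do overlap, and the remaining thing to check is whether the logical vector was actually used to subset the rows and the result assigned to a variable.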