nmt1 gene present in Tuxedo output both not DESeq!
1
0
Entering edit mode
7.2 years ago
Parham ★ 1.6k

Hi,

I analysed my RNA-seq data for S. pombe once with Tuxedo pipeline and once with DESeq! Surprisingly I see one of the genes which is important for my experiment present in Tuxedo gene_exp.diff output but not in deseq.res

In gene_exp.diff I get nmt1 III:1837498-1844115 but when I look for it on pombase  it has other coordination starting 1838335 and ending at 1839525 and I don't see this gene in deseq.res output at all. Am I missing something with annotations or there some other thing that I am not considering? 

Your help is much appreciated.

/Parham

Tuxedo DESeq • 1.6k views
ADD COMMENT
0
Entering edit mode

I am doing this from the RNA-seq workflow. I have all my accpeted_hits.bam and accepted_hits.bam.bai in tophat_all. Does it help? Please let me know if you need more info!

> library(GenomicAlignments)
> fls <- list.files("tophat_all/", pattern="bam$", full.names =T)
> bamfls <- BamFileList(fls)
> flag <- scanBamFlag(isNotPrimaryRead=FALSE)
> param <- ScanBamParam(flag=flag)
> gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
+ ignore.strand=TRUE, param=param)
> cnts=assay(gnCnt)

> dim(cnts)
[1] 7017    6
> sel.rn=rowSums(cnts) !=0
> cnts=cnts[sel.rn,]
> dim(cnts)
[1] 6097    6

 

ADD REPLY
0
Entering edit mode

Yup, that helps. I would guess that either exByGn doesn't contain nmt1 or for some reason the counts for it are 0 and it's getting filtered when you subset cnts. Both of those are things you should be able to check.

ADD REPLY
0
Entering edit mode

I could check cnts and it had zero value for all bam files! So it must be exByGn not containing nmt1 as you said, but I couldn't figure out how to check it. Would you give me a hint on that? 

ADD REPLY
0
Entering edit mode

If it's all 0 in cnts then it'd be easier to just redo the counting with featureCounts.

ADD REPLY
0
Entering edit mode

Alright I did it again with featureCounts. Thanks for suggesting it! But here it still gives all zero for all BAM mapped reads! Here is counts.txt output and counts.txt.summary. Since I never used it before I am writing the code I used for this purpose. If you could kindly have a look on that if I am doing something wrong. 

$ featureCounts -a genes.gtf -t exon -g gene_id -o counts.txt 1.bam 2.bam 3.bam 4.bam 5.bam 6.bam

ADD REPLY
0
Entering edit mode

This is due to the ncRNA on the opposite strand. Since you have a stranded protocol, it's impossible to discriminate between reads mapping to the ncRNA and nmt1 (well, you can and do get reads assigned to that ncRNA, but that's because part of it doesn't overlap another feature). I'm guessing that cufflinks is either ignoring the ncRNA or its EM algorithm prioritizes against it (or perhaps it uses coverage in gauging which gene to assign counts to, though I could see that back firing).

ADD REPLY
0
Entering edit mode

When I was running Tuxedo pipeline I was giving --library-type -frsecondstrand for all steps! Does that explain it?

ADD REPLY
0
Entering edit mode

It does. If these really are stranded libraries then tell summarizeOverlaps() or featureCounts that.

ADD REPLY
0
Entering edit mode

Great now it has reads and makes sense! Thanks a lot!

ADD REPLY
1
Entering edit mode
7.2 years ago

That cufflinks changes a gene's coordinates is rather expected, since that's partly its purpose. The output of DESeq is dependent upon what you use as input. So check if nmt1 is in there. BTW, you should use DESeq2 rather than DESeq if you aren't already (though in this case it'd be possible for nmt1 to get filtered out).

ADD COMMENT
0
Entering edit mode

Thanks Devon, however I don't understand  "what you use as input"! I have the same mapped reads as I used for cufflinks and yes I see the nmt1 in there (IGV). What am I doing wrong?! 

And regarding DESeq2, so you are saying that DESeq can filter out nmt1 but not DESeq2? A bit of explanation on that is much appreciated. 

Regards,

ADD REPLY
0
Entering edit mode

Regarding DESeq2, no I mean that it could filter nmt1 out while DESeq wouldn't, though DESeq2 is the better choice for you anyway.

For "what you use as input", I mean the output from either htseq-count or featureCounts (or whatever else you used) that you used to get counts for DESeq. The first thing to check is if nmt1 exists in those files (if not, well that's your problem).

ADD REPLY
0
Entering edit mode

I see! I used accepted_hits.bam out of TopHat for both DESeq and DESeq2, but in none of them I see nmt1! However I see the gene perfectly mapped when I look it up in IGV as above! Do you know what can the problem be?

ADD REPLY
0
Entering edit mode

Well you're not using a BAM file as input, you wouldn't get any meaningful results. Show the first few lines of code that you're using in R in order to get counts ready for DESeq/DESeq2 and then I can offer more insight.

ADD REPLY
0
Entering edit mode

Sorry I replied to the original post by mistake. Please look below!

 

ADD REPLY

Login before adding your answer.

Traffic: 1763 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6