Question: Remove patches from gtf file?
2.8 years ago, Michelle M. wrote:

Hi there,

I'm using an Ensembl GTF file (GRCh37) for RNA-seq analysis and am wondering about the patches.

I know what the annotation patches are and why they're there, but should I exclude them when generating my count matrix in HTSeq or Cufflinks? That is, if I leave them in, won't I get multi-reads mapping to both the patch and the original region, thereby skewing the true counts?

Thanks for your input, much appreciated.

Cheers,

M


I went through a similar conundrum. I can't answer your question exactly, but I can share this: my libraries are fairly large, and I couldn't believe how long the calculations were taking with the patches included. So I will be removing the patches and restarting the analysis. I feel more comfortable about this decision since coming across (this morning) a line in the STAR aligner manual: "Generally, patches and alternative haplotypes should not be included in the genome", which suggests using only the primary assembly.

https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf (page 5)

You do raise a valid point, though, and I would be very curious to see a definitive answer.

written 2.8 years ago by Joel TM

Thanks Joel, that helps a lot. I'll be interested to see if anyone can confirm this, but in the meantime I think I'll be removing the patches from the file.

written 2.7 years ago by Michelle M.
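
For anyone wanting to try the patch removal discussed above, here is a minimal Python sketch, not a definitive method. It assumes Ensembl GRCh37 sequence naming: primary chromosomes are 1-22, X, Y and MT, unplaced or unlocalized scaffolds look like GL000191.1, and anything else in the first GTF column (for example names such as HG1287_PATCH or HSCHR6_MHC_COX) is treated as a patch or alternative haplotype. The function names and the file names in the usage note are made up for illustration; check the first column of your own file before relying on the pattern.

```python
import re

# Assumption: Ensembl GRCh37 naming. Primary chromosomes are 1-22, X, Y, MT.
PRIMARY_CHROMS = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}
# Assumption: unplaced/unlocalized scaffolds are named like "GL000191.1".
SCAFFOLD_RE = re.compile(r"^GL\d+\.\d+$")


def is_primary(seqname, keep_scaffolds=True):
    """Return True if seqname looks like part of the primary assembly."""
    if seqname in PRIMARY_CHROMS:
        return True
    return keep_scaffolds and bool(SCAFFOLD_RE.match(seqname))


def filter_gtf_lines(lines, keep_scaffolds=True):
    """Yield header lines plus records located on primary-assembly sequences."""
    for line in lines:
        if line.startswith("#"):
            # Keep header/comment lines untouched.
            yield line
            continue
        # The first tab-separated GTF column is the sequence name.
        seqname = line.split("\t", 1)[0]
        if is_primary(seqname, keep_scaffolds):
            yield line


def filter_gtf_file(in_path, out_path, keep_scaffolds=True):
    """Write a copy of the GTF at in_path without patch/haplotype records."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_gtf_lines(src, keep_scaffolds))
```

Usage would look like `filter_gtf_file("annotation.gtf", "annotation.no_patches.gtf")`, where both file names are placeholders. The genome used for alignment should contain the same set of sequences as the filtered GTF, in line with the STAR manual's advice to use the primary assembly.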

I just came across this, which was helpful: http://seqanswers.com/forums/archive/index.php/t-4459.html

Cheers

written 2.7 years ago by Michelle M.