Question: Remove patches from GTF file?
2.9 years ago
Michelle M. wrote:

Hi there,

So I'm using an Ensembl GTF file (GRCh37) for RNA-seq analysis and am wondering about the patches.

I know what the annotation patches are and why they're there, but should I exclude them when generating my count matrix in HTSeq or Cufflinks? I.e., if I leave them in, won't I get multi-reads mapping to both the patch and the original region, thereby skewing the true counts?

Thanks for your input, much appreciated.



I went through a similar conundrum. While I am not exactly answering your question, I can share this with you: I have pretty large libraries and couldn't believe how long the calculations were taking. So I will be removing the patches and restarting the analysis. I feel more comfortable about this decision since I came across (this morning) a line from the STAR aligner manual: "Generally, patches and alternative haplotypes should not be included in the genome", which suggests using only the primary assembly. (page 5)

You do bring up a valid point though, and I would be very curious to see the appropriate answer.

Thanks Joel, that helps a lot. I'll be interested to see if anyone can confirm this, but in the meantime I think I'll be removing the patches from the file.
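For anyone landing here later, this is roughly how the removal could be done. It's only a minimal sketch, and it assumes the Ensembl GRCh37 GTF names the primary chromosomes 1-22, X, Y and MT in the first column, with patches and alternative haplotypes on differently named contigs (something like HG1287_PATCH). Check the sequence names in your own file first. The function name `filter_gtf` and the `PRIMARY` set are just names I made up for illustration.

```python
# Sketch: keep only GTF records on primary chromosomes.
# ASSUMPTION: primary seqnames are 1-22, X, Y, MT (Ensembl-style, no "chr" prefix).
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def filter_gtf(lines, keep=PRIMARY):
    """Yield header lines and records whose seqname (column 1) is in `keep`."""
    for line in lines:
        if line.startswith("#"):
            # Header/comment lines carry no seqname; pass them through.
            yield line
            continue
        seqname = line.split("\t", 1)[0]
        if seqname in keep:
            yield line


# Example use (file names are hypothetical):
# with open("annotation.gtf") as src, open("annotation.primary.gtf", "w") as dst:
#     dst.writelines(filter_gtf(src))
```

Note that a whitelist like this also drops anything else not in the set, such as unplaced scaffolds, so add those names to `keep` if you want them. And following the STAR manual line quoted above, the genome you align against should be the primary assembly too, so the annotation and the reference stay consistent.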

I just came across this, which was helpful:


