Question: How to create a custom GTF annotation file?
John • 210 wrote:
Hi
I am using RSEM (with bowtie2) for alignment and then gene counting, with the RefSeq annotation (gff3) and the genomic.fna reference FASTA file from NCBI. RSEM can convert the gff3 file to a GTF file.
How can I subset the GTF file (or gff3 file) by gene name? I want to extract the annotation (GTF) for a particular gene and extract that gene's sequence from the reference FASTA file. Then I want to perform the alignment.
The main goal is to reduce run time by avoiding alignment against the whole genome.
Thanks in anticipation.
modified 19 months ago by caggtaagtat • 1.4k • written 19 months ago by John • 210
This could force some reads to align to your gene that would normally have aligned somewhere else.
That's what happened. There are more reads than I expected.
You should not do that! Aligning to only your genes will bias the analysis, as your RNA-Seq experiment reflects the entire transcriptome, not just your gene.
Yes, just switch to a pseudo-aligner if you want to increase the speed. That's sufficient for gene expression.
Can't you just grep for the gene name of interest and redirect the output to a file? Every line belonging to that gene should carry its gene ID, so this would collect all lines with the given gene ID in a single file.
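One caveat with a plain grep is that it matches substrings, so a search for one gene name can also pull in lines for genes whose names merely start with it. Here is a minimal Python sketch of the same idea with exact matching. It assumes a standard 9-column, tab-separated GTF whose last column holds `key "value";` pairs, and that the gene is identified by `gene_id` or `gene_name`; the function names are made up for illustration, and the attribute keys in your RSEM-converted file may differ.

```python
import re

# Matches GTF attribute pairs such as: gene_id "ABC1";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def filter_gtf_lines(lines, gene, keys=("gene_id", "gene_name")):
    """Yield GTF lines whose gene_id or gene_name equals `gene` exactly.

    `keys` is an assumption about which attributes name the gene;
    check the ninth column of your own file and adjust if needed.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if any(attrs.get(key) == gene for key in keys):
            yield line


def subset_gtf(in_path, out_path, gene):
    """Write the matching lines to `out_path` and return how many were written."""
    written = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in filter_gtf_lines(src, gene):
            dst.write(line)
            written += 1
    return written
```

Something like `subset_gtf("annotation.gtf", "my_gene.gtf", "ABC1")` would then write the subset, where the file names and the gene name are placeholders. If it returns 0, the gene is probably stored under a different attribute key or spelling.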
If you are just interested in gene expression, you could speed up your analysis by using a pseudo-aligner like salmon, which is much faster than "real" alignment programs.
Or, if you really need nucleotide-precise alignment, then I would use STAR, which is a little faster and has higher fidelity.
Edit: I moved it into the comments, but I addressed the issue of running time, since the overall question was how to speed up the alignment process.
Could you rewrite this answer to address the OP's question about GTF files?