3.1 years ago
skmhabeeb • 0


I accessed your tutorial webpage from

But I cannot work out how to do the step quoted below, as it is not clearly explained. I would appreciate it if you could send me the commands or instructions asap.

Prepare annotation table: Make a gene-wise table from downloaded GTF ('/data1/santhilal/rnaseq_analysis/annotation/Homo_sapiens.GRCh38.87.gtf') as explained in the Main assignment 2 and save it to the directory '/data1/santhilal/rnaseq_analysis/annotation/' with name 'Homo_sapiens.GRCh38.87_gene_annotation.txt'.

Thanks in advance, habeeb

3.1 years ago
ATpoint 81k

If you are new, I'd simply follow which is a maintained workflow rather than random tutorials on the web. The one you link is quite old and a bit non-standard. Tools like featureCounts or htseq work on GTF files directly, so there is no need for custom parsing.

3.0 years ago
EagleEye 7.5k

Really sorry for my late response. Yes ATpoint, this is my tutorial. Thanks for tagging (though I did not get any notification). This tutorial was written for an internal course and goes along with the lectures. I didn't know others were using it too.

Hi skmhabeeb, this step simply converts the GTF into a plain-text, tab-separated table with one row per gene.

awk -F'\t' '$3 == "gene"' gencode.v37.basic.annotation.gtf | cut -f1,4,5,7,9 | sed 's/"; /\t/g' | sed 's/gene_id "//g' | sed 's/gene_type "//g' | sed 's/gene_name "//g' | cut -f1-7 | awk 'BEGIN{FS="\t";OFS="\t"}{print $5,$7,$1,$2,$3,$4,$6;}' | sed '1i\Geneid\tGeneSymbol\tChromosome\tStart\tEnd\tStrand\tClass' > gencode.v37.basic.annotation.txt
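Note that the one-liner above relies on the GENCODE attribute order (gene_id, gene_type, gene_name). As far as I know, Ensembl GTFs such as the Homo_sapiens.GRCh38.87.gtf in the question order the attributes differently and use gene_biotype instead of gene_type, so the positional cut would pick up the wrong columns. Below is a minimal sketch that looks attributes up by key instead. The function name gtf_to_gene_table is my own (hypothetical), and it assumes attribute values contain no semicolons.

```shell
# Sketch: build a gene-wise table from a GTF by reading attributes by key,
# so it does not depend on the order of attributes in column 9.
# gtf_to_gene_table is a hypothetical helper name, not part of any tool.
gtf_to_gene_table() {
    awk '
    BEGIN {
        FS = "\t"; OFS = "\t"
        print "Geneid", "GeneSymbol", "Chromosome", "Start", "End", "Strand", "Class"
    }
    # Return the quoted value of attribute "key" in string s, or "NA" if absent.
    # Assumes values contain no semicolons.
    function attr(key, s,    n, i, parts, kv) {
        n = split(s, parts, ";")
        for (i = 1; i <= n; i++) {
            kv = parts[i]
            sub(/^ +/, "", kv)             # trim leading spaces
            if (index(kv, key " ") == 1) {
                sub(/^[^"]*"/, "", kv)     # drop everything up to the opening quote
                sub(/".*$/, "", kv)        # drop the closing quote and the rest
                return kv
            }
        }
        return "NA"
    }
    /^#/ { next }                          # skip header comment lines
    $3 == "gene" {
        biotype = attr("gene_biotype", $9)                       # Ensembl key
        if (biotype == "NA") biotype = attr("gene_type", $9)     # GENCODE key
        print attr("gene_id", $9), attr("gene_name", $9), $1, $4, $5, $7, biotype
    }' "$1"
}
```

With the paths from the question, this could be run as: gtf_to_gene_table /data1/santhilal/rnaseq_analysis/annotation/Homo_sapiens.GRCh38.87.gtf > /data1/santhilal/rnaseq_analysis/annotation/Homo_sapiens.GRCh38.87_gene_annotation.txt. Genes without a gene_name attribute get NA in the GeneSymbol column.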

Also have a look at the following post.

extract only geneID and gene symbol from GTF file


