7 months ago
skmhabeeb • 0

Hi,

I accessed your tutorial webpage from https://decodebiology.github.io/bioinfotutorials/rnaseq_tutorial.html

But I am unable to understand how to do the step quoted below, as it is not clearly explained. I would appreciate it if you could send me the commands or instructions asap.

Prepare annotation table: Make a gene-wise table from downloaded GTF ('/data1/santhilal/rnaseq_analysis/annotation/Homo_sapiens.GRCh38.87.gtf') as explained in the Main assignment 2 and save it to the directory '/data1/santhilal/rnaseq_analysis/annotation/' with name 'Homo_sapiens.GRCh38.87_gene_annotation.txt'.


rna-seq R edgeR gene
7 months ago
ATpoint 54k

If you are new, I'd simply follow https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html which is a maintained workflow, rather than random tutorials on the web. The one you link is quite old and a bit non-standard. Tools like featureCounts or htseq-count work on GTF files directly, so there is no need for custom parsing.

5 months ago
EagleEye 7.1k

Really sorry for my late response. Yes ATpoint, this is my tutorial. Thanks for tagging (though I did not get any notification). This tutorial was written for an internal course and goes along with the lectures. I didn't know others were using it too.

Hi skmhabeeb, this step simply converts the GTF into a plain-text table. The command below uses a GENCODE GTF as an example:

cat gencode.v37.basic.annotation.gtf | grep -w "gene" | cut -f1,4,5,7,9 | sed 's/"; /\t/g' | sed 's/gene_id "//g' | sed 's/gene_type "//g' | sed 's/gene_name "//g' | cut -f1-7 | awk 'BEGIN{FS="\t";OFS="\t"}{print $5,$7,$1,$2,$3,$4,$6;}' | sed '1i\Geneid\tGeneSymbol\tChromosome\tStart\tEnd\tStrand\tClass' > gencode.v37.basic.annotation.txt
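The one-liner above relies on the attribute order of a GENCODE file (gene_id, gene_type, gene_name). The file in the question is an Ensembl GTF, where the attributes come in a different order and the class is stored as gene_biotype rather than gene_type, so the positional cut would likely pick the wrong columns. Below is a rough sketch that looks attributes up by name instead. The function name gtf_to_gene_table is made up for illustration and is not part of the tutorial. It assumes attribute values contain no ";" and writes NA when an attribute is missing.

```shell
#!/usr/bin/env bash
# Sketch only: gtf_to_gene_table is a hypothetical helper, not from the tutorial.
# Prints a gene-wise table (one row per "gene" feature) to stdout.
# Attributes are matched by key, so GENCODE (gene_type) and
# Ensembl (gene_biotype) files are both handled.
gtf_to_gene_table() {
  awk 'BEGIN {
         FS = OFS = "\t"
         print "Geneid", "GeneSymbol", "Chromosome", "Start", "End", "Strand", "Class"
       }
       # Return the quoted value of attribute key from column 9, or NA if absent.
       # Assumes values do not contain a semicolon.
       function attr(s, key,    n, i, parts, kv) {
         n = split(s, parts, ";")
         for (i = 1; i <= n; i++) {
           kv = parts[i]
           sub(/^ +/, "", kv)                     # trim leading spaces
           if (index(kv, key " \"") == 1) {       # entry starts with: key "
             kv = substr(kv, length(key) + 3)     # drop the key, space and quote
             sub(/".*$/, "", kv)                  # drop the closing quote
             return kv
           }
         }
         return "NA"
       }
       /^#/ { next }                              # skip header comments
       $3 == "gene" {
         cls = attr($9, "gene_type")
         if (cls == "NA") cls = attr($9, "gene_biotype")
         print attr($9, "gene_id"), attr($9, "gene_name"), $1, $4, $5, $7, cls
       }' "$1"
}

# Usage with the paths from the question (run after defining the function):
#   gtf_to_gene_table /data1/santhilal/rnaseq_analysis/annotation/Homo_sapiens.GRCh38.87.gtf \
#     > /data1/santhilal/rnaseq_analysis/annotation/Homo_sapiens.GRCh38.87_gene_annotation.txt
```

Filtering on the third column being exactly "gene" is a bit stricter than grep -w "gene", which can also match the word elsewhere on a line. The column order is the same as the header in the one-liner above.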


Also have a look at the following post:

extract only geneID and gene symbol from GTF file