Running htseq-count to "grab" long non-coding gene_id names
9 months ago
dimitrischat ▴ 180

Hi all,

I'm new to bioinformatics, so bear with me. I am trying to find long non-coding RNAs in RNA-seq data. When I checked the human GTF file, I found two different spellings of the long non-coding RNA biotype, "lnc_RNA" and "lncRNA", like so:

NC_000001.11    Gnomon  transcript  29926   31295   .   +   .   gene_id "MIR1302-2HG"; transcript_id "XR_001737835.1"; db_xref "GeneID:107985730"; gbkey "ncRNA"; gene "MIR1302-2HG"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 8 samples with support for all annotated introns"; product "MIR1302-2 host gene, transcript variant X2"; transcript_biotype "lnc_RNA"; 

NC_000001.11    BestRefSeq  gene    34611   36081   .   -   .   gene_id "FAM138A"; transcript_id ""; db_xref "GeneID:645520"; db_xref "HGNC:HGNC:32334"; description "family with sequence similarity 138 member A"; gbkey "Gene"; gene "FAM138A"; gene_biotype "lncRNA"; gene_synonym "F379"; gene_synonym "FAM138F";

"lnc_RNA" is on the "transcript" line, and "lncRNA" is on the "gene" line. My first question: should I choose "lncRNA"?

And most importantly, how do I get only the "gene_id" names of the entries that have "lncRNA"?

Edit: for the second question I ran grep 'lncRNA' GRCh38.p13_genomic.gtf > GRCh38.p13_genomic_lnc.gtf and proceeded as usual.

But is my choice of "lncRNA" correct?
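
A note on the grep approach: it keeps whole GTF lines that contain the string lncRNA anywhere in the line, rather than a list of gene_id values, and the pattern itself does not match the lnc_RNA spelling. As a minimal sketch (not part of the original post), the Python function below collects the gene_id of every row whose feature column is "gene" and whose gene_biotype is "lncRNA". The function name lncrna_gene_ids is made up for illustration, and the code assumes a tab-delimited GTF whose attribute column is formatted like the two lines quoted above.

```python
import re

# Attribute patterns, assuming the key "value"; layout seen in the quoted GTF lines.
GENE_ID = re.compile(r'gene_id "([^"]*)"')
GENE_BIOTYPE = re.compile(r'gene_biotype "([^"]*)"')


def lncrna_gene_ids(lines, biotype="lncRNA"):
    """Return the sorted, unique gene_id values of 'gene' rows with the given gene_biotype.

    Hypothetical helper for illustration; `lines` is any iterable of GTF lines.
    """
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # A GTF row has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attributes = fields[8]
        biotype_match = GENE_BIOTYPE.search(attributes)
        gene_id_match = GENE_ID.search(attributes)
        if biotype_match and gene_id_match and biotype_match.group(1) == biotype:
            ids.add(gene_id_match.group(1))
    return sorted(ids)


# Possible usage (file name taken from the post above):
# with open("GRCh38.p13_genomic.gtf") as handle:
#     for gene_id in lncrna_gene_ids(handle):
#         print(gene_id)
```

Restricting the match to the gene_biotype attribute on "gene" rows avoids picking up lines that merely mention lncRNA in some other attribute.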

lncRNA htseq • 281 views

In the example you posted above, one attribute is a gene_biotype and the other is a transcript_biotype. Biotypes should be applicable to both genes and transcripts. I am not sure why there is an extra underscore in the transcript entry in your example. Is that convention followed for all transcripts? If you are doing the analysis at the gene level, then you should select only the gene entries.
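
One way to check whether the underscore spelling is used consistently is to tally every biotype value per feature type. The following is a rough sketch under the same assumptions as the quoted lines (tab-delimited GTF, attributes written as key "value";). The helper name tally_biotypes is hypothetical.

```python
import re
from collections import Counter

# Matches both biotype attributes, assuming the key "value"; attribute layout.
BIOTYPE = re.compile(r'(gene_biotype|transcript_biotype) "([^"]*)"')


def tally_biotypes(lines):
    """Count (feature type, attribute key, biotype value) triples in GTF lines.

    Hypothetical helper for illustration; `lines` is any iterable of GTF lines.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF row
        feature = fields[2]
        for key, value in BIOTYPE.findall(fields[8]):
            counts[(feature, key, value)] += 1
    return counts


# Possible usage (file name taken from the question):
# with open("GRCh38.p13_genomic.gtf") as handle:
#     for (feature, key, value), n in sorted(tally_biotypes(handle).items()):
#         print(feature, key, value, n, sep="\t")
```

If the tally shows "lnc_RNA" only under transcript_biotype and "lncRNA" only under gene_biotype, the two spellings are just the file's convention for the two levels.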

