Question: NCBI GTF contains the same transcript ID for some transcripts, causing trouble with htseq-count
Asked 7 weeks ago by patelbhaumikn (University of Missouri):

Hi everyone,

I am doing a differential expression analysis between two groups. I used the NCBI RefSeq GTF (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/263/795/GCF_002263795.1_ARS-UCD1.2/GCF_002263795.1_ARS-UCD1.2_genomic.gtf.gz), aligned with HISAT2, and ran StringTie without the -e option. When I looked at the StringTie output GTFs, both the individual ones and the merged one, some transcripts have the transcript ID "unknown_transcript_1". Going back to the original NCBI GTF, I found that some records have a blank gene_id and the transcript_id "unknown_transcript_1". They are mostly on the mitochondrial genome and the scaffolds. As a result, when I ran htseq-count, the first row of the output had an empty gene_id with a read count. Should I exclude that row from the htseq-count output before running the differential expression analysis? I did the same analysis with the Ensembl GTF and had no such issue.
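To confirm where the blank gene_id records sit before deciding what to do with them, a small script can tally them per sequence. This is a minimal sketch, assuming the attribute column uses the usual `key "value";` layout seen in the NCBI file (e.g. `gene_id ""; transcript_id "unknown_transcript_1";`); the function names are hypothetical helpers, not part of HTSeq, StringTie or any NCBI tool.

```python
import gzip
import re
from collections import Counter

# One GTF attribute: a key followed by a quoted value, e.g. gene_id "";
# The value may be empty, which is exactly the case we want to find.
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes(field):
    """Return the key/"value" pairs of a GTF attribute column as a dict.
    Repeated keys (e.g. several db_xref entries) keep only the last value."""
    return dict(ATTR_RE.findall(field))


def empty_gene_id_by_seqname(lines):
    """Count GTF records whose gene_id is missing or empty, per sequence name."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a 9-column GTF record
        attrs = parse_attributes(cols[8])
        if attrs.get("gene_id", "") == "":
            counts[cols[0]] += 1  # column 1 is the chromosome/scaffold name
    return counts


def scan_gtf(path):
    """Open a plain or gzip-compressed GTF and tally empty-gene_id records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return empty_gene_id_by_seqname(fh)
```

For example, `scan_gtf("GCF_002263795.1_ARS-UCD1.2_genomic.gtf.gz").most_common(10)` should list the sequences carrying the most such records, which makes it easy to check whether they really are limited to the mitochondrion and scaffolds.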

I would really appreciate any suggestions or recommendations.

Thanks in advance.

-Bhaumik


vkkodali replied, 7 weeks ago:

@patelbhaumikn please contact RefSeq to report this issue with the GTF files.
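Until the annotation is corrected, one possible stopgap (a suggestion, not something RefSeq recommends) is to drop the blank-ID row before the differential expression step. As far as I understand, htseq-count groups features by the `gene_id` attribute by default (`-i gene_id`), so every feature with an empty gene_id is counted under one shared blank ID; that row pools unrelated mitochondrial and scaffold features and does not correspond to a single gene. The sketch below assumes the usual two-column, tab-separated htseq-count output; `filter_htseq_counts` is a hypothetical helper.

```python
def filter_htseq_counts(lines):
    """Split htseq-count output into gene rows to keep and rows to drop.

    Dropped: the row with an empty gene_id (all features lacking a gene_id,
    pooled together) and htseq-count's special '__' summary counters such as
    __no_feature and __ambiguous.
    """
    kept, dropped = {}, {}
    for line in lines:
        line = line.rstrip("\r\n")  # strip only the line ending, keep the leading tab
        if not line:
            continue
        gene_id, count = line.rsplit("\t", 1)
        target = dropped if gene_id == "" or gene_id.startswith("__") else kept
        target[gene_id] = int(count)
    return kept, dropped
```

Returning the dropped rows separately makes it easy to see how many reads were set aside; if that number is large, it may be worth inspecting those features rather than discarding them silently.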