Question: Problem using a GTF file with Cufflinks in Galaxy
3.4 years ago by
Iran, Islamic Republic Of
nazaninhoseinkhan370 wrote:

Dear all,

I am using Cufflinks in Galaxy, but I have run into a problem with the FPKM values.

When I use the GTF file (hg19: Shared Data > Data Libraries > iGenomes) and the human reference genome available in Galaxy (hg19: a TopHat option), the Cufflinks output gives me gene IDs, but most FPKMs are zero.

Can you help me find the problem?

How can I choose a compatible reference genome for that GTF file?

Thank you in advance

Nazanin

rna-seq cufflinks fpkm gtf • 1.2k views
ADD COMMENTlink modified 13 months ago by Biostar ♦♦ 20 • written 3.4 years ago by nazaninhoseinkhan370
3.4 years ago by
michael.ante3.4k
Austria/Vienna
michael.ante3.4k wrote:

Hi Nazanin,

Can you check the chromosome names in the GTF and the BAM file? Do they start with chr in the GTF but not in the BAM, or vice versa? Was the alignment performed against the hg19 reference?
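One way to run this check outside Galaxy is sketched below in Python. It assumes you can download the GTF and dump the BAM header as text (for example with `samtools view -H`). The function names and the small inline inputs are illustrative only, not part of Galaxy, TopHat, or Cufflinks.

```python
def gtf_seqnames(lines):
    """Collect sequence names from column 1 of GTF lines, skipping comments."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(lines):
    """Collect reference names from the SN: tag of @SQ header lines."""
    names = set()
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def compare_seqnames(gtf_names, bam_names):
    """Report which sequence names are shared and which appear on one side only."""
    return {
        "shared": sorted(gtf_names & bam_names),
        "gtf_only": sorted(gtf_names - bam_names),
        "bam_only": sorted(bam_names - gtf_names),
    }


# Illustrative inputs: a GTF that uses "chr1" and a BAM header that uses "1".
gtf_example = [
    'chr1\tunknown\texon\t11874\t12227\t.\t+\t.\tgene_id "DDX11L1";\n',
    'chr2\tunknown\texon\t38814\t41627\t.\t-\t.\tgene_id "FAM110C";\n',
]
header_example = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:1\tLN:249250621\n",
    "@SQ\tSN:2\tLN:243199373\n",
]

result = compare_seqnames(
    gtf_seqnames(gtf_example), sam_header_seqnames(header_example)
)
print(result)
```

An empty `shared` list, as in this made-up example, would mean no annotated feature can overlap any alignment, which is one way to end up with FPKMs of zero across the board.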

Cheers,

Michael

ADD COMMENTlink written 3.4 years ago by michael.ante3.4k

Thanks for your response.

In both the hg19 GTF and the Cufflinks output (assembled transcripts), the chromosome names start with chr.

When I ran TopHat I also used the hg19 reference genome.

best

Nazanin

ADD REPLYlink written 3.4 years ago by nazaninhoseinkhan370
3.4 years ago by
Belgium
WouterDeCoster41k wrote:

It's not surprising that some FPKMs are 0 (depending on your total number of reads and the tissue).

-Edited to replace 'a lot of' by 'some'-

ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by WouterDeCoster41k

If most (as stated by @Nazanin) are zero then that may be an indication of an upstream problem :-). Assuming public galaxy has the right combination of reference and GTF files.

ADD REPLYlink written 3.4 years ago by genomax72k

Right, edited my response to "some". I guess subjective quantitative terms can be misinterpreted easily ;)

ADD REPLYlink written 3.4 years ago by WouterDeCoster41k

Do you mean that it is OK if the FPKMs of "some" genes are zero, because of the total number of reads and the tissue under study?

ADD REPLYlink written 3.4 years ago by nazaninhoseinkhan370

It may be ok since not every gene would be expressed/detected under all conditions. That said, can you be more specific? How many are zero (out of a total #) in your data?

ADD REPLYlink written 3.4 years ago by genomax72k

Thanks for your comment.

Yes, you're right. The file has 4082 zeros out of 65536 (or maybe more, because my Excel cannot show more).

best

Nazanin

ADD REPLYlink written 3.4 years ago by nazaninhoseinkhan370

If you can, avoid Excel as much as possible in bioinformatics analysis. Besides automatic reformatting (turning floats or gene names into dates), you may see differences due to floating-point arithmetic.
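As a rough sketch of counting zero FPKMs without a spreadsheet, the Python below reads a tab-delimited table with a header row and an `FPKM` column, which is how Cufflinks lays out its `genes.fpkm_tracking` output. The function name and the inline rows are made up for illustration, and the dataset you download from Galaxy may carry a different file name.

```python
import csv
import io


def count_zero_fpkm(handle, column="FPKM"):
    """Return (zero_count, total_rows) for a tab-delimited table with a header."""
    reader = csv.DictReader(handle, delimiter="\t")
    total = 0
    zeros = 0
    for row in reader:
        total += 1
        if float(row[column]) == 0.0:
            zeros += 1
    return zeros, total


# Illustrative data only: a cut-down table with made-up gene names and values.
example = io.StringIO(
    "tracking_id\tgene_id\tlocus\tFPKM\tFPKM_status\n"
    "GENE_A\tGENE_A\tchr1:100-900\t0\tOK\n"
    "GENE_B\tGENE_B\tchr1:2000-5000\t12.5\tOK\n"
    "GENE_C\tGENE_C\tchr2:300-800\t0\tOK\n"
    "GENE_D\tGENE_D\tchr2:9000-9900\t3.75\tOK\n"
)

zeros, total = count_zero_fpkm(example)
print(f"{zeros} of {total} rows have FPKM == 0 ({zeros / total:.1%})")
```

For a real file, replace the `io.StringIO` object with an opened file handle, e.g. `with open("genes.fpkm_tracking") as handle:`. Because every row is read, the count is not capped by a spreadsheet's row limit.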

ADD REPLYlink written 3.4 years ago by michael.ante3.4k

That looks like nothing to worry about. These genes are either lowly expressed (and therefore not sequenced) or just tissue-specific.

ADD REPLYlink written 3.4 years ago by WouterDeCoster41k