Question: Could not find fasta record: CuffCompare & Cuffmerge
3.2 years ago by Explorer70, Australia
Explorer70 wrote:

Hi, I am using the genes.gtf and genome.fa files from iGenomes (Homo sapiens, Ensembl GRCh37) for cuffcompare and cuffmerge, but I get the following warnings. (There were many more; I am pasting only a few of them to keep the message short.)

Warning: couldn't find fasta record for 'GL000191.1'!
Warning: couldn't find fasta record for 'GL000192.1'!
Warning: couldn't find fasta record for 'GL000193.1'!
Warning: couldn't find fasta record for 'GL000194.1'!
Warning: couldn't find fasta record for 'GL000195.1'!
Warning: couldn't find fasta record for 'GL000196.1'!

I have seen previous posts with similar questions, but I am not sure why I get these warnings when the genes.gtf and genome.fa both come from the same set of files provided by iGenomes. The command I ran is below:

cuffcompare  -r /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf  -s /home/jmotwani/mydata/Genomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa  -o testcuffcomp  test1.gtf test2.gtf

I am using cufflinks version 2.2.1. An output file is generated, but I am not sure whether it is complete or was truncated because of these warnings.

Any help with this will be greatly appreciated. Thanks.

Tags: rna-seq, cufflinks

I extracted the chromosome name column from the GTF file with: cut -f 1 genes.gtf | sort | uniq

The resulting list contains all the contig names mentioned in the warning messages, and no fasta files are provided for those contigs in the genomes folder of iGenomes. I wonder why the GTF file includes these contigs if their fasta sequences are not provided. I presume I can go ahead with the cuffdiff analysis in spite of these warnings, because skipping these contigs should not affect the rest of the analysis. Any thoughts?
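The same check can be done against the FASTA headers directly, rather than by eye. Below is a minimal Python sketch, assuming the sequence name is the first whitespace-delimited token after '>' in each FASTA header and the first tab-separated column of each GTF line. The function names are made up for illustration and are not part of cufflinks.

```python
def fasta_names(fasta_lines):
    """Collect sequence names from FASTA header lines.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the first-column values (seqname) from GTF lines."""
    names = set()
    for line in gtf_lines:
        line = line.rstrip("\n")
        # Skip comment lines and blank lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_fasta(gtf_lines, fasta_lines):
    """Return the sorted GTF seqnames that have no FASTA record."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_names(fasta_lines))


# Tiny in-memory example (made-up records, not real iGenomes content).
example_gtf = [
    "1\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";\n",
    "GL000191.1\tprotein_coding\texon\t10\t50\t.\t-\t.\tgene_id \"g2\";\n",
]
example_fasta = [">1 example header\n", "ACGTACGT\n"]
print(missing_from_fasta(example_gtf, example_fasta))
```

For real files, the same functions accept open file handles, for example `missing_from_fasta(open("genes.gtf"), open("genome.fa"))`.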

written 3.2 years ago by Explorer70

Yes. You could also clean your GTF to keep only the chromosomes/contigs present in your fasta file. The warnings will then disappear and the analysis will run cleanly, rather than proceeding with warnings.
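One way to do that cleaning is sketched below in Python. It assumes the sequence name is the first whitespace-delimited token after '>' in each FASTA header, and that GTF comment lines starting with '#' should be kept. The function names and the output file name genes.filtered.gtf are hypothetical, chosen only for illustration.

```python
def fasta_names(fasta_lines):
    """Collect sequence names from FASTA header lines.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def filter_gtf(gtf_lines, keep_names):
    """Yield GTF lines whose first column is in keep_names.

    Comment lines (starting with '#') are passed through unchanged;
    blank lines are dropped.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        seqname = line.rstrip("\n").split("\t", 1)[0]
        if seqname in keep_names:
            yield line


def filter_gtf_file(gtf_path, fasta_path, out_path):
    """Write a copy of gtf_path restricted to sequences found in fasta_path."""
    with open(fasta_path) as fasta:
        keep = fasta_names(fasta)
    with open(gtf_path) as gtf, open(out_path, "w") as out:
        out.writelines(filter_gtf(gtf, keep))


# Example call (paths are placeholders):
# filter_gtf_file("genes.gtf", "genome.fa", "genes.filtered.gtf")
```

The filtered file would then be passed to cuffcompare with -r in place of the original genes.gtf. Writing to a new file leaves the iGenomes annotation untouched.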

written 3.2 years ago by geek_y

I have truncated my message so that it is not too long to read. Hopefully I will get some suggestions now. Thanks for your help.

written 3.2 years ago by Explorer70