Picard CollectRnaSeqMetrics error
9.7 years ago
juncheng ▴ 220

I want to get mapping statistics from a mapped BAM file using Picard's CollectRnaSeqMetrics.

I have a GTF file downloaded from http://genome.ucsc.edu/cgi-bin/hgTables, and I am passing it to Picard as the refFlat file.

Head of the GTF:

1    hg19_refFlat    exon    11874    12227    0.000000    +    .    gene_id "DDX11L1"; transcript_id "DDX11L1"; 
1    hg19_refFlat    exon    12613    12721    0.000000    +    .    gene_id "DDX11L1"; transcript_id "DDX11L1"; 
1    hg19_refFlat    exon    13221    14409    0.000000    +    .    gene_id "DDX11L1"; transcript_id "DDX11L1";

I think the problem is this GTF file. Does anyone know how to get a correct refFlat file for human?

The error from CollectRnaSeqMetrics is:

Exception in thread "main" net.sf.picard.annotation.AnnotationException: Wrong number of fields in refFlat file /home/JCheng/UCSC_RefSeqGenes_GRCh37_hg19_withoutChr.gtf at line 1
    at net.sf.picard.annotation.RefFlatReader.load(RefFlatReader.java:80)
    at net.sf.picard.annotation.RefFlatReader.load(RefFlatReader.java:66)
    at net.sf.picard.annotation.GeneAnnotationReader.loadRefFlat(GeneAnnotationReader.java:37)
    at net.sf.picard.analysis.CollectRnaSeqMetrics.setup(CollectRnaSeqMetrics.java:96)
    at net.sf.picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:102)
    at net.sf.picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:55)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
    at net.sf.picard.analysis.CollectRnaSeqMetrics.main(CollectRnaSeqMetrics.java:88)
RNA-Seq • 9.1k views

Hello,

I am getting the same error. Could anyone please help me? I used gtfToGenePred to convert my GTF file to a refFlat file, and my refFlat looks like this:

Ec-00_000010.1  chr_00  -       149     6731    149     6731    10      149,897,1535,2091,2535,3474,4006,4702,6245,6709,        428,1100,1674,2268,3070,3557,4155,4968,6363,6731,
Ec-00_000020.1  chr_00  -       28572   29122   28572   29122   2       28572,28937,    28582,29122,
Ec-00_000030.1  chr_00  +       29412   32214   29412   32214   1       29412,  32214,
Ec-00_000040.1  chr_00  +       34287   34360   34287   34360   1       34287,  34360,
Ec-00_000050.1  chr_00  -       36705   39329   36705   37902   3       36705,37422,39143,      36870,37944,39329,
Ec-00_000060.1  chr_00  +       43007   44099   43007   44099   3       43007,43404,43829,      43046,43455,44099,

When I try to run

java -jar /software/shared/apps/x86_64/picard-tools/1.56/CollectRnaSeqMetrics.jar \
    REF_FLAT=chr_00.refFlat.txt \
    RIBOSOMAL_INTERVALS=null \
    STRAND_SPECIFICITY=NONE \
    CHART_OUTPUT=b2_cOLLECTrna.pdf \
    METRIC_ACCUMULATION_LEVEL=ALL_READS \
    INPUT=out.prefix.bam \
    OUTPUT=B2_CollectRNAMetrices

I end up getting

Exception in thread "main" net.sf.picard.annotation.AnnotationException: Wrong number of fields in refFlat file chr_00.refFlat.txt at line 1
        at net.sf.picard.annotation.RefFlatReader.load(RefFlatReader.java:80)
        at net.sf.picard.annotation.RefFlatReader.load(RefFlatReader.java:66)
        at net.sf.picard.annotation.GeneAnnotationReader.loadRefFlat(GeneAnnotationReader.java:37)
        at net.sf.picard.analysis.CollectRnaSeqMetrics.setup(CollectRnaSeqMetrics.java:137)
        at net.sf.picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:101)
        at net.sf.picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:54)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:118)
        at net.sf.picard.analysis.CollectRnaSeqMetrics.main(CollectRnaSeqMetrics.java:101)

Could anyone help me figure out what is wrong with the refFlat file?


You are missing the gene name in the first column. See solinvicta's answer for the fix. Basically, you have to run gtfToGenePred -genePredExt, then move the gene name (column 12) to the first column and drop the other extra columns.
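A minimal sketch of that column shuffle, assuming the input is the tab-separated output of gtfToGenePred -genePredExt, where column 12 (name2) holds the gene name and columns 1-10 are the standard genePred fields. The function name genepredext_to_refflat is made up for illustration and is not part of Picard or the UCSC tools.

```python
def genepredext_to_refflat(lines):
    """Yield refFlat rows (lists of 11 strings) from genePredExt lines.

    Assumes tab-separated genePredExt input: columns 1-10 are the plain
    genePred fields and column 12 (name2) is the gene name.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(
                f"expected at least 12 genePredExt columns, got {len(fields)}"
            )
        gene_name = fields[11]  # column 12 in 1-based counting
        # refFlat = gene name, then the first 10 genePred columns
        yield [gene_name] + fields[:10]
```

To use it, open the genePredExt file, pass the file handle to genepredext_to_refflat, and write each row to the output file joined with tabs. The result should have 11 columns per line, which is what the REF_FLAT argument expects.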

9.7 years ago

According to the Picard source code, a refFlat file should have the following columns:

 enum RefFlatColumns{GENE_NAME, TRANSCRIPT_NAME, CHROMOSOME, STRAND, TX_START, TX_END, CDS_START, CDS_END, EXON_COUNT, EXON_STARTS, EXON_ENDS}

You would be better off using: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refFlat.txt.gz
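The column list also gives a quick way to check a file before handing it to Picard: a GTF line has 9 tab-separated fields and plain gtfToGenePred output has 10, while refFlat needs 11. Here is a rough sketch of such a check, assuming fields are separated by tabs; Picard's own parsing may differ in details, and the helper names are made up for illustration.

```python
# Column names copied from the RefFlatColumns enum quoted above.
REFFLAT_COLUMNS = [
    "GENE_NAME", "TRANSCRIPT_NAME", "CHROMOSOME", "STRAND",
    "TX_START", "TX_END", "CDS_START", "CDS_END",
    "EXON_COUNT", "EXON_STARTS", "EXON_ENDS",
]


def count_fields(line):
    """Number of tab-separated fields on one line."""
    return len(line.rstrip("\n").split("\t"))


def find_bad_lines(lines, expected=len(REFFLAT_COLUMNS)):
    """Return (line_number, field_count) for every line that does not
    have the expected number of tab-separated fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        n = count_fields(line)
        if n != expected:
            bad.append((number, n))
    return bad
```

Passing an open file handle to find_bad_lines returns an empty list when every line has 11 fields. Anything else points at the lines Picard is likely to reject.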


Thank you so much!


Actually, does "chr1" need to be changed to "1"? My reference uses the "1" form.


Yes, very probably.
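If the names do need changing, something like the following could rewrite the chromosome column (column 3) of each refFlat line. This is only a sketch: the function name is made up, and the chrM to MT mapping is an assumption based on the usual UCSC versus Ensembl-style naming, so check the result against the sequence names in your BAM header.

```python
def strip_chr_prefix(refflat_line, special=None):
    """Return the refFlat line with a UCSC-style chromosome name
    (e.g. "chr1") rewritten to the prefix-free form (e.g. "1").

    `special` maps names that need more than dropping "chr".
    The default chrM -> MT entry is an assumption; adjust it to
    match your reference.
    """
    if special is None:
        special = {"chrM": "MT"}
    fields = refflat_line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("refFlat line has no chromosome column")
    chrom = fields[2]  # CHROMOSOME is the third refFlat column
    if chrom in special:
        fields[2] = special[chrom]
    elif chrom.startswith("chr"):
        fields[2] = chrom[len("chr"):]
    return "\t".join(fields)
```

Names that already lack the prefix pass through unchanged, so it should be harmless to run this on a file that mixes both styles.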


Is there a refFlat.txt for Ensembl? Thanks!

8.0 years ago
solinvicta ▴ 10

I found this link and the process seemed to work. In short, you pass different options to the original command so that it outputs a few extra fields, then trim them back:

https://www.snip2code.com/Snippet/77082/Convert-gene-annotations-from-GTF-to-gen
