snpEff warning information
8.4 years ago

Hi,

I am trying to use "ftp://ftp.ensemblgenomes.org/pub/release-30/plants/gff3/oryza_sativa/Oryza_sativa.IRGSP-1.0.30.gff3.gz" to build a snpEff database for SNP annotation.

java -Xmx15G -jar snpEff.jar build -gff3 -v IRGSP

However, I got the warnings below.

00:00:00.000    SnpEff version SnpEff 4.1 (build 2015-01-07), by Pablo Cingolani
00:00:00.010    Command: 'build'
00:00:00.028    Building database for 'IRGSP'
00:00:00.029    Reading configuration file 'snpEff.config'. Genome: 'IRGSP'
00:00:01.262    done
Reading GFF3 data file  : '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/genes.gff'
        Reading genes       : ....................................................................................................
                10000   ....................................................................................................
                20000   ....................................................................................................
                30000   ........................................................
        Total: 35679 Genes added.
        Reading transcripts :   Total: 0 Transcripts added.
        Reading exons       : WARNING: Cannot find transcript 'transcript:EPlOSAT00000003714'. Created transcript 'transcript:EPlOSAT00000003714' and gene 'Gene_transcript:EPlOSAT00000003714' for this exon. File '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/genes.gff' line 68  '1      agi     exon    2631    2760    .       -       .       Parent=transcript:EPlOSAT00000003714;Name=EPlOSAE00000004118;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=EPlOSAE00000004118;rank=1;version=1'
WARNING: Cannot find transcript 'transcript:OS01T0100100-01'. Created transcript 'transcript:OS01T0100100-01' and gene 'Gene_transcript:OS01T0100100-01' for this exon. File '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/genes.gff' line 72 '1      irgsp   exon    2983    3268    .       +       .       Parent=transcript:OS01T0100100-01;Name=OS01T0100100-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=OS01T0100100-01.exon1;rank=1;version=1'


Anyway, I successfully built the database for my annotation.

 Total: 248345 sequences added, 0 sequences ignored.

        Adjusting transcripts: 
        Adjusting genes: 
        Adjusting chromosomes lengths: 
        Ranking exons: 
        Create UTRs from CDS (if needed): 
        Correcting exons based on frame information.

        Remove empty chromosomes: 

        Marking as 'coding' from CDS information: 
        Done: 42132 transcripts marked
#-----------------------------------------------
# Genome name                : 'IRGSP'
# Genome version             : 'IRGSP'
# Has protein coding info    : true
# Genes                      : 133430
# Protein coding genes       : 42132
#-----------------------------------------------
# Transcripts                : 97751
# Avg. transcripts per gene  : 0.73
#-----------------------------------------------
# Checked transcripts        : 
#               AA sequences :      0 ( 0.00% )
#              DNA sequences :      0 ( 0.00% )
#-----------------------------------------------
# Protein coding transcripts : 42132
#              Length errors :   1261 ( 2.99% )
#  STOP codons in CDS errors :      0 ( 0.00% )
#         START codon errors :   9404 ( 22.32% )
#        STOP codon warnings :   1929 ( 4.58% )
#              UTR sequences :  38051 ( 38.93% )
#               Total Errors :  10269 ( 24.37% )
#-----------------------------------------------
# Cds                        : 163451
# Exons                      : 248345
# Exons with sequence        : 248345
# Exons without sequence     : 0
# Avg. exons per transcript  : 2.54
# WARNING                    : No mitochondrion chromosome found
#-----------------------------------------------
# Number of chromosomes      : 61
# Chromosomes names [sizes]  :
#               '1' [43270923]
#               '3' [36413819]
#               '2' [35937250]
#               '4' [35502694]
#               '6' [31248787]
#               '5' [29958434]
#               '7' [29697621]
#               '11' [29021106]
#               '8' [28443022]
#               '12' [27531856]
#               '10' [23207287]
#               '9' [23012720]
#               'AP008246' [206004]
#               'AP008247' [157458]
#               'AC160949' [128256]
#               'AC156495' [88500]
#               'AC155918' [32941]
#               'Syng_TIGR_028' [31094]
#               'Syng_TIGR_023' [24772]
#               'Syng_TIGR_045' [22545]
#               'Syng_TIGR_005' [21787]
#               'Syng_TIGR_014' [21421]
#               'Syng_TIGR_047' [20829]
#               'Syng_TIGR_026' [19971]
#               'Syng_TIGR_004' [19457]
#               'Syng_TIGR_021' [17477]
#               'Syng_TIGR_008' [16676]
#               'Syng_TIGR_012' [16417]
#               'Syng_TIGR_010' [15493]
#               'AC174930' [15426]
#               'Syng_TIGR_002' [14476]
#               'Syng_TIGR_037' [13061]
#               'Syng_TIGR_029' [12884]
#               'Syng_TIGR_016' [12792]
#               'Syng_TIGR_027' [11522]
#               'Syng_TIGR_046' [11447]
#               'Syng_TIGR_033' [11093]
#               'Syng_TIGR_011' [10901]
#               'Syng_TIGR_030' [10794]
#               'Syng_TIGR_020' [10699]
#               'Syng_TIGR_035' [10686]
#               'Syng_TIGR_015' [10595]
#               'Syng_TIGR_013' [10512]
#               'Syng_TIGR_036' [10434]
#               'Syng_TIGR_019' [10422]
#               'Syng_TIGR_034' [10311]
#               'Syng_TIGR_009' [10296]
#               'Syng_TIGR_041' [10210]
#               'Syng_TIGR_024' [10060]
#               'Syng_TIGR_022' [9889]
#               'Syng_TIGR_032' [9603]
#               'Syng_TIGR_031' [9548]
#               'Syng_TIGR_050' [8529]
#               'Syng_TIGR_038' [8197]
#               'Syng_TIGR_007' [7820]
#               'Syng_TIGR_048' [7140]
#               'Syng_TIGR_039' [6269]
#               'Syng_TIGR_049' [6261]
#               'Syng_TIGR_044' [6000]
#               'Syng_TIGR_042' [5510]
#               'Syng_TIGR_043' [4236]
#-----------------------------------------------

00:01:02.976    Caracterizing exons by splicing (stage 1) : 
        ....................................................................................................
        100000  ....................................................................................................
        200000  ................................................
00:01:03.841    Caracterizing exons by splicing (stage 2) : 
        ....................................................................................................
        100000  ....................................................................................................
        200000  ................................................00:01:04.100    done.
00:01:04.101    [Optional] Rare amino acid annotations
00:01:04.106    Warning: Cannot read optional protein sequence file '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/protein.fa', nothing done.
00:01:04.108    Saving database
00:01:54.212    [Optional] Reading regulation elements: GFF
00:01:54.214    Warning: Cannot read optional regulation file '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/regulation.gff', nothing done.
00:01:54.214    [Optional] Reading regulation elements: BED 
00:01:54.215    Cannot find optional regulation dir '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/regulation.bed/', nothing done.
00:01:54.216    [Optional] Reading motifs: GFF
00:01:54.217    Warning: Cannot open PWMs file /datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/pwms.bin. Nothing done
00:01:54.217    Done
00:01:54.240    Logging
00:01:55.259    Checking for updates...

I wonder why these warnings came up. Thanks in advance!


Maybe there are some 'errors' in the gff3 you used to create the db. Can you check whether the EPlOSAT00000003714 transcript has a transcript line in the gff3 file (apart from the exon line)?
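
If it helps, here is a rough Python sketch of that check for the whole file rather than one transcript. It is not part of snpEff. The function names (`parse_attributes`, `missing_parents`) are made up for illustration, and it assumes a standard tab-separated GFF3 with `ID=` and `Parent=` attributes in the ninth column.

```python
def parse_attributes(field):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def missing_parents(lines, child_type="exon"):
    """Return Parent IDs used by child_type features that no line declares as ID=."""
    declared_ids = set()
    referenced_parents = set()
    for line in lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF3 feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            declared_ids.add(attrs["ID"])
        if cols[2] == child_type and "Parent" in attrs:
            # GFF3 allows several comma-separated parents.
            referenced_parents.update(attrs["Parent"].split(","))
    return referenced_parents - declared_ids
```

You could call it as `missing_parents(open("data/IRGSP/genes.gff"))` (path assumed from the log above). An empty result would mean every exon's parent is declared somewhere in the file, so the warning would have to come from how the file is read rather than from missing lines.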


Thanks for your response! Yes, it has the transcript line.

1       agi     transcript      2631    2760    .       -       .       ID=transcript:EPlOSAT00000003714;Parent=gene:EPlOSAG00000002326;biotype=ncRNA;transcript_id=EPlOSAT00000003714;version=1
1       agi     exon    2631    2760    .       -       .       Parent=transcript:EPlOSAT00000003714;Name=EPlOSAE00000004118;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=EPlOSAE00000004118;rank=1;version=1
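
Since the transcript line is present but the build log still says "Total: 0 Transcripts added", one more thing that might be worth looking at is which feature types the file uses in column 3 (here it is `transcript`). Whether snpEff 4.1 treats that type as a transcript is not something I have confirmed; the sketch below only tallies the types so they can be compared with what the parser expects. The function name `count_feature_types` is made up, and it assumes the real file is tab-separated (the lines pasted above show spaces only because of the copy/paste).

```python
from collections import Counter


def count_feature_types(lines):
    """Count how often each feature type (GFF3 column 3) occurs."""
    counts = Counter()
    for line in lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts
```

For example, `count_feature_types(open("data/IRGSP/genes.gff")).most_common()` would list the types from most to least frequent (path assumed from the log above).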