8.4 years ago
zhengyunchaosky
Hi,
I am trying to use "ftp://ftp.ensemblgenomes.org/pub/release-30/plants/gff3/oryza_sativa/Oryza_sativa.IRGSP-1.0.30.gff3.gz" to build a snpEff database for SNP annotation:
java -Xmx15G -jar snpEff.jar build -gff3 -v IRGSP
However, I got the warnings below.
00:00:00.000 SnpEff version SnpEff 4.1 (build 2015-01-07), by Pablo Cingolani
00:00:00.010 Command: 'build'
00:00:00.028 Building database for 'IRGSP'
00:00:00.029 Reading configuration file 'snpEff.config'. Genome: 'IRGSP'
00:00:01.262 done
Reading GFF3 data file : '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/genes.gff'
Reading genes : ....................................................................................................
10000 ....................................................................................................
20000 ....................................................................................................
30000 ........................................................
Total: 35679 Genes added.
Reading transcripts : Total: 0 Transcripts added.
Reading exons : WARNING: Cannot find transcript 'transcript:EPlOSAT00000003714'. Created transcript 'transcript:EPlOSAT00000003714' and gene 'Gene_transcript:EPlOSAT00000003714' for this exon. File '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/genes.gff' line 68 '1 agi exon 2631 2760 . - . Parent=transcript:EPlOSAT00000003714;Name=EPlOSAE00000004118;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=EPlOSAE00000004118;rank=1;version=1'
WARNING: Cannot find transcript 'transcript:OS01T0100100-01'. Created transcript 'transcript:OS01T0100100-01' and gene 'Gene_transcript:OS01T0100100-01' for this exon. File '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/genes.gff' line 72 '1 irgsp exon 2983 3268 . + . Parent=transcript:OS01T0100100-01;Name=OS01T0100100-01.exon1;constitutive=1;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=OS01T0100100-01.exon1;rank=1;version=1'
Anyway, I successfully built the database for my annotation.
Total: 248345 sequences added, 0 sequences ignored.
Adjusting transcripts:
Adjusting genes:
Adjusting chromosomes lengths:
Ranking exons:
Create UTRs from CDS (if needed):
Correcting exons based on frame information.
Remove empty chromosomes:
Marking as 'coding' from CDS information:
Done: 42132 transcripts marked
#-----------------------------------------------
# Genome name : 'IRGSP'
# Genome version : 'IRGSP'
# Has protein coding info : true
# Genes : 133430
# Protein coding genes : 42132
#-----------------------------------------------
# Transcripts : 97751
# Avg. transcripts per gene : 0.73
#-----------------------------------------------
# Checked transcripts :
# AA sequences : 0 ( 0.00% )
# DNA sequences : 0 ( 0.00% )
#-----------------------------------------------
# Protein coding transcripts : 42132
# Length errors : 1261 ( 2.99% )
# STOP codons in CDS errors : 0 ( 0.00% )
# START codon errors : 9404 ( 22.32% )
# STOP codon warnings : 1929 ( 4.58% )
# UTR sequences : 38051 ( 38.93% )
# Total Errors : 10269 ( 24.37% )
#-----------------------------------------------
# Cds : 163451
# Exons : 248345
# Exons with sequence : 248345
# Exons without sequence : 0
# Avg. exons per transcript : 2.54
# WARNING : No mitochondrion chromosome found
#-----------------------------------------------
# Number of chromosomes : 61
# Chromosomes names [sizes] :
# '1' [43270923]
# '3' [36413819]
# '2' [35937250]
# '4' [35502694]
# '6' [31248787]
# '5' [29958434]
# '7' [29697621]
# '11' [29021106]
# '8' [28443022]
# '12' [27531856]
# '10' [23207287]
# '9' [23012720]
# 'AP008246' [206004]
# 'AP008247' [157458]
# 'AC160949' [128256]
# 'AC156495' [88500]
# 'AC155918' [32941]
# 'Syng_TIGR_028' [31094]
# 'Syng_TIGR_023' [24772]
# 'Syng_TIGR_045' [22545]
# 'Syng_TIGR_005' [21787]
# 'Syng_TIGR_014' [21421]
# 'Syng_TIGR_047' [20829]
# 'Syng_TIGR_026' [19971]
# 'Syng_TIGR_004' [19457]
# 'Syng_TIGR_021' [17477]
# 'Syng_TIGR_008' [16676]
# 'Syng_TIGR_012' [16417]
# 'Syng_TIGR_010' [15493]
# 'AC174930' [15426]
# 'Syng_TIGR_002' [14476]
# 'Syng_TIGR_037' [13061]
# 'Syng_TIGR_029' [12884]
# 'Syng_TIGR_016' [12792]
# 'Syng_TIGR_027' [11522]
# 'Syng_TIGR_046' [11447]
# 'Syng_TIGR_033' [11093]
# 'Syng_TIGR_011' [10901]
# 'Syng_TIGR_030' [10794]
# 'Syng_TIGR_020' [10699]
# 'Syng_TIGR_035' [10686]
# 'Syng_TIGR_015' [10595]
# 'Syng_TIGR_013' [10512]
# 'Syng_TIGR_036' [10434]
# 'Syng_TIGR_019' [10422]
# 'Syng_TIGR_034' [10311]
# 'Syng_TIGR_009' [10296]
# 'Syng_TIGR_041' [10210]
# 'Syng_TIGR_024' [10060]
# 'Syng_TIGR_022' [9889]
# 'Syng_TIGR_032' [9603]
# 'Syng_TIGR_031' [9548]
# 'Syng_TIGR_050' [8529]
# 'Syng_TIGR_038' [8197]
# 'Syng_TIGR_007' [7820]
# 'Syng_TIGR_048' [7140]
# 'Syng_TIGR_039' [6269]
# 'Syng_TIGR_049' [6261]
# 'Syng_TIGR_044' [6000]
# 'Syng_TIGR_042' [5510]
# 'Syng_TIGR_043' [4236]
#-----------------------------------------------
00:01:02.976 Caracterizing exons by splicing (stage 1) :
....................................................................................................
100000 ....................................................................................................
200000 ................................................
00:01:03.841 Caracterizing exons by splicing (stage 2) :
....................................................................................................
100000 ....................................................................................................
200000 ................................................00:01:04.100 done.
00:01:04.101 [Optional] Rare amino acid annotations
00:01:04.106 Warning: Cannot read optional protein sequence file '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/protein.fa', nothing done.
00:01:04.108 Saving database
00:01:54.212 [Optional] Reading regulation elements: GFF
00:01:54.214 Warning: Cannot read optional regulation file '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/regulation.gff', nothing done.
00:01:54.214 [Optional] Reading regulation elements: BED
00:01:54.215 Cannot find optional regulation dir '/datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/regulation.bed/', nothing done.
00:01:54.216 [Optional] Reading motifs: GFF
00:01:54.217 Warning: Cannot open PWMs file /datacenter/disk2/zhengyunchao/snpEffnipo/snpEff/./data/IRGSP/pwms.bin. Nothing done
00:01:54.217 Done
00:01:54.240 Logging
00:01:55.259 Checking for updates...
I wonder why these warnings came up. Thanks in advance!
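The counts in the log above hint at what the warnings did to the database. A minimal arithmetic sketch, using only numbers copied from the log (the variable names are made up for illustration): the number of genes in the final summary minus the number of genes read from the GFF3 equals the number of transcripts. That is consistent with, though it does not prove, every transcript having received its own placeholder 'Gene_transcript:...' gene instead of being attached to a real gene.

```python
# Counts copied from the snpEff build log above.
genes_read = 35679         # "Total: 35679 Genes added."
transcripts_read = 0       # "Total: 0 Transcripts added."
genes_in_db = 133430       # "# Genes : 133430"
transcripts_in_db = 97751  # "# Transcripts : 97751"
exons_in_db = 248345       # "# Exons : 248345"

# Genes in the summary that were not read from gene lines in the GFF3.
extra_genes = genes_in_db - genes_read
print(extra_genes)                       # 97751
print(extra_genes == transcripts_in_db)  # True

# The two averages reported in the summary follow from the same counts.
print(round(transcripts_in_db / genes_in_db, 2))  # 0.73
print(round(exons_in_db / transcripts_in_db, 2))  # 2.54
```

An average of 0.73 transcripts per gene is itself a warning sign, since a gene model normally has at least one transcript.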
Maybe there are some 'errors' in the GFF3 file you used to create the database. Can you check whether the transcript EPlOSAT00000003714 has its own transcript line in the GFF3 file (apart from the exon line)?

Thanks for your response! Yes, it has the transcript line.
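The check suggested above can be run across the whole file rather than for a single transcript. The following is a minimal sketch, not part of snpEff: the helper names (`parse_attributes`, `open_text`, `find_orphan_exons`) are made up for illustration, and the sample lines are partly hypothetical (the gene and transcript lines are invented; only the two exon lines are abridged from the log). It lists exon `Parent` IDs that have no feature with a matching `ID`, and reports which feature type (column 3) the parents that do exist carry.

```python
import gzip
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def open_text(path):
    """Open a plain or gzip-compressed GFF3 file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def find_orphan_exons(lines):
    """Return (missing parent IDs, Counter of feature types of found parents)."""
    id_types = {}          # feature ID -> feature type (column 3)
    exon_parents = set()   # every ID named in an exon's Parent attribute
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:  # skips e.g. an embedded FASTA section
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            id_types[attrs["ID"]] = cols[2]
        if cols[2] == "exon":
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    exon_parents.add(parent)
    missing = sorted(p for p in exon_parents if p not in id_types)
    parent_types = Counter(id_types[p] for p in exon_parents if p in id_types)
    return missing, parent_types


# Tiny sample: the gene and transcript lines are HYPOTHETICAL, and the second
# exon is deliberately left without a parent line to show the report.
SAMPLE = "\n".join([
    "##gff-version 3",
    "\t".join(["1", "agi", "gene", "2631", "2760", ".", "-", ".",
               "ID=gene:HYPOTHETICAL_GENE"]),
    "\t".join(["1", "agi", "transcript", "2631", "2760", ".", "-", ".",
               "ID=transcript:EPlOSAT00000003714;Parent=gene:HYPOTHETICAL_GENE"]),
    "\t".join(["1", "agi", "exon", "2631", "2760", ".", "-", ".",
               "Parent=transcript:EPlOSAT00000003714;Name=EPlOSAE00000004118"]),
    "\t".join(["1", "irgsp", "exon", "2983", "3268", ".", "+", ".",
               "Parent=transcript:OS01T0100100-01;Name=OS01T0100100-01.exon1"]),
])

missing, parent_types = find_orphan_exons(SAMPLE.splitlines())
print(missing)             # ['transcript:OS01T0100100-01']
print(dict(parent_types))  # {'transcript': 1}
```

On the real file, something like `find_orphan_exons(open_text("data/IRGSP/genes.gff"))` would do the same scan. If `missing` comes back empty, as the follow-up above suggests it would, the transcript lines are present in the file, and the feature types in `parent_types` are the next thing to compare against what this snpEff version reads as transcripts; the script itself cannot tell you which types snpEff 4.1 accepts.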