Question

Error in Create switchAnalyzeRlist

21 months ago

Saeed • 0

Hello everyone, I am using IsoformSwitchAnalyzeR to analyze transcript expression and quantification data generated with Kallisto. I am stuck at the "Create switchAnalyzeRlist" step. According to the error message, the problem seems to be in my GTF file. I would really appreciate it if anybody could tell me how to find a GTF file that contains haplotype information, or where I can find the <Ensembl_version>.chr_patch_hapl_scaff.gtf file.

### Create switchAnalyzeRlist
> aSwitchList <- importRdata(
+   isoformCountMatrix   = Quant$counts,
+   isoformRepExpression = Quant$abundance,
+   designMatrix         = myDesign,
+   isoformExonAnnoation = "../../Kallisto/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.gtf",
+   isoformNtFasta       = "../../Kallisto/Gallus_gallus.bGalGal1.mat.broiler.GRCg7b.cdna.all.fa",
+   fixStringTieAnnotationProblem = TRUE,
+   showProgress = FALSE
+ )
Step 1 of 7: Checking data...
Step 2 of 7: Obtaining annotation...
    importing GTF (this may take a while)...
Error in importRdata(isoformCountMatrix = Quant$counts, isoformRepExpression = Quant$abundance,  : 
  The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925). 
Either isforoms found in the annotation are not quantifed or vise versa. 
Specifically:
 44937 isoforms were quantified.
 85704 isoforms are annotated.
 Only 0 overlap.
 44937 isoforms quantifed had no corresponding annoation

This combination cannot be analyzed since it will cause discrepencies between quantification and annotation thereby skewing all analysis.

If there is no overlap (as in zero or close) there are two options:
 1) The files do not fit together (e.g. different databases, versions, etc) (no fix except using propperly paired files).
 2) It is somthing to do with how the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
     Examples from expression matrix are : ENSGALT00010036134.1, ENSGALT00010012273.1, ENSGALT00010030305.1 
     Examples of annoation are : XM_004943259.5, XM_040701218.2, XM_003642751.6 
     Examples of isoforms which were only found im the quantification are  : ENSGALT00010064149.1, ENSGALT00010061619.1, ENSGALT00010039755.1 

If there is a large overlap but still far from complete there are 3 possibilites:
 1) The files do not fit together (e.g different databases versions etc.) (no fix except using propperly paired files).
 2) If you are using Ensembl data you have supplied the GTF without phaplotyps. You need to supply the <Ensembl_version>.chr_patch_hapl_scaff.gtf file - NOT the <Ensembl_version>.chr.gtf
 3) One file could contain non-chanonical chromosomes while the other do not (might be solved using the 'removeNonConvensionalChr' argument.)
 4) It is somthing to do with how a subset of the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
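
Going by the examples printed in the message above, the quantified IDs look like Ensembl transcript IDs (ENSGALT...), while the annotated IDs look like RefSeq IDs (XM_...). With 0 overlap, this seems to be the first case in the message (files from different databases) rather than the haplotype case. A minimal sketch for checking this before calling importRdata() is below. It uses base R only. The helper names `extract_gtf_transcript_ids` and `jaccard` are hypothetical (they are not part of IsoformSwitchAnalyzeR), and the toy GTF lines are made up around the IDs shown in the error message.

```r
# Hypothetical helper: pull transcript_id values out of GTF lines (base R only).
extract_gtf_transcript_ids <- function(gtf_lines) {
  gtf_lines <- gtf_lines[!grepl("^#", gtf_lines)]            # drop header/comment lines
  hits <- regmatches(gtf_lines,
                     regexpr('transcript_id "[^"]+"', gtf_lines))
  unique(sub('transcript_id "([^"]+)"', "\\1", hits))        # keep only the ID itself
}

# Hypothetical helper: Jaccard similarity between two sets of IDs.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

# Toy data: the IDs are copied from the error message, the rest of each
# GTF line (chromosome, coordinates, gene_id) is invented for illustration.
quant_ids <- c("ENSGALT00010036134.1",
               "ENSGALT00010012273.1",
               "ENSGALT00010030305.1")
gtf_lines <- c(
  "#!toy header line",
  'chr1\tGnomon\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "XM_004943259.5";',
  'chr1\tGnomon\texon\t200\t300\t.\t+\t.\tgene_id "G1"; transcript_id "XM_004943259.5";',
  'chr2\tGnomon\texon\t1\t100\t.\t-\t.\tgene_id "G2"; transcript_id "XM_040701218.2";'
)

annot_ids <- extract_gtf_transcript_ids(gtf_lines)
overlap   <- jaccard(quant_ids, annot_ids)

print(annot_ids)
print(overlap)
```

On real data, `gtf_lines` could come from `readLines()` on the GTF file and `quant_ids` from whichever column or row names of `Quant$counts` hold the isoform IDs (an assumption about how `Quant` is structured). If the two ID sets use different prefixes, as here, no `ignoreAfter...` argument is likely to help, and a GTF from the same source and release as the cDNA FASTA used for Kallisto would be needed.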

Genome annotation • 449 views