Hello, I'm analyzing data with NMD genes knocked out, so I'm expecting novel transcripts. To analyse them I was trying to use switchAnalyzeRlist package. I've been trying to run addORFfromGTF but unfortunately I ran into the following error "No ORFs could be added to the switchAnalyzeRlist. Please ensure GTF file have CDS info (and that isoform ids match)." 0) I trimmed adaptors 1) I ran STAR to align the reads in 4 samples.
for i in "${files_id[@]}"; \
do \
$programs/STAR-2.7.10a/bin/Linux_x86_64_static/STAR \
--runThreadN 7 \
--genomeDir $human_genome/indexes \
--readFilesIn $hek/trimmed/${i}_1.fastq $hek/trimmed/${i}_2.fastq \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix $hek/bam/$i/$i; \
done;
2) I made annotation with StringTie in 3 steps: 2.1) I ran each sample without any reference annotation
for i in "${files_id[@]}"; \
do \
$programs/stringtie-2.2.1.Linux_x86_64/stringtie \
-v -p 15 \
-o $hek/gtf/${i}.gtf \
$hek/bam/$i/${i}Aligned.sortedByCoord.out.bam; \
done;
2.2) I merged all transcripts providing reference annotation (-G)
$programs/stringtie-2.2.1.Linux_x86_64/stringtie \
--merge \
-v $hek/gtf/1.gtf \
$hek/gtf/2.gtf \
$hek/gtf/3.gtf \
$hek/gtf/4.gtf \
-G $human_genome/Homo_sapiens.GRCh37.87.gtf \
-o $hek/gtf/merged_annotation.gtf
2.3) than I ran StringTie again for each sample with providing merged arrotation as -G and forbidding new transcripts this time (-e) and created .ctab files
for i in "${files_id[@]}"; \
do \
$programs/stringtie-2.2.1.Linux_x86_64/stringtie \
-v -p 300 \
-e \
-b $hek/ctab/${i}.ctab \
-G $hek/gtf/merged_annotation.gtf \
-o $hek/gtf/${i}_merged_annotation.gtf \
$hek/bam/$i/${i}Aligned.sortedByCoord.out.bam; \
done;
3) Created a transcript file out of the merged annotation 4) I successfully created switchAnalyzeRlist (giving isoformExonAnnoation merged annotation as an input)
TieQuant <- importIsoformExpression(
parentDir=paste(path,"/ctab", sep=""), readLength=75
)
myDesign <- data.frame(
sampleID = colnames(TieQuant$abundance)[-1],
condition = c('1', '1', '2', '3')
)
**aSwitchList <- importRdata(
isoformCountMatrix = TieQuant$counts,
isoformRepExpression = TieQuant$abundance,
designMatrix = myDesign,
isoformExonAnnoation="f/merged_annotation.gtf",
isoformNtFasta ="transcripts.fasta",
fixStringTieAnnotationProblem = TRUE,
estimateDifferentialGeneRange = FALSE,
showProgress = TRUE
)**
Step 1 of 7: Checking data... Warning: The experimental design seems to be of very low complexity - very few samples per replicate. Please check the supplied design matrixt to make sure no mistakes were made.Warning: !!! NB !!! NB !!! NB !!!NB !!! NB !!! IsoformSwitchAnalyzeR is not made to work with conditions without indepdendet biological replicates and results will not be trustworthy! At best data without replicates should be analyzed as a pilot study before investing in more replicates. Plase consult the "Analysing experiments without replicates" and "What constitute an independent biological replicate?" sections of the vignette. !!! NB !!! NB !!! NB !!!NB !!! NB !!! Step 2 of 7: Obtaining annotation... importing GTF (this may take a while)... Warning: We found 921 (0.44%) unstranded transcripts. These were removed as unstranded transcripts cannot be analysedWarning: No CDS annotation was found in the GTF files meaning ORFs could not be annotated. (But ORFs can still be predicted with the analyzeORF() function) 73110 ( 35.26%) isoforms were removed since they were not expressed in any samples. Step 3 of 7: Fixing StringTie gene annoation problems... 8662 isoforms were assigned the ref_gene_id and gene_name of their associated gene_id. This was only done when the parent gene_id were associated with a single ref_gene_id/gene_name. 2866 isoforms were assigned the ref_gene_id and gene_name of the most similar annotated isoform (defined via overlap in genomic exon coordinates). This was only done if the overlap met the requriements indicated by the three fixStringTieViaOverlap* arguments. We were unable to assign 36 isoforms (located within annotated genes) to a known ref_gene_id/gene_name. These were removed to enable analysis of the rest of the isoform from within the merged genes. 1794 gene_ids which were associated with multiple ref_gene_id/gene_names were split into mutliple genes via their ref_gene_id/gene_names. 35256 genes_id were assigned their original gene_id instead of the StringTie gene_id. This was only done when it could be done unambiguous. Step 4 of 7: Calculating gene expression and isoform fractions... Step 5 of 7: Merging gene and isoform expression... |=====================================================================================================================================================| 100% Step 6 of 7: Making comparisons... |=====================================================================================================================================================| 100% Step 7 of 7: Making switchAnalyzeRlist object... Done
ORF_known = addORFfromGTF(switchAnalyzeRlist = aSwitchList,
pathToGTF = "merged_annotation.gtf")
Error in addORFfromGTF(switchAnalyzeRlist = aSwitchList, pathToGTF = "merged_annotation.gtf") : No ORFs could be added to the switchAnalyzeRlist. Please ensure GTF file have CDS info (and that isoform ids match).