Question

Create mutation catalogue function giving GRange object errors

4.1 years ago

kittys • 0

Hi, I am trying to analyse indel mutations using the R package YAPSA. My data is in the required input format, with columns chrom, pos, ref, alt and patient_ID. I am trying to run the function create_mutation_catalogue_from_df on this data with the GRCh38 genome build. My command is:

indel_catalogue <- create_mutation_catalogue_from_df(
  indels_df,
  this_seqnames.field = "chrom",
  this_start.field = "pos",
  this_end.field = "pos",
  this_PID.field = "patient_ID",
  this_refGenome = BSgenome.Hsapiens.NCBI.GRCh38,
  this_verbose = 1)

The error I get is: Error in validObject(.Object) : invalid class "GRanges" object : 'seqnames(x)' contains missing values

I also tried adding this_refGenome_Seqinfo = seq_info_GRCh38, where seq_info_GRCh38 <- SeqinfoForBSGenome(BSgenome.Hsapiens.NCBI.GRCh38), but the error stays the same.

I tried the translate_to_hg19 function before running the command with no success either.

I also tried this with the BSgenome.Hsapiens.UCSC.hg38 genome, which returns a different error: Error in ans[] <- x : replacement has length zero

Any ideas on how to fix this? Any suggestions would be hugely appreciated!! Thank you :)

YAPSA GenomicRanges R indels • 934 views

updated 22 months ago by Marc • written 4.1 years ago by kittys

score 0 · Answer 1 · 2022-06-15

Hi, which version of YAPSA are you using? The current Bioconductor version of YAPSA creates mutational catalogues from INDELs with the function create_indel_mutation_catalogue_from_df. As far as I can see, the sequence context is collected from BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19 and cannot currently be changed; a separate feature request at https://support.bioconductor.org/tag/YAPSA/ could bring attention to this. I generally recommend renaming the columns to "CHROM", "POS", "REF", "ALT" and "PID", because some of the YAPSA functions expect these names. The following example illustrates the creation of a mutational catalogue from INDELs:

> suppressPackageStartupMessages(library("YAPSA"))
> data(sigs_pcawg)
> data(GenomeOfNl_raw) ## only required for example
> 
> INDEL_df <- translate_to_hg19( GenomeOfNl_raw[1:200,], "CHROM" )[,c("CHROM","POS","REF","ALT")]
> INDEL_df$PID <- c("PID1","PID2","PID3","PID4","PID5")
> head(INDEL_df)
   CHROM   POS   REF ALT  PID
32  chr2 12320   AAT   A PID1
53  chr2 13613    AG   A PID2
92  chr2 17012 TAAAG   T PID3
94  chr2 17128     T  TG PID4
95  chr2 17142     G  GA PID5
96  chr2 17151  TCAC   T PID1
> INDEL_mc <-
+   create_indel_mutation_catalogue_from_df(
+     in_dat = INDEL_df,
+     in_signature_df = PCAWG_SP_ID_sigs_df )
[1] "Indel sequence context attribution of total  200  indels. This could take a while..."
[1] "INDEL classification of total  200  INDELs This could take a while..."
> head(INDEL_mc)
           PID1 PID2 PID3 PID4 PID5
DEL_C_1_0     1    4    1    2    2
DEL_C_1_1     2    2    2    2    0
DEL_C_1_2     0    0    0    1    0
DEL_C_1_3     0    0    1    0    0
DEL_C_1_4     0    1    1    1    0
DEL_C_1_5+    0    0    0    1    0
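
The column renaming suggested above can be sketched in base R. This is only an illustration: the data frame name indels_df and the original column names are taken from the question, while the example rows and the new_names lookup vector are made up here and are not part of YAPSA.

```r
# Hypothetical stand-in for the asker's data; the values are made up.
indels_df <- data.frame(
  chrom      = c("2", "2", "2"),
  pos        = c(12320L, 13613L, 17128L),
  ref        = c("AAT", "AG", "T"),
  alt        = c("A", "A", "TG"),
  patient_ID = c("P1", "P2", "P1"),
  stringsAsFactors = FALSE
)

# Lookup from the old column names to the names suggested in the answer.
new_names <- c(chrom = "CHROM", pos = "POS", ref = "REF",
               alt = "ALT", patient_ID = "PID")

# Rename on a copy so the original data frame stays untouched.
INDEL_df <- indels_df
names(INDEL_df) <- unname(new_names[names(INDEL_df)])

# Catch missing chromosome names early, before any GRanges object is built.
stopifnot(!anyNA(INDEL_df$CHROM))
str(INDEL_df)
```

Renaming alone does not address the genome build: as noted above, the sequence context appears to be taken from hg19, so GRCh38 coordinates would still need to be converted before the catalogue is meaningful.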

I encourage you to post any further errors, questions or requests at https://support.bioconductor.org/tag/YAPSA/.