Hi everyone! I apologize in advance for my ignorance. I'm just starting a new bioinformatics project. I have a BAM file generated from a FASTQ file from whole-genome sequencing (C. elegans). I want to find the site (or sites) where a construct has integrated into the genome by comparing my sequence with a published genome. I've been analyzing the SNPs and indels in IGV using VCF files, but I can't find a tool that searches for a specific sequence in the VCF file of a sequenced genome. Because of this, I'd like to generate a FASTA (or any other format) file from the BAM alignment file, so that I can BLAST the construct sequence against my whole-genome sequence. Does anyone know of any software, or another strategy, to achieve this? I'm not sure I've explained this well, but I hope so. Thank you so much!
David
bam to fastq here
I'm not sure your approach is the best one for this. I would start with the FASTQ file, search for your construct in there, and then see where the rest of the 'positive' reads map.
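To make the "search the FASTQ for the construct" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under some assumptions: the function names (`construct_kmers`, `reads_hitting_construct`, and so on) are made up for this example, the FASTQ is assumed to use four lines per record, and a read counts as 'positive' if it shares at least one exact k-mer with the construct on either strand. A real aligner or BLAST search will be more sensitive.

```python
from typing import Iterable, Iterator, Set, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def construct_kmers(construct: str, k: int = 25) -> Set[str]:
    """Collect every k-mer of the construct, on both strands."""
    construct = construct.upper()
    kmers = set()
    for i in range(len(construct) - k + 1):
        kmer = construct[i:i + k]
        kmers.add(kmer)
        kmers.add(reverse_complement(kmer))
    return kmers


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip()
        next(it, "")  # the '+' separator line
        qual = next(it, "").rstrip()
        yield header[1:].split()[0], seq, qual


def reads_hitting_construct(
    lines: Iterable[str], kmers: Set[str], k: int = 25
) -> Iterator[Tuple[str, str, str]]:
    """Yield reads that share at least one exact k-mer with the construct."""
    for name, seq, qual in read_fastq(lines):
        s = seq.upper()
        if any(s[i:i + k] in kmers for i in range(len(s) - k + 1)):
            yield name, seq, qual
```

An open file handle can be passed as `lines` (for a gzipped file, `gzip.open(path, "rt")`). The names of the reads that come back are the 'positive' reads; reads that are only partly construct, and the mates of positive reads, are the ones whose genomic mapping position hints at the integration site.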
Essentially you are looking for a big structural variant (an insertion), right?
Yes, that's correct. I'm looking for an insertion, a big one (and probably integrated more than once in the genome). Do you mean I should convert my BAM file into FASTQ? Because the original FASTQ from the sequencing service needs to be trimmed and aligned first, right?
Due to the huge insertion, it's possible that alignment wasn't that successful. I would suggest looking into structural variant calling tools.
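As a rough illustration of the kind of signal such tools look for: reads that span an insertion junction are often aligned only in part, with the unaligned end soft-clipped (an `S` operation in the SAM CIGAR string). The sketch below is not a structural variant caller and is not taken from any particular tool. It simply counts, from SAM text lines such as the output of `samtools view`, the reference positions where long soft clips start or end. The function names and the `min_clip` threshold are assumptions made for this example.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_breakpoints(sam_line: str, min_clip: int = 20) -> List[Tuple[str, int]]:
    """Return (chromosome, 1-based position) for each long soft clip in one record."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return []
    flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or cigar == "*":  # unmapped read or no alignment
        return []
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not ops:
        return []
    found = []
    # Clip at the left end: the alignment starts at POS.
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        found.append((chrom, pos))
    # Clip at the right end: report the last aligned reference base.
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        found.append((chrom, pos + ref_len - 1))
    return found


def count_breakpoints(lines: Iterable[str], min_clip: int = 20) -> Counter:
    """Count soft-clip breakpoints per position, skipping SAM header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue
        counts.update(clip_breakpoints(line, min_clip))
    return counts
```

Positions where many reads are clipped at the same coordinate are candidate junctions; whether the clipped portions actually match the construct still has to be checked, for example by searching them against the construct sequence. Hard-clipped ends (`H`) are ignored in this sketch.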
That could be useful! Thank you.