I have exon capture data for 10 genes. Exon capture and amplification were done with the Agilent SureSelect XT capture kit, and the amplicons were sequenced on a NextSeq (2x150 bp). I want to analyze the exonic variants in these 10 genes. The protocol I have is:
1) Trim the data to remove adapters and low-quality base calls.
2) Assemble the forward and reverse reads of each amplicon into a consensus. If the forward and reverse reads disagree at any base in their overlap, discard both reads.
3) Map the reads to the human genome and calculate coverage per exon of the target genes.
4) Perform variant analysis.
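To make step 2 concrete, this is roughly what I understand it to mean, written as a minimal Python sketch rather than a real merging tool. The function name `merge_pair`, the `min_overlap` default of 20 and the `max_mismatch_frac` of 0.1 are my own placeholders, not values from the protocol. It also assumes adapters are already trimmed and ignores base qualities.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(
    fwd: str,
    rev: str,
    min_overlap: int = 20,            # placeholder, not from the protocol
    max_mismatch_frac: float = 0.1,   # placeholder, only used to locate the overlap
) -> Tuple[str, Optional[str]]:
    """Merge one read pair according to my reading of step 2.

    Returns ("merged", consensus), ("discordant", None) when the reads
    overlap but disagree at one or more bases, or ("no_overlap", None).
    """
    rev_rc = reverse_complement(rev)
    longest = min(len(fwd), len(rev_rc))
    # Try the longest candidate overlap first, down to min_overlap.
    for overlap in range(longest, min_overlap - 1, -1):
        tail = fwd[-overlap:]      # 3' end of the forward read
        head = rev_rc[:overlap]    # start of the reverse read on the forward strand
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            if mismatches:
                # The reads overlap here but disagree: discard both.
                return "discordant", None
            return "merged", fwd + rev_rc[overlap:]
    return "no_overlap", None
```

In this sketch a pair with no detectable overlap is reported separately from a discordant pair, because the protocol does not say what should happen to pairs that do not overlap at all.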
- What approach should I use for the second step? I read that BBMap's reformat.sh can be used for interleaving and de-interleaving reads. Is that the correct approach?
- What minimum overlap (number of bases) should I use, given that after trimming the read length can range from 50 bp to 150 bp?
- Will this step lead to a loss of data, and is it necessary to perform it at all?
Looking forward to some answers.