Modifying fastq base at specific reference location on different length reads
5.8 years ago
yryan ▴ 10

Hi folks,

I'm interested in using Oxford Nanopore's Taiyaki tool to train a new basecaller for modified bases at a known position. To train the model I need to modify the FASTQ (or the SAM, then convert back) for each fast5 file so that it marks this modified base. However, I have around 10k reads, and with the MinION's inherent error rate this isn't something I can do with a regex, as far as I know.

Does anyone know of a method or script that can use a SAM file aligned to a consensus to modify the base at a specific reference location, which would get around these issues?

alignment next-gen sequence nanopore • 2.3k views
5.8 years ago

or script that can use a sam file aligned to a consensus where I can modify the base at a specific location

see How to introduce artificial mutation in bam
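
For anyone who wants to see what "modify the base at a specific reference location" involves, here is a minimal sketch in plain Python. It is not the jvarkit implementation. It works on SAM text lines (e.g. from samtools view -h) and walks the CIGAR string to find which read base is aligned to the reference position. The function names query_index_at and set_base_at, and the example record, are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_index_at(pos, cigar, ref_pos):
    """Return the 0-based index in SEQ aligned to 1-based ref_pos, or None.

    pos is the SAM POS field (1-based leftmost aligned reference base).
    None means the read does not cover ref_pos, or has a deletion/skip there.
    """
    ref = pos    # reference coordinate of the next reference-consuming base
    query = 0    # index into SEQ of the next query-consuming base
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":          # consumes reference and query
            if ref <= ref_pos < ref + length:
                return query + (ref_pos - ref)
            ref += length
            query += length
        elif op in "IS":         # insertion / soft clip: query only
            query += length
        elif op in "DN":         # deletion / skip: reference only
            if ref <= ref_pos < ref + length:
                return None
            ref += length
        # H and P consume neither reference nor query
    return None


def set_base_at(sam_line, ref_pos, new_base, contig=None):
    """Return the SAM line with the base aligned to ref_pos set to new_base.

    Header lines, unmapped reads, and reads that do not cover ref_pos are
    returned unchanged. Qualities are left as they are.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
    cigar, seq = fields[5], fields[9]
    if flag & 0x4 or cigar == "*" or seq == "*":
        return line
    if contig is not None and rname != contig:
        return line
    idx = query_index_at(pos, cigar, ref_pos)
    if idx is None:
        return line
    fields[9] = seq[:idx] + new_base + seq[idx + 1:]
    return "\t".join(fields)


# Hypothetical record: 3 matches, 1 inserted base, 4 matches, starting at 4600.
record = "read1\t0\tX\t4600\t60\t3M1I4M\t*\t0\t0\tACGTACTG\tIIIIIIII"
edited = set_base_at(record, 4605, "Y", contig="X")
print(edited.split("\t")[9])  # ACGTACYG
```

Two things to keep in mind. SEQ is stored in reference orientation, so for reverse-strand reads (flag 0x10) the edited base will be complemented again when the SAM is converted back to FASTQ. And the sketch replaces whatever base is aligned at the position, without checking that it matches the reference base first.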

that looks like just the thing, thanks!

Please flag the question as answered if it fulfills your needs (green tick on the left).

I was wondering if I could get a bit more help... When I run the command

java -jar /bioinformatics_tools/jvarkit/dist/biostar404363.jar -o modified.bam -p basecalled.vcf original.bam

The output only has the T's converted to N's for the first 30 or so entries; the remainder (~6k) are unchanged, even with no AF value in the VCF (below), which I'd assumed would convert all T's to N's.

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##samtoolsVersion=1.9+htslib-1.9
##samtoolsCommand=samtools mpileup -v -f reads.fasta basecalled/basedcalled_sorted.bam
##reference=file://reads.fasta
##contig=<ID=X,length=6000>
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency among genotypes, for each ALT allele, in the same order as listed">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
X   4605    .   T   N   .   .   .

Using the samtools tview command from the link, only a small proportion are converted to N's, and these are the reads at the end of the terminal output; all of those at the beginning are unchanged. Is there anything I can do to change this?

Also, I realise this may be a bit much to ask, but would it be possible to allow non-canonical bases, say Y, in this workflow? This would be very useful for creating a training set for nanopore basecalling of novel modifications.

Hard to answer without seeing the BAM and the VCF. Please open an issue at https://github.com/lindenb/jvarkit/issues and narrow the BAM around the position.
