Question

How to calculate percentage of chimeras from r1/r2 files

1

23 months ago

eli_bayat ▴ 90

Hello,

I have R1 and R2 fastq files, a WT reference sequence, and a variant reference sequence. I want to find the percentage of WT reads that are chimeras. From my understanding, I would need to look into the R1/R2 files and check whether a read shares part of its sequence with WT and the rest with a variant sequence.

Is there a tool that does this, given the two reference files (WT and variant sequence) and the R1/R2 reads?

PS: I found usearch, and BWA also has a chimera flag. When I tried both, I got dramatically different results: BWA found only 1 chimera while usearch listed 5. Also, the one chimera they found in common was made up of different parts. Both reported the same right piece, but the left piece was different. I think this could be because the reference sequences are fairly similar. Is this expected behavior? Since the results differ, how do I determine which one to use?
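For reference, the "chimera flag" in BWA-MEM output is the supplementary-alignment bit (0x800) in the SAM FLAG field. A rough sketch of turning that into a percentage is below. It assumes plain SAM text as input, and the function name chimera_percentage is made up for illustration. It counts a read as chimeric if it has at least one supplementary record, which is only one possible definition.

```python
from typing import Iterable

UNMAPPED = 0x4         # SAM flag bit: read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary (split) alignment


def chimera_percentage(sam_lines: Iterable[str]) -> float:
    """Percent of mapped reads with at least one supplementary record.

    Hypothetical helper: R1 and R2 of a pair are counted as separate reads.
    """
    mapped = set()
    chimeric = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED or flag & SECONDARY:
            continue
        # 0x40 = first in pair, 0x80 = second in pair, neither = single-end
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        key = (name, mate)
        mapped.add(key)
        if flag & SUPPLEMENTARY:
            chimeric.add(key)
    return 100.0 * len(chimeric) / len(mapped) if mapped else 0.0


example = [
    "@SQ\tSN:WT\tLN:100",
    "read1\t0\tWT\t1\t60\t50M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tWT\t1\t60\t25M25S\t*\t0\t0\tACGT\tIIII",
    "read2\t2048\tvariant\t26\t60\t25H25M\t*\t0\t0\tAC\tII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(chimera_percentage(example))
```

In practice the lines could come from something like `samtools view -h my_sample.bam` piped into the script. With two very similar references, a split read may also be reported as a single alignment with a few mismatches, so this count can easily be an underestimate.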

Any suggestions are appreciated.

paired-end-reads chimeric sequences usearch chimera • 1.1k views

updated 22 months ago by Asaf 10k • written 23 months ago by eli_bayat ▴ 90

0

Could you elaborate on the type of sequencing done (DNA, RNA, insert size, kit, etc.) and why you are looking for chimeras? Is it for quality control, or is it part of your experiment? Do you expect R1 or R2 itself to be a chimera, or does R1 just map to one region and R2 to another region of the genome?

23 months ago by Asaf 10k

0

It is part of the experiment: I was asked to report what percentage of WT reads are hybrids with other variant sequences. Both R1 and R2 map to the same region, so I think I can probably use only the R1 reads to figure that out.

23 months ago by eli_bayat ▴ 90

1

I did something similar in the past: https://github.com/asafpr/RILseq. I'm not sure if it's the best way or whether newer tools are available, but it works pretty well. It basically cuts the read in two, maps each side with bwa, and looks for chimeras. There's also a statistical step to highlight chimeras that appear more often than expected at random, since we have a ligation step in the protocol.
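To illustrate the split-and-map idea for the two-reference case, here is a toy sketch. It is not the RILseq code: instead of bwa it uses brute-force ungapped matching, it always cuts the read at its midpoint, and it ignores indels and reverse-complement reads. The function names and the example sequences are invented for illustration.

```python
from typing import Dict, Iterable, Optional


def min_mismatches(fragment: str, reference: str) -> int:
    """Fewest mismatches over all ungapped placements of fragment in reference."""
    n = len(fragment)
    if n > len(reference):
        return n  # fragment cannot be placed; treat as all mismatches
    return min(
        sum(a != b for a, b in zip(fragment, reference[i:i + n]))
        for i in range(len(reference) - n + 1)
    )


def assign(fragment: str, refs: Dict[str, str]) -> Optional[str]:
    """Name of the uniquely best-matching reference, or None on a tie."""
    scores = sorted((min_mismatches(fragment, seq), name) for name, seq in refs.items())
    if len(scores) > 1 and scores[0][0] == scores[1][0]:
        return None
    return scores[0][1]


def classify_read(read: str, refs: Dict[str, str]) -> str:
    """Cut the read in two, assign each half, and compare the assignments."""
    mid = len(read) // 2
    left, right = assign(read[:mid], refs), assign(read[mid:], refs)
    if left is None or right is None:
        return "ambiguous"  # a half matches both references equally well
    return left if left == right else "chimera"


def chimera_percent(reads: Iterable[str], refs: Dict[str, str]) -> float:
    """Percent of all given reads classified as chimera."""
    labels = [classify_read(r, refs) for r in reads]
    return 100.0 * labels.count("chimera") / len(labels) if labels else 0.0


# Made-up 24 bp references differing at four positions (indices 2, 6, 17, 22).
wt = "ACGTTGCAAGCTTAGGCATCGATC"
var = "ACCTTGGAAGCTTAGGCTTCGAAC"
refs = {"WT": wt, "variant": var}
hybrid = wt[:12] + var[12:]  # left half from WT, right half from the variant

print(classify_read(hybrid, refs))
print(chimera_percent([wt, var, hybrid, hybrid], refs))
```

When the references are very similar, many halves will tie and come out as "ambiguous", and the apparent breakpoint can sit anywhere between two distinguishing positions. That may be part of why different tools report different pieces for the same chimera.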

23 months ago by Asaf 10k

0

Thanks for the info. When I tried the package I got empty files. I aligned my sample to the reference using BWA, and when I run map_chimeric_fragments.py my_refrence.fasta my_sample.bam I get a bunch of empty files in the folder remapped-data. Is this expected? I looked at map_chimeric_fragments.py and it says it only looks for chimeras in the unmapped sequences. Is that true?

22 months ago by eli_bayat ▴ 90

0

This script extracts unmapped reads or pairs that were mapped to other regions of the genome. You should get fastq files with those unmapped reads and then their mapping to the genome.

22 months ago by Asaf 10k