How to calculate percentage of chimeras from r1/r2 files
6 weeks ago
eli_bayat ▴ 80

Hello,

I have R1 and R2 fastq files, a WT reference sequence and a variant reference sequence. I want to find the percentage of WT reads that are chimeras. From my understanding, I would need to look into the R1/R2 files and see whether part of a read matches the WT sequence while the rest matches a variant sequence.

Is there a tool that does this, given the two reference files (WT and variant sequence) and the R1/R2 reads?

PS: I found usearch, and BWA also has a chimera flag. When I tried both I got dramatically different results: BWA found only 1 chimera while usearch listed 5. Also, the one chimera found in common was made up of different parts: both tools reported the same right piece, but the left part was different. I think this could be because the reference sequences are fairly similar. Is this expected behavior? Since the results differ, how do I determine which one to use?
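
For context on the BWA side: in SAM output, a split (chimeric) alignment is normally reported as extra records for the same read, with the other pieces listed in the `SA:Z:` tag of the primary record. A minimal sketch of turning that into a percentage is below. It assumes the alignments are available as plain SAM text lines and that the two references are named `WT` and `variant` (hypothetical names used only for the example data). It counts a read as chimeric only when a piece listed in `SA` lies on a different reference than the primary alignment, which may undercount when the references are very similar.

```python
def percent_chimeric(sam_lines):
    """Percentage of mapped primary reads whose SA tag points to another reference."""
    total = 0
    chimeric = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        # so that every read is counted once, via its primary alignment.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        total += 1
        rname = fields[2]
        # SA:Z: holds semicolon-separated entries "rname,pos,strand,CIGAR,mapQ,NM".
        sa = next((f[5:] for f in fields[11:] if f.startswith("SA:Z:")), "")
        partners = {entry.split(",")[0] for entry in sa.split(";") if entry}
        if partners - {rname}:
            chimeric += 1
    return 100.0 * chimeric / total if total else 0.0


# Hypothetical example records (not real data): r2 is split between WT and variant.
example = [
    "@SQ\tSN:WT\tLN:100",
    "@SQ\tSN:variant\tLN:100",
    "r1\t0\tWT\t1\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tWT\t1\t60\t25M25S\t*\t0\t0\t*\t*\tSA:Z:variant,26,+,25S25M,60,0;",
    "r2\t2048\tvariant\t26\t60\t25H25M\t*\t0\t0\t*\t*\tSA:Z:WT,1,+,25M25S,60,0;",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(percent_chimeric(example))
```

With real data the lines could come from `samtools view -h my_sample.bam`. To restrict the denominator to WT reads, one option would be to count only primaries whose reference name is the WT one.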

Any suggestions are appreciated.

Could you elaborate on the type of sequencing done (DNA, RNA, insert size, kit, etc.) and why you are looking for chimeras? Is it for quality control or is it part of your experiment? Are you expecting R1 or R2 itself to be a chimera, or just that R1 maps to one region and R2 maps to another region in the genome?

It is part of the experiment that I was asked to provide, mostly to see what percentage of WT reads are hybrids with other variant sequences. Both R1 and R2 map to the same region. I think I can probably use only the R1 reads to figure that out.

I did something similar in the past: https://github.com/asafpr/RILseq. I'm not sure if it's the best way or if newer tools are available, but it works pretty well. It basically cuts the read in two, maps each side with bwa, and looks for chimeras. There's also a statistical step to highlight chimeras that appear more often than expected at random, since we have a ligation step in the protocol.
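
To illustrate the "cut the read in two" step, here is a minimal sketch. It is not RILseq's actual code: the function name `split_reads` and the `_left`/`_right` suffixes are made up for this example, and it assumes a plain 4-line-per-record FASTQ. Each half could then be mapped separately, and a read whose halves land on different references would be a chimera candidate.

```python
import io


def split_reads(fastq_handle):
    """Yield (left_record, right_record) FASTQ strings for each read."""
    while True:
        header = fastq_handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a blank line)
        seq = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        qual = fastq_handle.readline().rstrip("\n")
        mid = len(seq) // 2
        name = header[1:].split()[0]  # drop '@' and any description
        left = f"@{name}_left\n{seq[:mid]}\n+\n{qual[:mid]}\n"
        right = f"@{name}_right\n{seq[mid:]}\n+\n{qual[mid:]}\n"
        yield left, right


# Hypothetical single-read input, just to show the shape of the output.
fq = io.StringIO("@read1 extra\nACGTACGT\n+\nIIIIHHHH\n")
halves = list(split_reads(fq))
print(halves[0][0], halves[0][1], sep="")
```

Splitting at the midpoint is the simplest choice. If the junction can sit anywhere in the read, short fixed-length pieces from each end may be a safer alternative.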

Thanks for the info. When I tried the package I got empty files. I aligned my sample to the reference using BWA, and when I run map_chimeric_fragments.py my_refrence.fasta my_sample.bam I get a bunch of empty files in the folder remapped-data. Is this expected? I looked at map_chimeric_fragments.py and it says it only looks for chimeras in the unmapped sequences. Is that true?

This script extracts unmapped reads or pairs that were mapped to other regions of the genome. You should get fastq files with those unmapped reads and then their mapping to the genome.
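
One way to check whether empty output is plausible is to count how many reads in the alignment would qualify at all. The sketch below is an assumption about what "unmapped" and "mapped to other regions" mean in SAM terms, not the script's actual logic: it treats flag 0x4 as unmapped and a paired read without the proper-pair flag (0x2) as a discordant pair. The function name is hypothetical.

```python
def count_remap_candidates(sam_lines):
    """Count primary records that are unmapped, discordantly paired, or neither."""
    counts = {"unmapped": 0, "discordant": 0, "other": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        flag = int(line.split("\t")[1])
        if flag & (0x100 | 0x800):
            continue  # ignore secondary and supplementary records
        if flag & 0x4:
            counts["unmapped"] += 1
        elif (flag & 0x1) and not (flag & 0x2):
            counts["discordant"] += 1  # paired, but not a proper pair
        else:
            counts["other"] += 1
    return counts


# Hypothetical records with flags 77 (unmapped), 99 (proper pair), 65 (not proper).
records = [
    "a\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "b\t99\tWT\t1\t60\t50M\t=\t60\t110\t*\t*",
    "c\t65\tWT\t1\t60\t50M\tvariant\t1\t0\t*\t*",
]
print(count_remap_candidates(records))
```

If nearly every read falls under "other", there may simply be nothing for the remapping step to extract.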
