Off topic:Extract SV indicating pair-end reads
0
0
Entering edit mode
4.1 years ago
ncxhit • 0

Hi everyone, I am new to bioinformatics and I was reading a review article on how to find structure variation using pair-end sequencing. While I understand what it means fine (if there is a deletion the distance between two read pair would decline and so on) I am struggling to write a program to find these "SV indicating" reads. Here's what I have done.

  1. align my reads to the reference genome using BWA-mem
  2. Using the following python code to find read pairs with either distance too big/small or hanging pair or alignment quality too small or bwa thinks it's not a proper match.
if (np.abs(read1_pos - read_mate_pos) > read_distance_big_threshold):
    output_pair_end_fastq(read, read_mate)
    # print("Distance Too Big")
elif read.mate_is_unmapped:
    output_pair_end_fastq(read, read_mate)
    # print("Hanging Pair")
elif (read.is_reverse == read.mate_is_reverse):
    output_pair_end_fastq(read, read_mate)
    # print("Orientation!")
elif np.abs(read_mate_pos - read1_pos) < read_distance_small_threshold:
    output_pair_end_fastq(read, read_mate)
    # print("Distance Too Small")
elif not read.is_proper_pair:
    output_pair_end_fastq(read, read_mate)
    # print("Not Proper Pair")
elif read.mapping_quality < read_mapping_quality_threshold:
    output_pair_end_fastq(read, read_mate)
    # print("Quality Too Low")
  

The normal distance of my pair end reads is 500bp and the read_distance_big_threshold is 600 while read_distance_small_threshold is 400.

My problem is I am having tons of false positives meaning I have found a lot of "non SV indicating reads" in my way. So my question is "Is there any existing tools that can do this kind of thing for me? "

Specificly I am looking for a tool to find all the reads indicated in this graph: from page 368 of that review

Structure Variation pair-end sequencing • 418 views
ADD COMMENT
This thread is not open. No new answers may be added
Traffic: 1703 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6