How to extract the region of a reference sequence that does not align with a given query read?
5.1 years ago

I have a query sequence of 1935 bp and a reference sequence of 2817 bp. My query aligns fully to the reference sequence. I need to extract the regions of the reference sequence (2817 bp) that are not covered by my query, i.e. approximately 882 bp. I need to do this for many such files (nearly 200). Please help me with a script if possible; I couldn't find any tools online for this.

sequence alignment next-gen sequencing genome • 3.7k views

Look at the samtools faidx solution: Extract User Defined Region From An Fasta File


What if I do not know the coordinates? As I mentioned, I have to do this nearly 200 times, so it would be difficult to check each time which coordinates of the reference do not match my query. Can you help me with a solution that extracts the unaligned region of the reference directly from a SAM/BAM file?


Extract the regions with 0 coverage. See: bedtools: extracting no coverage regions
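For a beginner, the zero-coverage idea can be sketched roughly as below. This is only an illustration, not necessarily the exact command from the linked post: it assumes bedtools is installed and that the alignment is a coordinate-sorted BAM. The file names `aln.sorted.bam` and `uncovered.bed` and the helper name `zero_coverage` are made up for the example. `bedtools genomecov -bga` reports depth for every interval of the reference, including intervals with depth 0, and the filter keeps only those.

```shell
# Keep only zero-depth intervals from BedGraph input
# (columns: sequence name, start, end, depth) and print them as BED.
zero_coverage() {
    awk 'BEGIN { OFS = "\t" } $4 == 0 { print $1, $2, $3 }'
}

# Hypothetical usage (file names are placeholders):
#   bedtools genomecov -ibam aln.sorted.bam -bga | zero_coverage > uncovered.bed
```

The resulting BED intervals are 0-based and half-open, so an interval `ref 2435 2817` means bases 2436 to 2817 of `ref`.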


Can you please explain in detail? I am a beginner.


The link that Pierre provided gives the exact command. What additional explanation do you require?

3.7 years ago
Karma ▴ 310

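Try the following. First index the reference:

samtools faidx reference.fasta

Then extract the region by sequence name and coordinates (replace chr with the sequence name and 1-n with the start and end positions):

samtools faidx reference.fasta chr:1-n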
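When the coordinates are not known in advance, one possible way to get them is from the zero-coverage intervals of the alignment, which can then be passed to `samtools faidx`. The sketch below assumes bedtools and samtools are installed and that each BAM is coordinate-sorted. The function names `bed_to_regions` and `extract_unaligned` and all file names are hypothetical. BED intervals are 0-based and half-open, while `samtools faidx` regions are 1-based and inclusive, so the start is shifted by one.

```shell
# Convert BED lines (name, 0-based start, end) to samtools regions (name:start-end, 1-based).
bed_to_regions() {
    awk '{ printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

# Write the uncovered parts of reference $1, given sorted BAM $2, to FASTA file $3.
# If there are no uncovered regions, the output may simply be empty.
extract_unaligned() {
    ref=$1; bam=$2; out=$3
    samtools faidx "$ref"                      # build the .fai index
    bedtools genomecov -ibam "$bam" -bga |     # depth per interval, zeros included
        awk '$4 == 0' |                        # keep zero-depth intervals
        bed_to_regions |
        xargs samtools faidx "$ref" > "$out"   # extract each region
}

# Hypothetical usage for one reference/alignment pair:
#   extract_unaligned reference.fasta aln.sorted.bam unaligned.fasta
```

For roughly 200 files, the same function could be called inside a `for` loop over the reference/BAM pairs, assuming a consistent naming scheme links each BAM to its reference.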