How to extract the regions of a reference sequence that do not align with a given query read?
5.1 years ago

I have a query sequence of 1935 bp and a reference sequence of 2817 bp. My query aligns fully to the reference. I need to extract the regions of the reference (2817 bp) that are not part of my query, i.e. approximately 882 bp. I need to do this for many such files (nearly 200), so could you help me with a script, if possible? I couldn't find any tool online that does this. Any help is appreciated!

sequence alignment next-gen sequencing genome

Look at the samtools faidx solution in the thread "Extract User Defined Region From An Fasta File".


What if I don't know the coordinates? As I mentioned, I have to do this nearly 200 times, so checking by hand which coordinates of the reference do not match my query would be difficult each time. Can you suggest a solution that extracts the unaligned regions of the reference directly from a SAM/BAM file?


Extract the regions with zero coverage; see the thread "bedtools: extracting no coverage regions".
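The zero-coverage idea can also be sketched in pure Python, without bedtools. This is a minimal sketch, not the bedtools implementation, assuming the alignments are available as SAM text (a BAM file can be converted with `samtools view -h in.bam > in.sam`) and the reference is a plain FASTA file small enough to read into memory. The function names (`read_fasta`, `covered_intervals`, `uncovered_intervals`, `unaligned_regions`) are hypothetical. As a design choice, only `M`, `=` and `X` CIGAR blocks count as covered, so deletions (`D`) and skipped regions (`N`) inside an alignment are reported as uncovered, because those reference bases are not part of the query.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word of each header."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif line and name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def covered_intervals(sam_lines, ref_name):
    """Merged 0-based, half-open intervals of ref_name that have aligned query bases."""
    blocks = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.split("\t")
        flag, rname, cigar = int(fields[1]), fields[2], fields[5]
        if flag & 4 or rname != ref_name or cigar == "*":
            continue  # unmapped read, or aligned to another reference
        ref_pos = int(fields[3]) - 1  # SAM POS is 1-based
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in "M=X":
                blocks.append((ref_pos, ref_pos + length))  # aligned block
            if op in "MDN=X":
                ref_pos += length  # these operations consume reference bases
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # overlapping/adjacent
        else:
            merged.append((start, end))
    return merged


def uncovered_intervals(covered, ref_length):
    """Complement of the merged covered intervals over [0, ref_length)."""
    gaps, prev_end = [], 0
    for start, end in covered:
        if start > prev_end:
            gaps.append((prev_end, start))
        prev_end = max(prev_end, end)
    if prev_end < ref_length:
        gaps.append((prev_end, ref_length))
    return gaps


def unaligned_regions(sam_text, fasta_text):
    """Return (region, sequence) pairs for reference stretches with zero coverage."""
    lines = sam_text.splitlines()
    result = []
    for name, seq in read_fasta(fasta_text).items():
        for start, end in uncovered_intervals(covered_intervals(lines, name), len(seq)):
            # 1-based, inclusive coordinates, the convention samtools faidx uses
            result.append((f"{name}:{start + 1}-{end}", seq[start:end]))
    return result
```

For the ~200 files, one option is to loop over matching SAM/FASTA pairs (for example with `pathlib.Path.glob`; the naming scheme is up to you) and write each pair returned by `unaligned_regions` as a FASTA record, a `>region` line followed by the sequence. The region strings can also be passed straight to `samtools faidx`.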


Can you please explain in detail? I am a beginner.


The link that Pierre provided gives the exact command. What additional explanation do you require?

3.7 years ago
Karma ▴ 310

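Try the following:

samtools faidx reference.fasta                 # index the reference (writes reference.fasta.fai)

samtools faidx reference.fasta chr:1-n         # print the region chr:1-n in FASTA format

Here chr is the sequence name from the FASTA header, and 1 and n are the 1-based, inclusive start and end coordinates of the region you want.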
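To make the coordinate convention concrete, here is a minimal pure-Python sketch of what the second command extracts, assuming an uncompressed FASTA small enough to read into memory. Unlike samtools, it does not use the `.fai` index, and the helper names `read_fasta`, `parse_region` and `extract_region` are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word of each header."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif line and name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def parse_region(region):
    """Split 'name:start-end' (1-based, inclusive) into (name, start, end)."""
    name, _, span = region.rpartition(":")
    start, end = (int(x) for x in span.split("-"))
    return name, start, end


def extract_region(fasta_text, region, width=60):
    """Return the region as a FASTA record, wrapping the sequence at `width` bases."""
    name, start, end = parse_region(region)
    # 1-based inclusive [start, end] corresponds to the Python slice [start - 1:end]
    seq = read_fasta(fasta_text)[name][start - 1:end]
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{region}"] + lines)
```

For example, `extract_region(open("reference.fasta").read(), "chr:1-100")` returns the first 100 bases of a sequence named `chr` as a FASTA record, assuming such a sequence exists in the file.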