extract reads ID for each region included in a bed file and create a separate file
1
0
Entering edit mode
2.1 years ago
pingu77 ▴ 20

Hi all,

I have a bam file and a bed file with several regions of interest to me. I would like to get the sequence IDs of the reads that fall in each region included in the bed file. I was able to get the sequence ID, chr and start coordinate for each read, using this command:

samtools view file.bam | cut -f1,3,4

An example of what I get in output:

readA   chr1    3985548  
readB   chr1    48349456 
readC   chr1    3985568 
readD   chr1    4932093

I would like to create for each region in the bed file a different txt file that includes the sequence IDs that map in a given region.

example of the top 5 lines included in my bed file:

chr1    3985538 3985593 chr1:3985538-3985593
chr1    3986091 3986158 chr1:3986091-3986158
chr1    4834946 4834972 chr1:4834946-4834972
chr1    4932073 4932105 chr1:4932073-4932105
chr1    7398192 7398238 chr1:7398192-7398238

example of the desired output:

Region: chr1:3985538-3985593
file -> chr1:3985538-3985593.txt
readA
readC

Region: chr1:4834946-4834972
file -> chr1:4834946-4834972.txt
readB

Region: chr1:4932073-4932105
file -> chr1:4932073-4932105.txt
readD

So how can I achieve the same? Any help is highly appreciated.

Thank you!

samtools bam bed • 749 views
ADD COMMENT
0
Entering edit mode

hi pingu77,

what have you tried/searched for so far? this question has been asked quite a number of times here already. Also some googling will point you to the right direction.

If you are stuck at any point you can ask here for specific help.

ADD REPLY
0
Entering edit mode

Hi! Yes, I tried to find an answer on google/forum. However, I couldn't find the answer to my problem. In the questions asked, people usually have a specific genomic region, in my case I have 60 000 regions in the same bed file and I would like to create 60 000 txt file reporting the read ID that map to the considered region..

ADD REPLY
0
Entering edit mode
ADD REPLY
3
Entering edit mode
2.1 years ago

in my case I have 60 000 regions in the same bed

awk '{printf("%s:%d-%s\n",$1,int($2)+1,$3);}' in.bed | while read R
do
    #blablabla
done
ADD COMMENT
0
Entering edit mode

Thanks a lot Pierre Lindenbaum ! it is exactly what I needed. My solution based on your hint:

awk '{printf("%s:%d-%s\n",$1,int($2)+1,$3);}' file.bed | while read R; 
do samtools view file.bam $R | cut -f1 > $R'.txt' 
done
ADD REPLY

Login before adding your answer.

Traffic: 2828 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6