Retrieve Paired Sequences By Id From Sam Or Bam File
11.8 years ago
Mchimich ▴ 320

Hello everybody, I'm quite new to bioinformatics and Perl programming and I need some help. I have a SAM/BAM file resulting from mapping paired-end reads to a reference sequence, and I want to extract all the reads overlapping a specific region. I'm using samtools view for this, but it returns only the reads (paired or not) that cover the region. When only one read of a pair falls in the region, I also want its mate, even if the mate is mapped to a distant location. I currently use the Bio::DB::Fasta BioPerl module to retrieve the sequences from my original FASTA file (which contains all the paired-end reads), but this takes too long, especially indexing the database (6 GB). Does anyone know a simpler way to do this? Thanks in advance, and sorry for the long post!

sam bam mapping • 4.0k views
11.8 years ago
Ryan Thompson ★ 3.6k

Why do you need to retrieve the sequence from the original fasta file? Isn't each read sequence stored in the bam file?

If the pair information is at the end of the sequence identifiers and you sort your BAM file lexically by sequence ID, the two reads of each mapped pair should end up on consecutive lines. You can then go through the file sequentially and decide what to do with each pair.
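As a rough illustration of that sequential pass, here is a minimal Python sketch. It assumes the alignments are read as SAM text lines already sorted by read name (for example the output of samtools view on a name-sorted BAM) and that mates differ at most by a trailing /1 or /2 in the identifier. The function names `ref_end`, `pair_key` and `pairs_touching` are made up for this example, and the overlap test is a simplified one based on POS and CIGAR.

```python
import re
from itertools import groupby

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def ref_end(pos, cigar):
    """Last reference base (1-based) covered by an alignment starting at pos."""
    # Only M, D, N, = and X operations consume reference bases.
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return pos + max(span, 1) - 1

def pair_key(qname):
    """Read name without a trailing /1 or /2 mate suffix, if present."""
    return re.sub(r"/[12]$", "", qname)

def pairs_touching(sam_lines, chrom, start, end):
    """Yield (name, records) for each pair with a read overlapping chrom:start-end.

    Assumes the records are sorted by name, so that mates are adjacent.
    """
    records = (line.rstrip("\n").split("\t")
               for line in sam_lines if not line.startswith("@"))
    for name, group in groupby(records, key=lambda f: pair_key(f[0])):
        group = list(group)
        for f in group:
            if int(f[1]) & 4:          # flag 0x4: read is unmapped
                continue
            pos = int(f[3])
            if f[2] == chrom and pos <= end and ref_end(pos, f[5]) >= start:
                yield name, group      # keep both mates, wherever they map
                break

# Toy records (hypothetical data): r1 has one mate in the region, r2 has none.
example = [
    "r1/1\t65\tchr1\t100\t60\t4M\tchr2\t5000\t0\tACGT\tIIII",
    "r1/2\t129\tchr2\t5000\t60\t4M\tchr1\t100\t0\tTTGA\tIIII",
    "r2/1\t65\tchr1\t900\t60\t4M\t=\t950\t54\tCCCC\tIIII",
    "r2/2\t129\tchr1\t950\t60\t4M\t=\t900\t-54\tGGGG\tIIII",
]
for name, recs in pairs_touching(example, "chr1", 90, 110):
    print(name, [r[9] for r in recs])
```

Because both sequences are already in the alignment records (column 10), this avoids going back to the original FASTA file.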

11.8 years ago

I don't know of a way to extract both reads with one query.

One possible alternative is to look at column 8 of each read returned by your samtools query, which gives the position of the mate (column 7 gives the reference the mate is mapped to), and run a new samtools query using that position. Since samtools view accepts several regions in one call, you would not need to invoke samtools separately for each mate; you could group the queries together.
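A minimal Python sketch of that second query, assuming the first samtools view output is available as SAM text lines. The helper names `mate_regions` and `build_command` and the file name `reads.bam` are hypothetical. Other reads will also overlap the mate positions, so the second query's output still has to be filtered by the collected read names.

```python
def mate_regions(sam_lines):
    """Return (regions, names) from the reads of a first region query.

    regions: sorted 'chrom:pos-pos' strings built from columns 7 and 8.
    names:   read names whose mates should be kept from the second query.
    """
    regions, names = set(), set()
    for line in sam_lines:
        if line.startswith("@"):
            continue                      # skip header lines
        f = line.rstrip("\n").split("\t")
        qname, rname, rnext, pnext = f[0], f[2], f[6], int(f[7])
        if rnext == "*" or pnext == 0:
            continue                      # no mate position recorded
        chrom = rname if rnext == "=" else rnext   # '=' means same reference
        regions.add(f"{chrom}:{pnext}-{pnext}")
        names.add(qname)
    return sorted(regions), names

def build_command(bam_path, regions):
    """Single samtools call covering all mate positions (BAM must be indexed)."""
    return ["samtools", "view", bam_path, *regions]

# Toy output of a first query (hypothetical data).
first_query = [
    "r1\t65\tchr1\t100\t60\t4M\tchr2\t5000\t0\tACGT\tIIII",
    "r2\t65\tchr1\t105\t60\t4M\t=\t300\t199\tCCCC\tIIII",
    "r3\t0\tchr1\t108\t60\t4M\t*\t0\t0\tGGGG\tIIII",
]
regions, names = mate_regions(first_query)
print(" ".join(build_command("reads.bam", regions)))
```

The resulting list could be passed to `subprocess.run`, keeping only the output lines whose first column is in `names`.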
