Question: Retrieving multiple regions (from a list) in a bam file and outputing as a table
0
gravatar for dodausp
8 months ago by
dodausp140
Denmark/Copenhagen/BRIC
dodausp140 wrote:

Hi there! First of all, I hope all are safe and taking the necessary precautions amid this covid-19 situation. Now, to my question, say I have a list (or a data frame) file with regions of interest like this:

  Chromosome     start       end
1       chr1   8827004   8827026
2       chr1   9126223   9126245
3       chr1 105687304 105687326
4       chr1 111900474 111900496
5       chr2  14439964  14439986
6       chr2 126159675 126159697
7       chr3   2429724   2429746
8       chr3  11541187  11541209
9       chr3 173421484 173421506

How can I extract those sequences from a bam file and output them in a single file (i.e. a data frame or list)?

Optimally, the final result should look like this:

  Chromosome     start       end                sequence
1       chr1   8827004   8827026  aaatgctagctagctagctagt
2       chr1   9126223   9126245 tttaagcgcgatagctagctagt
3       chr1 105687304 105687326 gagagagatcgatcgatcgatgc
4       chr1 111900474 111900496 gcatcgatgcatgcatgcatggg
5       chr2  14439964  14439986 gggctagctagctagcatatatt
6       chr2 126159675 126159697 aaaatttttcttcgagatcgcta
7       chr3   2429724   2429746 tatatatatcgcgatgctagcag
8       chr3  11541187  11541209 gggtagataccccctatacttta
9       chr3 173421484 173421506 ggcatgctagcctacgatcatcg

But if generating a separate file just for the output sequences, that is also ok. I have tried using samtools, such as below:

samtools view input.bam chr1:9126223-9126245 > output.bam

But it doesn't work. For one thing, the sequences are too short, so samtool view doesn't catch anything. And second, this command retrieves all mapped sequences that falls within the specified region. I am quite sure that I am not on the right track here, so if anybody could help, that would be really good.

Thank you and again, I hope you are all safe and doing well.

snp wes data frame • 186 views
ADD COMMENTlink modified 8 months ago • written 8 months ago by dodausp140

do you want to extract the reads or the sequence of the reference genome at these regions ?

ADD REPLYlink written 8 months ago by Nicolas Rosewick9.2k

I would like to extract the sequence of the bam file; it is, the consensus sequence of the given bam file.

ADD REPLYlink written 8 months ago by dodausp140

Question already answered then : Generate consensus from BAM file

ADD REPLYlink written 8 months ago by Nicolas Rosewick9.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1851 users visited in the last hour