Get part of a sequence from a .bam file, considering the reference genome position.
1
0
Entering edit mode
7.2 years ago
valopes ▴ 30

Hi everybody.

I am trying to get part of a sequence from a .bam file, based on the reference genome position. For example, which sequence does the .bam file contain at positions 50,000 to 50,400? Can someone help me? I need all the details, since I have never worked with this.

Assembly • 2.8k views

I am not sure exactly what you are trying to do, but maybe this will help:

samtools mpileup -C50 -gf ref.fasta -r chr3:1,000-2,000 in1.bam

http://www.htslib.org/doc/samtools.html


Thank you very much. Can I ask you some more questions? So the only things I need to change in this command are "ref.fasta" to my genome.fasta and "in1.bam" to my.bam? And of course the coordinates...


I did this and got the following message:

[bam_parse_region] fail to determine the sequence name.
[mpileup] malformatted region or wrong seqname for US-18.bam


You will have to run the following command to index the reference first:

samtools faidx ref.fasta
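
If the message still appears after indexing, one likely cause (an assumption, not something confirmed in this thread) is that the sequence name given to -r does not match any name in the BAM header, for example "chr3" versus "3" or a scaffold name. A minimal sketch for listing the names the BAM file actually uses; list_seqnames is a hypothetical helper name and my.bam is a placeholder:

```shell
# Print the reference sequence names found in SAM header text read from stdin.
# Each @SQ header line carries the sequence name in its SN: field.
list_seqnames() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Typical use (my.bam is a placeholder file name):
#   samtools view -H my.bam | list_seqnames
```

The name before the colon in the region (the "chr3" in chr3:1,000-2,000) should be spelled exactly like one of the printed names.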
7.2 years ago

The samtools documentation (http://www.htslib.org/doc/samtools.html) recommends using sorted BAM files with many tools, such as "view", "flagstat" and "bedcov". Sorted BAM files are also a bit smaller, and other programs work faster with them, so I usually sort and index my BAM files. Then I use samtools view to look at the region of interest:

samtools sort -T tmp.in1 -o in1.sorted.bam in1.bam
samtools index in1.sorted.bam
samtools view in1.sorted.bam chr3:50000-50400
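
The output of samtools view is one SAM record per read overlapping the region, and each record is the whole read, not only the part inside 50000-50400. If only the read sequences are needed, the relevant columns can be cut out. A small sketch, where sam_to_seqs is a hypothetical helper name:

```shell
# Keep read name, reference name, start position and read sequence
# (SAM columns 1, 3, 4 and 10) from SAM records read from stdin.
# Header lines (starting with @) are skipped.
sam_to_seqs() {
    awk -F'\t' -v OFS='\t' '!/^@/ { print $1, $3, $4, $10 }'
}

# Typical use, with the file and region from the commands above:
#   samtools view in1.sorted.bam chr3:50000-50400 | sam_to_seqs
```
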

Make sure the prefix for the temporary files (option -T in samtools sort) is different for each BAM file if you sort multiple files in parallel; otherwise your sorted files will contain corrupted data.
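
One way to keep the prefixes distinct is to derive each one from the input file name. This is only a sketch: tmp_prefix and sort_and_index are hypothetical helper names, and it assumes the input files have different base names (two files called in1.bam in different directories would still collide).

```shell
# Derive a temporary-file prefix from the BAM file name,
# e.g. in1.bam -> tmp.in1, so each input gets its own prefix.
tmp_prefix() {
    printf 'tmp.%s\n' "$(basename "$1" .bam)"
}

# Sort one BAM file with its own temporary prefix, then index the result.
# The output is written to the current directory as <name>.sorted.bam.
sort_and_index() {
    base=$(basename "$1" .bam)
    samtools sort -T "$(tmp_prefix "$1")" -o "$base.sorted.bam" "$1" &&
        samtools index "$base.sorted.bam"
}

# Typical use, sorting two placeholder files in parallel:
#   sort_and_index in1.bam &
#   sort_and_index in2.bam &
#   wait
```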
