Get all the nucleotides from a bam file that is aligning to specific position in the genome
3
0
Entering edit mode
3.2 years ago
rsafavi ▴ 60

Hello,

I have an alignment bam file, and I would like to retrieve all the bases aligning to that specific location in the genome. For example, if I am interested in position X in the genome, I would like to extract all the nucleotides that are aligning to that position X from my alignment file

samtools pysam genome position alignment • 3.1k views
0
Entering edit mode

Hello rsafavi,

why do you like to do this and how should your desired output look like?

The procedure would be to get all reads that overlap the desired position using samtools view. Due to the mapping position given for each read, one can than find the base on the location with respect to the CIGAR value.

fin swimmer

2
Entering edit mode
3.2 years ago
Tm ★ 1.0k

If you mean, you want to extract read base which is aligning to particular position of your genome, then you can try samtools mpileup or gatk Pileup. You can explore more about mpileup format here. Samtools mpileup will give you output in form of 5 columns :

• Sequence identifier
• Position in sequence (starting from 1)
• Reference nucleotide at that position
• Number of aligned reads covering that position (depth of coverage)
• Bases at that position from aligned reads
• quality of those bases (OPTIONAL)

From this you can idea about the base present in reference genome and nucleotide of the reads mapping to that particular position.

If you have just handful of positions to check then you can also view your alignment in genome viewer like IGV or Tablet to visualise the bases at that particular position.

0
Entering edit mode

I am working with Nanopore cDNA data. I tried samtool mpileup before like this:

as a test case, but I got:

chr1 13625035 N 0 * *

chr1 13625036 N 0 * *

which is not correct, since when I look at the genome browser I see two reads aligning to that region

0
Entering edit mode

Its strange. And why 3rd column which represents reference base is "N"?. What is the actual reference nucleotide in that position? I suggest you to give it a try again considering the broad region (~5000k).

1
Entering edit mode
3.2 years ago
rsafavi ▴ 60

I figured it out. I had to put -Q 0 ( base quality) for Nanopore data

0
Entering edit mode
13 months ago

Posted an answer here: A: Extract Base Based On Position From Bam File

Kevin