How to find all mapped locations of a read from PySam?
0
0
Entering edit mode
3.7 years ago
O.rka ▴ 710

I have a mapping with an insane amount of multimapped reads. I'm trying to figure out all the locations these multimapped reads have mapped to but I'm having difficult accessing all of the locations from the bam file. I got my BAM file from STAR aligner.

Here's my STAR command:

conda activate star_env

STAR --genomeDir star_index_genome_ercc92_virus --readFilesCommand zcat --readFilesIn preprocessing/kneaddata_repaired_1.fastq.gz preprocessing/kneaddata_repaired_2.fastq.gz --outFileNamePrefix intermediate/star_output/star_ --runThreadN 2 --outReadsUnmapped Fastx

samtools view -h -b intermediate/star_output/star_Aligned.out.sam > intermediate/star_output/star_Aligned.out.bam

I'm iterating through my bam file but I can't find where all of the multimapping locations are?

for fp in sy.pv(sizes.index, "BAM Files"):
    bamfile = pysam.AlignmentFile(fp, "rb")
    for record in bamfile:
        number_of_loci = record.get_tag("NH")
        if number_of_loci > 1:
            break

Here's the results:

record.get_tags()
#[('NH', 6), ('HI', 1), ('AS', 299), ('nM', 0)]

record.reference_name, record.reference_start, record.reference_end
#('000181F_arrow', 3618, 3769)

Why isn't record.reference_start and record.reference_end a tuple of 6 positions?

alignment • 1.6k views
ADD COMMENT
0
Entering edit mode

why not using a simple samtools, awk ? Like samtools view in.sorted.bam | awk '{ if ($5<255) print }' | cut -f3,4

ADD REPLY
0
Entering edit mode

I'm trying to do a complicated analysis based on read sizes and multimapping and I can't really do that in a bash shell. Doesn't the command above just output a single positoin

ADD REPLY

Login before adding your answer.

Traffic: 2885 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6