Compare Differences in a sequence between a SAM file and a FASTA file
3.2 years ago

Forgive me. I'm probably about to sound quite stupid.

I'm trying to help my partner learn a bit of Python for her studies. It's going OK, but now I want to put it to practical use for her.

I noticed she was doing some work in a program called IGV (I think?), where she loads a FASTA file and a BAM file and it draws a graph. She was then going through, one by one, each point where a difference between the sequences was highlighted, and noting down the C or GC percentage change between her BAM results file and the FASTA reference file.

I thought to myself, this must be doable in Python or something, and now it's hooked me a bit because I can't find the answer...

I can import SAM/BAM files using pysam and I can read the FASTA file using Biopython. But now that I've had a bit of a play around and a google, I can't find anywhere on the internet where this is even a thing. I'm wondering if something has been lost in translation between her and her teacher (English is not her first language).

I've asked her to email for some clarification, but what with Covid, it's taking a while to get replies, so in the meantime I thought I might post to see if someone can enlighten me as to whether this is actually something or not :)

I thought I could just take the relevant part of the genome sequence and the corresponding part of the BAM file, compare the strings using loops etc., and calculate the result with no issue. But it seems I can't build a single sequence string when parsing the BAM file, because of all the overlapping reads etc.

Any ideas? Apologies if I haven't explained things very well; it's almost third-hand information at the moment! Hopefully I can get a better idea this evening :)

Thanks,

Lawrence

genome python samtools • 699 views

Thanks for the quick reply, I'm slowly getting there. I will try your keywords, "variant calling"... thanks again :)


I would recommend vcftools for general command-line use, or, if you really want to use Python, PyVCF is another option.
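To give a feel for what those tools work with: a variant caller writes its results as a VCF file, which is tab-separated text with one variant per line. The snippet below is only a minimal sketch using the standard library, not PyVCF; the function name `read_variants` and the example records are made up for illustration, and it reads just the first five columns (CHROM, POS, ID, REF, ALT). Note that POS in a VCF is 1-based.

```python
import io

def read_variants(handle):
    """Yield (chrom, pos, ref, alts) tuples from VCF text.

    Hypothetical helper for illustration: it skips header lines
    (starting with '#') and ignores every column after ALT.
    """
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        # ALT may list several alternative alleles separated by commas.
        yield chrom, int(pos), ref, alt.split(",")

# Made-up example records, not real data.
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t6\t.\tC\tT\t50\tPASS\t.\n"
    "chr1\t12\t.\tG\tA,C\t30\tPASS\t.\n"
)

variants = list(read_variants(io.StringIO(example_vcf)))
for chrom, pos, ref, alts in variants:
    print(chrom, pos, ref, ",".join(alts))
```

For real files a dedicated parser is the safer choice, since this sketch does no validation and ignores genotype columns entirely.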

3.2 years ago

The answer in these cases is usually not to try to make your own tool, but to find one that already exists. You're going to want to google "variant calling" and find tools that people have already made and tested. They generally won't be in Python; you'll run them from the command line.

Worst case, you could upload your data to Galaxy and use one of the tools there.
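That said, if the goal is partly to learn Python, the overlap problem from the question is normally handled by counting the bases that the reads show at each reference position (a "pileup"), rather than by stitching the reads into one string. Below is a rough sketch of that idea in plain Python, not a real variant caller. The function names, the 0.5 threshold, and the toy reads are all made up for illustration, and it assumes every read is a simple gap-free match starting at a known 0-based position (no insertions, deletions, or clipping, which real BAM records do have). With pysam, `AlignmentFile.pileup()` can supply the per-position columns instead.

```python
from collections import Counter

def pileup_counts(ref_len, reads):
    """Count the bases observed at each reference position.

    reads: iterable of (start, sequence) pairs with a 0-based start.
    Assumes gap-free alignments, which is a simplification.
    """
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base.upper()] += 1
    return columns

def differences(reference, reads, min_fraction=0.5):
    """List (pos, ref_base, read_base, fraction) where the most common
    read base differs from the reference and accounts for at least
    min_fraction of the reads covering that position."""
    found = []
    for pos, counts in enumerate(pileup_counts(len(reference), reads)):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no read covers this position
        base, n = counts.most_common(1)[0]
        ref_base = reference[pos].upper()
        if base != ref_base and n / depth >= min_fraction:
            found.append((pos, ref_base, base, n / depth))
    return found

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

# Toy data: three overlapping reads, two of which show T at position 5.
reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTATGT"), (4, "ATGTAC")]

for pos, ref_base, read_base, frac in differences(reference, reads):
    print(pos, ref_base, read_base, round(frac, 2))  # prints: 5 C T 0.67
```

A real caller also weighs base quality, mapping quality, and indels, which is why the tested tools are the better choice for actual results.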
