Retrieve amplicon sequences from an fna file
3 months ago
Lila M ★ 1.2k

Hi all,

I would like to get the amplicon sequence from an indexed fna file. I have the start/end coordinates for the amplicons and also the primer sequences. I've been reading about samtools faidx, but it is not clear to me how to use it for this. Does anyone here have experience with this particular problem?

Thank you!

amplicon fna primers • 661 views

samtools faidx simply fetches sequences using coordinates. It cannot search a file for a region that matches a given sequence, such as a primer. If you only have sequences, you will first need to align them and get the coordinates before fetching anything.

If I am mistaken then perhaps provide some additional detail about what you have.

3 months ago

Since you say you have the coordinates, you would run samtools faidx <file> <CONTIG_NAME>:<start>-<end>, replacing each < > block with the information you have.
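
For anyone wondering what that lookup does under the hood, here is a rough Python sketch of a coordinate fetch driven by the .fai index. It is not samtools' actual code, just an illustration that assumes the standard five-column .fai layout (name, length, byte offset, bases per line, bytes per line) and 1-based inclusive coordinates, which is how samtools faidx regions are interpreted. The names read_fai, fetch_region and write_toy_reference, as well as the chrT toy contig, are made up for this example.

```python
import os
import tempfile


def read_fai(fai_path):
    """Parse a .fai index into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    with open(fai_path) as handle:
        for line in handle:
            name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")[:5]
            index[name] = (int(length), int(offset), int(linebases), int(linewidth))
    return index


def fetch_region(fasta_path, index, contig, start, end):
    """Return bases start..end (1-based, inclusive) of one contig."""
    length, offset, linebases, linewidth = index[contig]
    if not 1 <= start <= end <= length:
        raise ValueError("region lies outside the contig")

    def byte_position(pos0):
        # Translate a 0-based base position into a byte offset in the file,
        # skipping the newline bytes at the end of each full sequence line.
        return offset + (pos0 // linebases) * linewidth + pos0 % linebases

    byte_start = byte_position(start - 1)
    byte_end = byte_position(end - 1)
    with open(fasta_path, "rb") as handle:
        handle.seek(byte_start)
        raw = handle.read(byte_end - byte_start + 1)
    # Drop the line breaks that fall inside the requested span.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


def write_toy_reference(directory):
    """Write a tiny FASTA plus a hand-made .fai (hypothetical demo data)."""
    fasta_path = os.path.join(directory, "toy.fna")
    fai_path = fasta_path + ".fai"
    with open(fasta_path, "wb") as handle:
        handle.write(b">chrT\nACGTACGTAC\nGGGTTTCCCA\nAT\n")
    with open(fai_path, "wb") as handle:
        # 22 bases, sequence starts at byte 6, 10 bases and 11 bytes per line
        handle.write(b"chrT\t22\t6\t10\t11\n")
    return fasta_path, fai_path


with tempfile.TemporaryDirectory() as tmp:
    fasta_path, fai_path = write_toy_reference(tmp)
    print(fetch_region(fasta_path, read_fai(fai_path), "chrT", 8, 13))  # TACGGG
```

The point is only that the index makes the fetch a direct seek into the file, with no searching involved, which is why the coordinates have to be known up front.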

Next time you want to ask a question try to identify what it is that you don't understand. This will help you find better results and it will help potential answerers give you precise solutions.

Two extra tips:

• Find code examples on GitHub: with advanced search you can filter for shell scripts that contain your command of interest. See https://github.com/search?l=Shell&q=%22samtools+faidx%22&type=Code for examples relevant to your question.
• I'm not sure how you obtained the coordinates for your amplicon, so I might be mistaken in suggesting additional software; you could instead check the program you are already using, since most software that can find the locations can also extract the amplicon. Otherwise, consider using something like seqkit amplicon.

Thank you so much for your feedback.

My main issue is that I am not sure whether this tool is the correct one for my problem, or whether there is a better approach.

As I said, I have the human fna and fna.fai files. There is a variant whose location I know, and I have the primers and the start/end information for the specific amplicon. What I need to do is extract the FASTA sequence for that amplicon using this information.

I hope this is clearer now!

Thank you again


I have the human fna and fna.fai files. There is a variant whose location I know, and I have the primers and the start/end information for the specific amplicon. What I need to do is extract the FASTA sequence for that amplicon using this information.

If you only have a FASTA file, then use the seqkit amplicon (LINK) tool recommended by @Juan above. Once you confirm that this does what you need, I can move @Juan's comment to an answer.

samtools faidx is not the correct tool.


Hello, well samtools seems to do the job!

samtools faidx hg38.fna  chrN:amplicon_start-amplicon_end


Why do you think it is not the correct tool?

I am not able to use seqkit as it is not installed in the environment .... Thank you for your help :)


If you know the exact coordinates of the amplicon, then samtools faidx will absolutely work.

Since you mentioned primers, we thought the exact location was in doubt. In that case seqkit is the right tool.
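
To illustrate the primer case: when only the primers are known, the coordinates first have to be found by searching the template. The Python sketch below does this with exact string matching on the forward strand only. It is not how seqkit amplicon is implemented (real tools also deal with mismatches and both strands), and find_amplicon, reverse_complement and the toy sequences are hypothetical names used purely for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def find_amplicon(template, forward_primer, reverse_primer):
    """Locate an amplicon by exact primer matches on the forward strand.

    Returns (start, end, sequence) with 1-based inclusive coordinates,
    or None if either primer site is not found.
    """
    template = template.upper()
    fwd_site = forward_primer.upper()
    # The reverse primer anneals to the opposite strand, so on the forward
    # strand its binding site appears as its reverse complement.
    rev_site = reverse_complement(reverse_primer.upper())

    start = template.find(fwd_site)
    if start == -1:
        return None
    rev_start = template.find(rev_site, start + len(fwd_site))
    if rev_start == -1:
        return None
    end = rev_start + len(rev_site)
    return start + 1, end, template[start:end]


# Toy template: flank + forward primer + insert + reverse primer site + flank
toy_template = "TTTTACGTGGGGGGCCATAAAA"
print(find_amplicon(toy_template, "ACGT", "ATGG"))
# (5, 18, 'ACGTGGGGGGCCAT')
```

The start and end it returns could then be passed to samtools faidx as <CONTIG_NAME>:<start>-<end>, which is the two-step "find the coordinates, then fetch" route described earlier in the thread.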


This is great, thank you so much for the clarification!


I am not able to use seqkit as it is not installed in the environment ...

Then just install it via conda install -c bioconda seqkit, or download the binary from the GitHub releases page.