Question: Get part of a sequence from a genome with Java, given a start and stop position.
4 months ago by
Netherlands
ahclugtenberg0 wrote:

I've got VCF-like files with start, stop, REF and ALT columns. I need to check that the REF allele of each variant matches the genome at that position, to confirm that they're from the same build. I also need the nucleotides surrounding the given position. Some of the REF columns are empty, which is why these are not valid VCF files.

I've got a FASTA file which has the genome for chromosome 1, and I was wondering if there's a library available to get part of the genome in nucleotides, given a start and stop position. For example, given the genome AACCGGTT, a start position of 1 and a stop position of 4, it returns AACC. I could write such a parser myself, but I'd rather use a library which has the edge cases covered.

I'd rather have something local than use the NCBI API, which also makes this possible.
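
The AACCGGTT example in the question can be sketched in plain Java as follows. This is only a minimal illustration for an in-memory string, assuming 1-based, inclusive start and stop positions as in the example; the class and method names (`SequenceSlice`, `slice`) are hypothetical and not taken from any library.

```java
// Hypothetical helper, not part of any library: slices an in-memory sequence
// using 1-based, inclusive start and stop positions.
final class SequenceSlice {

    static String slice(String sequence, int start, int stop) {
        if (start < 1 || stop < start || stop > sequence.length()) {
            throw new IllegalArgumentException(
                "Invalid range " + start + "-" + stop
                    + " for sequence of length " + sequence.length());
        }
        // 1-based inclusive [start, stop] maps to 0-based half-open [start - 1, stop).
        return sequence.substring(start - 1, stop);
    }

    public static void main(String[] args) {
        System.out.println(slice("AACCGGTT", 1, 4)); // prints AACC
    }
}
```

For a whole chromosome, loading the sequence into a single string is usually not practical, which is where an indexed FASTA reader becomes the better choice.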

java vcf genome • 179 views
modified 13 days ago by Biostar ♦♦ 20 • written 4 months ago by ahclugtenberg0

Hi, you can use bedtools getfasta.

Best

written 4 months ago by Titus900

samtools faidx, pyfaidx and bedtools getfasta can all retrieve part of a FASTA sequence given a start and stop. While not libraries, they may be an option to consider.

@Pierre has his jvarkit, which may have something that will work (if you must use Java): http://lindenb.github.io/jvarkit/

modified 4 months ago • written 4 months ago by genomax71k

If it's anything like BioPython and you absolutely must use Java, there's no doubt something in BioJava which you could use.

I know less than nothing about Java specifically though so can't offer any practical code for this.

written 4 months ago by Joe14k
4 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum123k wrote:

Use the htsjdk library and its class IndexedFastaSequenceFile: https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/reference/IndexedFastaSequenceFile.html

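(...)
// The FASTA file needs a .fai index next to it (e.g. created with `samtools faidx`).
IndexedFastaSequenceFile faidx = new IndexedFastaSequenceFile(fastaFile);
// Start and stop are 1-based and inclusive.
String sub = faidx.getSubsequenceAt("chr1", 10, 20).getBaseString();
(...)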
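
To connect this to the original question (checking REF against the genome and fetching the surrounding nucleotides), here is a rough sketch. Everything named `RefCheck`, `RegionFetcher`, `refMatches` and `withFlanks` is hypothetical and not part of htsjdk; the genome is a toy in-memory string so the example runs without dependencies. With htsjdk, the fetcher could presumably be written as `(c, s, e) -> faidx.getSubsequenceAt(c, s, e).getBaseString()`. The contig length is assumed to be supplied by the caller.

```java
// Hypothetical sketch: the names below are illustrative, not htsjdk API.
final class RefCheck {

    /** Returns the bases for a 1-based, inclusive region of a contig. */
    interface RegionFetcher {
        String fetch(String contig, long start, long stop);
    }

    /** True if REF equals the genome bases starting at the 1-based position `start`. */
    static boolean refMatches(RegionFetcher genome, String contig, long start, String ref) {
        if (ref == null || ref.isEmpty()) {
            return false; // empty REF column: nothing to compare against
        }
        String bases = genome.fetch(contig, start, start + ref.length() - 1);
        // Ignore case, since FASTA files may contain soft-masked (lowercase) bases.
        return bases.equalsIgnoreCase(ref);
    }

    /** Returns the region plus up to `flank` bases on each side, clamped to the contig bounds. */
    static String withFlanks(RegionFetcher genome, String contig, long start, long stop,
                             int flank, long contigLength) {
        long from = Math.max(1, start - flank);
        long to = Math.min(contigLength, stop + flank);
        return genome.fetch(contig, from, to);
    }

    public static void main(String[] args) {
        String chr1 = "AACCGGTT"; // toy stand-in for a real chromosome
        RegionFetcher genome =
            (contig, start, stop) -> chr1.substring((int) start - 1, (int) stop);

        System.out.println(refMatches(genome, "chr1", 3, "CC"));                // prints true
        System.out.println(withFlanks(genome, "chr1", 3, 4, 2, chr1.length())); // prints AACCGG
    }
}
```

A mismatch on many variants at once would suggest the variants and the FASTA come from different builds, while rows with an empty REF have to be handled separately.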
modified 4 months ago • written 4 months ago by Pierre Lindenbaum123k

Yes, thank you! I was just looking at this library, but couldn't find the right function.

written 4 months ago by ahclugtenberg0

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

Upvote|Bookmark|Accept

written 4 months ago by Pierre Lindenbaum123k