Question

Circos input format

9.9 years ago

cdwilliam524 ▴ 30

Does anyone know how to convert contig data in FASTA format (*.fasta) into the Circos input data format? What tools do I need to use?

How could I find the start position and end position of the sequence?

The Circos karyotype input file usually expects, in order, a name, a label, a start position, an end position, and a color for each sequence.

Any help or suggestion would be appreciated!
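For the karyotype file specifically, no alignment is needed: each contig can start at 0 and end at its own length. Below is a minimal Python sketch of that idea, assuming the Circos karyotype line layout `chr - ID LABEL START END COLOR`. The function names, the sample sequences, and the default color `grey` are illustrative assumptions, not part of the original data. Long NCBI-style headers may need to be shortened into simpler IDs before Circos accepts them.

```python
def fasta_lengths(lines):
    """Yield (contig_id, length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited token of the header as the ID;
            # "unnamed" is a hypothetical fallback for an empty header.
            tokens = line[1:].split()
            name, length = (tokens[0] if tokens else "unnamed"), 0
        elif name is not None:
            # Sequence line: add its residues to the current contig's length.
            length += len(line)
    if name is not None:
        yield name, length


def karyotype_lines(lines, color="grey"):
    """Format each contig as 'chr - ID LABEL START END COLOR' (start fixed at 0)."""
    return [
        f"chr - {name} {name} 0 {length} {color}"
        for name, length in fasta_lengths(lines)
    ]


# Made-up records standing in for a real FASTA file.
example = [">contig1 some description", "ACGTAC", "GT", ">contig2", "NNNACG"]
for row in karyotype_lines(example):
    print(row)
```

To run this on a real file, pass an open file handle instead of the list, for example `karyotype_lines(open("contigs.fasta"))`, where `contigs.fasta` is a placeholder file name.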

data-format-conversion alignment • 5.4k views

ADD COMMENT • link updated 2.6 years ago by Ram 43k • written 9.9 years ago by cdwilliam524 ▴ 30

Your first link does not work; can you please post a working link to your contig data?

ADD REPLY • link 9.9 years ago by Alex Reynolds 35k

Hey Alex,

The link works on my computer (I just tried it). Here it is: http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AMCG01#contigs

Click any FASTA link on the right to get the data.

Thanks a lot! I am a Computer Science person, new to Bioinformatics. Any advice would be appreciated!

ADD REPLY • link updated 2.6 years ago by Ram 43k • written 9.9 years ago by cdwilliam524 ▴ 30

Ram · Answer 1 · 2014-06-27

9.9 years ago

Alex Reynolds 35k

Perhaps use a command-line tool like faToTwoBit to convert your reference genome of interest to 2bit format, and then run command-line BLAT with that 2bit file and your FASTA contigs to query the reference. Where it can find matches, BLAT will yield hits that include the chromosome name and start/stop positions, which you can parse into input for Circos. These tools are part of the Kent Tools source code package.

ADD COMMENT • link updated 2.6 years ago by Ram 43k • written 9.9 years ago by Alex Reynolds 35k

I am currently trying tools like bowtie and bwa to index the reference genome, Candidatus Kuenenia stuttgartiensis; however, the output .sam file is not what I expected. I expected to see actual sequence information, but instead I see long runs like NNNNNNNNNNNNCNNNNNNNNNNNNNNNNNN...

I don't think BLAT and BLAST include Candidatus Kuenenia stuttgartiensis (a bacterium).

Thanks!

ADD REPLY • link 9.9 years ago by cdwilliam524 ▴ 30

The web version of BLAT very probably does not, but if you have the reference genome somewhere, you can definitely build your own index files and query against them with the command-line tools.

ADD REPLY • link 9.9 years ago by Alex Reynolds 35k

Yes, I have the reference genome. How can I build my own index and query against it? Could you suggest some tools and explain how to use them? What are the command-line tools? Are they part of Xcode on Mac?

ADD REPLY • link 9.9 years ago by cdwilliam524 ▴ 30

See: http://genome.ucsc.edu/FAQ/FAQblat.html#blat3

You will need a compiler installed to build these tools. If you are using OS X, you will need to install Xcode and then install the command-line tools via that app. Then you can download the Kent Tools source code and compile it to get blat and faToTwoBit and other tools.

You would take your FASTA-formatted reference genome and convert it to 2bit format.

$ faToTwoBit myAssembly.fa myAssembly.2bit

Then, you might do something like:

$ blat myAssembly.2bit -oneOff=0 -noHead myQueryContigSeqs.fa myHits.psl

The output file myHits.psl is in PSL format. You can convert it to BED with psl2bed (part of BEDOPS) and use cut -f1-3 to grab the first three columns (chromosome, start, stop), or just read the PSL specification and use cut, awk, etc. to grab the corresponding columns directly.
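If a scripting language is more comfortable than cut or awk, the same extraction can be sketched in Python. This assumes the standard 21-column PSL layout written with -noHead, where the target name, start, and end are the 14th, 16th, and 17th tab-separated fields and the query name is the 10th. The function name and the sample hit are made up for illustration.

```python
def psl_to_intervals(lines):
    """Return (tName, tStart, tEnd, qName) tuples from headerless PSL lines."""
    intervals = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21:
            continue  # skip blank or malformed lines
        q_name = fields[9]         # query (contig) name
        t_name = fields[13]        # target (reference sequence) name
        t_start = int(fields[15])  # start of the hit on the target
        t_end = int(fields[16])    # end of the hit on the target
        intervals.append((t_name, t_start, t_end, q_name))
    return intervals


# One made-up PSL record: contigA (100 bases) hitting ref1 at 200-300.
sample_hit = "\t".join([
    "100", "0", "0", "0", "0", "0", "0", "0", "+",
    "contigA", "100", "0", "100",
    "ref1", "5000", "200", "300",
    "1", "100,", "0,", "200,",
])
for t_name, t_start, t_end, q_name in psl_to_intervals([sample_hit]):
    print(t_name, t_start, t_end, q_name)
```

Each tuple already carries a sequence name plus start and end positions, so the remaining step is to write those values out in whatever layout the chosen Circos track expects.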

ADD REPLY • link updated 2.6 years ago by Ram 43k • written 9.9 years ago by Alex Reynolds 35k

Thanks!

I am working on an ANAMMOX project. I don't need to compare the Kuenenia stuttgartiensis genome against the human genome sequence.

Do you have any experience with bwa and SAMtools? I generated the *.sam file, but it contains a lot of gaps. I need to remove the gaps and find the start and end positions of each alignment.
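For the start and end positions, one possible approach is to read them straight from the SAM records: the start is the 1-based POS field, and the end follows from adding the reference-consuming CIGAR operations (M, D, N, =, X). The Python sketch below assumes plain-text SAM input; the function name and the sample record are made up for illustration, and it does not address removing gaps.

```python
import re

# One CIGAR operation: a count followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def sam_alignment_span(line):
    """Return (read, reference, start, end), 1-based inclusive, or None."""
    if line.startswith("@"):
        return None  # header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # not a complete alignment record
    flag = int(fields[1])
    rname, pos, cigar = fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read, or no alignment description available
    ref_len = sum(
        int(count) for count, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING
    )
    return fields[0], rname, pos, pos + ref_len - 1


# Made-up record: soft clip, match, deletion, match, insertion, match.
sample = "read1\t0\tref1\t101\t60\t5S20M2D10M3I5M\t*\t0\t0\t*\t*"
print(sam_alignment_span(sample))
```

In the sample, only 20M, 2D, 10M, and 5M advance along the reference, so the alignment covers 37 reference bases starting at position 101.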

ADD REPLY • link updated 2.6 years ago by Ram 43k • written 9.9 years ago by cdwilliam524 ▴ 30