Circos input format
9.9 years ago
cdwilliam524 ▴ 30

Does anyone know how to convert contig data in FASTA (*.fasta) format into the Circos input data format? What tools do I need to use?

How can I find the start and end positions of each sequence?

The Circos karyotype input file usually takes, in order, a name, a label, a start position, an end position, and a color for each entry.
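If each contig is simply drawn as its own ideogram, its start can be taken as 0 and its end as its length, so the karyotype can be built from the FASTA file alone. Here is a minimal sketch, assuming the Circos karyotype line layout `chr - ID LABEL START END COLOR`. The function name `fasta_to_karyotype`, the file names, and the default color `grey` are placeholders for illustration, not something from the original post.

```shell
# Hypothetical helper: print one Circos karyotype line per FASTA record.
# Usage: fasta_to_karyotype contigs.fa [color] > karyotype.txt
fasta_to_karyotype() {
  awk -v color="${2:-grey}" '
    { sub(/\r$/, "") }                  # tolerate Windows line endings
    /^>/ {
      # A new header starts: flush the previous record, if any.
      if (name != "") print "chr", "-", name, name, 0, len, color
      name = substr($1, 2)              # ID = header text up to first space
      len = 0
      next
    }
    { len += length($0) }               # sequence may span many lines
    END { if (name != "") print "chr", "-", name, name, 0, len, color }
  ' "$1"
}
```

Running something like `fasta_to_karyotype contigs.fa > karyotype.txt` would then give one line per contig. Long NCBI-style headers may need shortening by hand, since the ID is reused as the label here.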

Any help or suggestion would be appreciated!

data-format-conversion alignment • 5.4k views

Your first link does not work; can you please post a working link to your contig data?


Hey Alex,

The link works on my computer (I just tried it). Here it is: http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AMCG01#contigs

Click any FASTA link on the right to get the data.

Thanks a lot! I am a Computer Science person, new to Bioinformatics. Any advice would be appreciated!

9.9 years ago

Perhaps use command-line tools like faToTwoBit to build an indexed copy of your reference genome of interest, and then use command-line BLAT with that 2bit file and your FASTA contigs to query the reference. Where it can find matches, BLAT will yield hits that include the chromosome name and start/stop positions, which you can parse into input for Circos. These tools are part of the Kent Tools source code package.


I am currently trying tools like bowtie and bwa to index the reference genome, Candidatus Kuenenia stuttgartiensis; however, the output .sam file is not what I expected. I expected to see actual genome information, but instead I see masses of characters like NNNNNNNNNNNNCNNNNNNNNNNNNNNNNNN...

I don't think BLAT and BLAST have Candidatus Kuenenia stuttgartiensis (a bacterium).

Thanks!


The web version of BLAT very probably does not, but you can definitely build your own index files and query against them with the command-line tools, as long as you have the reference genome.


Yes, I have the reference genome. How can I build my own index and query against it? Could you suggest some tools and explain how to use them? What are the command-line tools? Are they part of Xcode on Mac?


See: http://genome.ucsc.edu/FAQ/FAQblat.html#blat3

You will need a compiler installed to build these tools. If you are using OS X, you will need to install Xcode and then install the command-line tools via that app. Then you can download the Kent Tools source code and compile it to get blat and faToTwoBit and other tools.

You would take your FASTA-formatted reference genome and convert it to 2bit format.

$ faToTwoBit myAssembly.fa myAssembly.2bit

Then, you might do something like:

$ blat myAssembly.2bit -oneOff=0 -noHead myQueryContigSeqs.fa myHits.psl

The output file myHits.psl is in PSL format. You can convert to BED with psl2bed and cut -f1-3 to grab the first three columns, or just read the specs for PSL and use cut or awk etc. to grab those columns.
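As a rough sketch of the awk route: in the PSL spec the target name, target start, and target end are columns 14, 16, and 17 of the 21 tab-separated columns. The function name `psl_to_coords` and the output file name are made up for illustration. PSL starts are zero-based and ends are exclusive, so adjust if you need one-based coordinates.

```shell
# Hypothetical helper: print target name, start, end for each PSL hit.
# Usage: psl_to_coords myHits.psl > hits.txt
psl_to_coords() {
  # Keep only real alignment rows: 21 or more columns, numeric first column.
  # This also skips the PSL header if -noHead was not used.
  awk -F'\t' 'NF >= 21 && $1 ~ /^[0-9]+$/ { print $14, $16, $17 }' "$1"
}
```

The output is whitespace-separated `name start end`, one hit per line. The query contig name and its coordinates are in columns 10, 12, and 13 if you want to carry them along as well.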


Thanks!

I am working on an ANAMMOX project. I don't need to compare the Kuenenia stuttgartiensis genome against the human genome sequence.

Do you have experience with bwa and Samtools? I generated the *.sam file, but it has many gaps. I need to remove the gaps and find the start and end positions of each alignment.
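For the start and end positions, one possible approach (a sketch only, not tested on this dataset) is to read them straight from the SAM text: column 4 is the one-based leftmost position, and the end can be derived from the CIGAR string in column 6 by summing the operations that consume the reference (M, D, N, =, X). The function name `sam_to_coords` is made up for illustration, and this does nothing about the runs of N in the sequence column.

```shell
# Hypothetical helper: print reference name, start, end, read name per alignment.
# Usage: sam_to_coords aln.sam > aln_coords.txt
sam_to_coords() {
  awk -F'\t' '
    /^@/ { next }                       # skip header lines
    $3 == "*" || $6 == "*" { next }     # skip unmapped reads
    {
      cigar = $6; span = 0
      # Peel one "<length><op>" pair off the front of the CIGAR at a time.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n  = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) span += n   # only these consume the reference
        cigar = substr(cigar, RLENGTH + 1)
      }
      print $3, $4, $4 + span - 1, $1   # one-based, inclusive end
    }
  ' "$1"
}
```

The coordinates printed are one-based with an inclusive end, which differs from the zero-based PSL/BED convention.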
