Question

How to recompute coding sequences from a psl format ?

0

Entering edit mode

6.5 years ago

Thomas B. ▴ 30

Hi everyone,

I mapped coding sequences (CDSs) onto a genome using GMAP and got a report in psl format.

Is there a readily available tool / way to extract the genome sequences that matched - I mean those that correspond to blocks - then recompute coding sequences ?

genome alignment sequence GMAP CDS • 2.1k views

ADD COMMENT • link updated 5.9 years ago by Biostar 20 • written 6.5 years ago by Thomas B. ▴ 30

1

Entering edit mode

You could convert your PSL file to BED using BEDOPS: http://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/psl2bed.html

Then, convert the BED to GTF or GFF and extract FASTA sequence over these regions using gffread: http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

0

Entering edit mode

Hi Kevin,

Thanks for your reply!

I converted my PSL file to BED using psl2bed as you suggested.

psl2bed < my_report.psl > my_report.bed

I then used the online bed_to_gff_converter tool from Galaxy to convert the BED file to GFF.

I finally used gffread to extract FASTA sequences as you also suggested.

./gffread my_report.gff -g my_ref_genome.fa -w my_transcripts.fa

But there is no sequence in the final output file.

Do you know what may be wrong? I would also like to ask you how you convert BED to GTF or GFF?

ADD REPLY • link 6.5 years ago by Thomas B. ▴ 30

0

Entering edit mode

Hey Thomas,

Have you looked at the contents of your output files at each step in order to see if the format is okay? The BED-to-GTF format conversion is difficult, but you may want to take a look here: How To Convert Bed Format To Gtf?

Here too: http://onetipperday.sterding.com/2012/08/convert-bed-to-gtf.html

Kevin

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

1

Entering edit mode

Oh wait, there is a nice --exons=genomic option in GMAP that does what I want. However the output should be processed (something like cat my_report.txt | grep -v '<' > my_report.fasta) in order to obtain a true FASTA file. It seems also that if there are multiple matches for a same query, the default behaviour is to keep only the best one.

Now back to what you suggested. It is also possible to convert a BED file directly into FASTA using bedtools: http://bedtools.readthedocs.io/en/latest/

My command was bedtools getfasta -fi reference_genome.fasta -bed my_report.bed -split -fo my_report.fasta. The -split option aims to extract and concatenate the sequences from the BED blocks. Here each match is retained, even multiple matches for a same query.

Thanks for your help!

ADD REPLY • link 6.5 years ago by Thomas B. ▴ 30

0

Entering edit mode

Hi Thomas, great that you got it all sorted out.

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k