Question

How can I get one chromosome per sequence file when I only have the whole-genome file?

1

8.7 years ago

cloudliusihui ▴ 10

Thank you guys. Lately I have been struggling to figure out how to use the MapSplice tools.

Here is what MapSplice requires for the reference genome:

The directory containing the sequence files of reference genome. All sequence files are required to:

In "FASTA" format, with '.fa' extension.
One chromosome per sequence file.
Chromosome name in the header line ('>' not included) is the same as the sequence file base name, and does not contain any blank space.
E.g. If the header line is '>chr1', then the sequence file name should be 'chr1.fa'.
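
A quick way to check a directory against the requirements listed above is sketched below in Python, using only the standard library. The function name `check_reference_dir` is hypothetical (it is not part of MapSplice), and the sketch assumes plain, uncompressed text files.

```python
import os


def check_reference_dir(ref_dir):
    """Return a list of problems found in ref_dir (an empty list means none were found)."""
    problems = []
    for fname in sorted(os.listdir(ref_dir)):
        path = os.path.join(ref_dir, fname)
        if not os.path.isfile(path):
            continue
        base, ext = os.path.splitext(fname)
        # Requirement: FASTA files with a '.fa' extension
        if ext != ".fa":
            problems.append(f"{fname}: extension is not '.fa'")
            continue
        # Collect every header line, without the leading '>'
        with open(path) as handle:
            headers = [line[1:].rstrip("\n") for line in handle if line.startswith(">")]
        # Requirement: one chromosome per sequence file
        if len(headers) != 1:
            problems.append(f"{fname}: expected 1 sequence, found {len(headers)}")
            continue
        # Requirement: header has no blank space and equals the file base name
        if any(ch.isspace() for ch in headers[0]):
            problems.append(f"{fname}: header contains blank space")
        elif headers[0] != base:
            problems.append(f"{fname}: header '{headers[0]}' does not match file name")
    return problems
```

Calling `check_reference_dir("path/to/reference_dir")` on the directory you plan to give MapSplice should return an empty list if every file passes these checks.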

I only have the whole-genome file for human, human.fa. How can I get the separate per-chromosome files? Thank you!

updated 18 months ago by Ram • written 8.7 years ago by cloudliusihui

1

See "How To Split One Big Sequence File Into Multiple Files With Less Than 1000 Sequences In A Single File".

written 8.7 years ago by Pierre Lindenbaum

Ram · Answer 1 · 2015-08-14

You can do this with pyfaidx:

faidx --split-files multifasta.fa

The defaults will create one file per sequence, with the file names derived from the sequence names. If there are no spaces in the sequence names there will be none in the file names. Special characters are replaced with ".", but you could modify the sequence IDs/file names with the --regex / --delimiter flags.
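
If installing pyfaidx is not an option, the same kind of split can be sketched with the Python standard library alone. This is not how pyfaidx works internally; `split_fasta` is a hypothetical helper. It assumes the sequence name is the first whitespace-delimited token of each header, that names are safe to use as file names, and that no name occurs twice (a repeated name would overwrite the earlier file).

```python
import os


def split_fasta(fasta_path, out_dir):
    """Write each record of fasta_path to out_dir/<name>.fa and return the names written."""
    os.makedirs(out_dir, exist_ok=True)
    names = []
    out = None  # handle for the per-sequence file currently being written
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # Assumption: the name is the first token after '>'
                    fields = line[1:].split()
                    if not fields:
                        raise ValueError("FASTA header with no sequence name")
                    name = fields[0]
                    if out is not None:
                        out.close()
                    out = open(os.path.join(out_dir, name + ".fa"), "w")
                    # Write a trimmed header so it matches the file base name
                    out.write(">" + name + "\n")
                    names.append(name)
                elif out is not None:
                    # Sequence line: copy it to the current file unchanged
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return names
```

For the file in the question, `split_fasta("human.fa", "reference_dir")` would write one `.fa` file per header found in `human.fa` into `reference_dir`.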

Ram · Answer 2 · 2015-08-14

2

8.7 years ago

Nicola Casiraghi ▴ 500

You can get the FASTA sequence for each chromosome (GRCh37) here.

updated 18 months ago by Ram • written 8.7 years ago by Nicola Casiraghi

0

I did it using Perl, but thank you all the same! Here are the details:

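    #!/usr/bin/perl
    use strict;
    use warnings;

    my $f = $ARGV[0] or die "Usage: $0 genome.fa\n";    # whole-genome FASTA file

    open(my $in, '<', $f) or die "Can't open $f: $!";

    my $out;    # handle for the chromosome file currently being written
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {    # header line: name is the first token after '>'
            my $name = $1;
            close $out if $out;
            open($out, '>', "$name.fa") or die "Can't open $name.fa: $!";
            $line = ">$name";        # keep the header identical to the file base name
        }
        print {$out} "$line\n" if $out;    # skip anything before the first header
    }
    close $out if $out;
    close $in;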

updated 18 months ago by Ram • written 8.7 years ago by cloudliusihui