Question: How can I get one chromosome per sequence file when I only have the whole-genome file?
cloudliusihui (United States) wrote, 4.0 years ago:

Thanks, everyone. Lately I have been stuck on how to use the MapSplice tools.

Here is what MapSplice requires:

The directory containing the sequence files of reference genome. All sequence files are required to:

  • In "FASTA" format, with '.fa' extension.
  • One chromosome per sequence file.
  • Chromosome name in the header line ('>' not included) is the same as the sequence file base name, and does not contain any blank space.
  • E.g. If the header line is '>chr1', then the sequence file name should be 'chr1.fa'.
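
The requirements above can be checked mechanically before running MapSplice. Below is a minimal sketch in plain Python; the function name `check_reference_dir` is hypothetical and is not part of MapSplice. It only tests the rules quoted in the list (one sequence per `.fa` file, header equal to the file base name, no blank space in the header).

```python
from pathlib import Path


def check_reference_dir(ref_dir):
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    fasta_files = sorted(Path(ref_dir).glob("*.fa"))
    if not fasta_files:
        problems.append(f"no .fa files found in {ref_dir}")
    for path in fasta_files:
        # Collect every header line, without the leading '>' and the line ending.
        with open(path) as handle:
            headers = [line.rstrip("\r\n")[1:] for line in handle if line.startswith(">")]
        if len(headers) != 1:
            problems.append(f"{path.name}: expected 1 sequence, found {len(headers)}")
            continue
        name = headers[0]
        if any(ch.isspace() for ch in name):
            problems.append(f"{path.name}: header '{name}' contains blank space")
        elif name != path.stem:
            problems.append(f"{path.name}: header '{name}' does not match file base name")
    return problems
```

Calling `check_reference_dir("reference/")` on a directory that follows the rules should return an empty list; otherwise each entry names the offending file.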

I only have the whole-genome file, human.fa. How can I split it into separate files? Thank you!

Matt Shirley (Cambridge, MA) wrote, 4.0 years ago:

You can do this with pyfaidx:

faidx --split-files multifasta.fa

The defaults will create one file per sequence, with the file names derived from the sequence names. If there are no spaces in the sequence names there will be none in the file names. Special characters are replaced with ".", but you could modify the sequence IDs/file names with the --regex / --delimiter flags.
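
If pyfaidx is not installed, the same idea can be sketched with the Python standard library. This is an illustration under stated assumptions, not pyfaidx's implementation: the function name `split_fasta` is hypothetical, the sequence name is taken to be the first word after '>', and any character outside letters, digits, '_', '.' and '-' is replaced with '.'.

```python
import re
from pathlib import Path


def split_fasta(fasta_path, out_dir="."):
    """Write each sequence in fasta_path to <name>.fa in out_dir; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    if out is not None:
                        out.close()
                    fields = line[1:].split()
                    if not fields:
                        raise ValueError("FASTA header without a sequence name")
                    # Assumption: sanitise the name so it is safe as a file name.
                    safe = re.sub(r"[^A-Za-z0-9_.-]", ".", fields[0])
                    path = out_dir / f"{safe}.fa"
                    out = open(path, "w")
                    written.append(path)
                    # The header is rewritten so that it matches the file base name.
                    out.write(f">{safe}\n")
                elif out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

The file is read line by line, so a whole human genome does not need to fit in memory. `split_fasta("human.fa", "reference")` would leave files such as `reference/chr1.fa`, provided the headers in `human.fa` start with names like `chr1`.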

Nicola Casiraghi (Germany, Heidelberg, DKFZ EMBL) wrote, 4.0 years ago:

You can download the FASTA sequence of each chromosome (GRCh37/hg19) here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
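
Fetching the per-chromosome files from that directory could be scripted roughly as follows. This is a sketch under assumptions: that the files are published gzip-compressed as `<name>.fa.gz` under the URL above, and that only chr1 to chr22, chrX, chrY and chrM are wanted. The helper names are hypothetical.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

BASE_URL = "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/"
# Assumption: only the primary chromosomes are needed.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def chromosome_url(name):
    # Assumption: each chromosome is stored as <name>.fa.gz under BASE_URL.
    return f"{BASE_URL}{name}.fa.gz"


def gunzip(src, dst):
    """Decompress the gzip file src into dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def fetch_chromosome(name, out_dir="."):
    """Download one chromosome, decompress it to <name>.fa, and return that path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{name}.fa.gz"
    target = out_dir / f"{name}.fa"
    urllib.request.urlretrieve(chromosome_url(name), archive)
    gunzip(archive, target)
    archive.unlink()  # remove the compressed copy once decompressed
    return target
```

Calling `fetch_chromosome(name, "hg19")` for each `name` in `CHROMOSOMES` would populate an `hg19` directory with uncompressed `.fa` files. Note that these are hg19 sequences, so they only fit if `human.fa` is the same assembly.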


I did it using Perl, but thank you all the same!

Here are the details:

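#!/usr/bin/perl
use strict;
use warnings;

    # Split a multi-FASTA file into one file per sequence (e.g. chr1.fa).
    my $f = $ARGV[0] or die "Usage: $0 genome.fa\n";    # input file name

    open(my $in, '<', $f) or die "Can't open $f: $!";

    my $out;    # handle of the output file currently being written
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {    # header line: name is the first word after '>'
            my $name = $1;
            close $out if $out;
            open($out, '>', "$name.fa") or die "Can't open $name.fa: $!";
            $line = ">$name";        # keep the header identical to the file base name
        }
        print {$out} "$line\n" if $out;    # skip any text before the first header
    }
    close $out if $out;
    close $in;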

(Reply written 4.0 years ago by cloudliusihui)