Extract some interest sequences with a new name
1
0
Entering edit mode
4.5 years ago
Sillpositive ▴ 20

Hello everyone

I have a fasta file containing all sequences of my genome.

I only want to extract the sequences of chromosomes 1 to 33 but also MT W and Z chromosomes. However, I would like to change the names when saving them in another file such as (I just put the header but of course I want the sequence after header):

INPUT:

>MT dna:chromosome chromosome:GRCg6a:MT:1:16775:1 REF
 ATCGTTTTTTT...
>W dna:chromosome chromosome:GRCg6a:W:1:6813114:1 REF
>Z dna:chromosome chromosome:GRCg6a:Z:1:82529921:1 REF
>1 dna:chromosome chromosome:GRCg6a:1:1:197608386:1 REF
>2 dna:chromosome chromosome:GRCg6a:2:1:149682049:1 REF
>3 dna:chromosome chromosome:GRCg6a:3:1:110838418:1 REF
>4 dna:chromosome chromosome:GRCg6a:4:1:91315245:1 REF
....

And OUTPUT that I WANT :

>chr MT
>chr W 
>chr Z
>chr 1 
>chr 2 
>chr 3 
>chr 4 

Thank you for your answer

awk • 806 views
ADD COMMENT
1
Entering edit mode

This might help

ADD REPLY
1
Entering edit mode

Thank you for your help man ! Thus I wrote a script in Biopython and solve the problem !

ADD REPLY
0
Entering edit mode

Sorry for the generic answer, but the question sound like a "write-a-script-for-me" request XD of course it is like 5-lines script

ADD REPLY
0
Entering edit mode
4.5 years ago
JC 13k

You will need to a) keep a list of chromosomes names you want, b) edit their label,

#!/usr/bin/perl

use strict;
use warnings;

# Create a list of wanted sequences
my %want = ("W" => 1, "Z" => 1, "MT" => 1);
for (my $i = 1; $i <= 33; $i++) { $want{$i}++; }

$/ = "\n>"; # read sequences in blocks
while (<>) {
    s/>//g;
    my ($id) = split (/\n/, $_); # get sequence id
    $id =~ s/\s+.+//; # remove rest of the sequence id after the first space
    print ">chr $_" if (defined $want{$id}); # print selected sequences
}

To use this: perl selectChr.pl < FASTA_IN > FASTA_OUT

ADD COMMENT

Login before adding your answer.

Traffic: 2623 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6