Question: fastq file split in chromosome
1
8 months ago, blooming.daisy333 (20) wrote:

I'm a newbie to NGS and Linux. Can anyone kindly let me know the easiest way to split a FASTA file containing chromosomes and scaffolds into individual files, i.e. one chromosome per file? Thanks.

next-gen • 326 views
modified 8 months ago by Bastien Hervé (2.7k) • written 8 months ago by blooming.daisy333 (20)
1

How do you know which scaffold belongs to which chromosome? If you don't, you'll have to align all your scaffolds to a reference genome. If the information is in your scaffold headers, you can split your file with a Python script, for example.
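To illustrate the header-based case, here is a minimal sketch using only the Python standard library. It assumes a purely hypothetical header convention such as ">scaffold_12 chromosome=3"; the function name `split_by_chromosome`, the default `pattern`, and the `unplaced.fasta` catch-all are illustrative choices, so adapt the regular expression to whatever your headers really contain.

```python
import os
import re


def split_by_chromosome(fasta_path, out_dir=".", pattern=r"chromosome[=:_ ]?(\w+)"):
    """Group FASTA records into one file per chromosome label found in the header.

    The default pattern is an assumption about the header layout, not a standard.
    Returns a dict mapping output file name -> number of records written.
    """
    os.makedirs(out_dir, exist_ok=True)
    label_re = re.compile(pattern, re.IGNORECASE)
    handles = {}    # output file name -> open file handle
    counts = {}     # output file name -> number of records
    current = None  # handle receiving the lines of the current record
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    # Header line: decide which chromosome file this record goes to
                    match = label_re.search(line)
                    name = f"chr{match.group(1)}.fasta" if match else "unplaced.fasta"
                    if name not in handles:
                        handles[name] = open(os.path.join(out_dir, name), "w")
                    current = handles[name]
                    counts[name] = counts.get(name, 0) + 1
                if current is not None:
                    # Header and sequence lines are copied unchanged
                    current.write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

For example, split_by_chromosome("scaffolds.fasta", "by_chromosome") would collect every scaffold whose header mentions the same chromosome label into one file; scaffolds without a label end up in unplaced.fasta.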

modified 8 months ago • written 8 months ago by Bastien Hervé (2.7k)

Sorry, my question was about a FASTA file containing the genome sequence, not FASTQ. I need each chromosome in its own separate file rather than the whole genome in one FASTA file. Any help, please?

written 8 months ago by blooming.daisy333 (20)

Ah, I get it: you have a reference genome in FASTA format and you want to write each sequence to a separate file, right?

modified 8 months ago • written 8 months ago by Bastien Hervé (2.7k)

Please follow up on your previous threads and mark answers as accepted when appropriate.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

written 8 months ago by WouterDeCoster (35k)
2
8 months ago, Bastien Hervé (2.7k; Limoges, CBRS, France) wrote:

A modification of Pierre's answer from this thread: Splitting A Fasta File

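awk '/^>/ {if (F) close(F); F=substr($1, 2)".fasta"; print > F; next} {print >> F}' ref_genome.fasta

Just replace ref_genome.fasta with your file name. Each output file is named after the first word of its header line (for example, a ">chr1 ..." header gives chr1.fasta), which avoids spaces in file names, and close(F) keeps awk from running out of open files on assemblies with many scaffolds.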
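For readers who find the one-liner hard to follow, the same steps can be sketched in plain Python with no extra libraries. This is only an illustration: the function name `split_fasta` and the `out_dir` parameter are hypothetical, and naming each file after the first word of the header is an assumption made here so that file names contain no spaces. Headers containing "/" would need extra cleaning.

```python
import os


def split_fasta(fasta_path, out_dir="."):
    """Write each record of a multi-FASTA file to <out_dir>/<first header word>.fasta.

    Returns the list of output paths in order of first appearance.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []  # output paths created so far
    out = None    # handle of the record currently being written
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # A header starts a new record, so close the previous file first
                if out is not None:
                    out.close()
                fields = line[1:].split()
                name = fields[0] if fields else "unnamed"
                path = os.path.join(out_dir, name + ".fasta")
                # Start a fresh file for a new name, append if the name repeats
                if path in written:
                    out = open(path, "a")
                else:
                    out = open(path, "w")
                    written.append(path)
                out.write(line)
            elif out is not None:
                # Sequence line: belongs to the most recent header
                out.write(line)
    if out is not None:
        out.close()
    return written
```

Calling split_fasta("ref_genome.fasta", "chromosomes") would create the chromosomes directory if needed and fill it with one FASTA file per sequence.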

modified 8 months ago • written 8 months ago by Bastien Hervé (2.7k)
1
8 months ago, Bastien Hervé (2.7k; Limoges, CBRS, France) wrote:

I think you could have found this answer elsewhere on Biostars, for example in this thread:

How to split fasta into seperate files by chromosome (in the header)

If I understand your goal correctly:

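from Bio import SeqIO  # requires Biopython

# Write each sequence to its own file, named after the record ID
for record in SeqIO.parse("ref_genome.fasta", "fasta"):
    # "a" appends, so delete old output files before re-running
    with open(record.id + ".fasta", "a") as output_handle:
        SeqIO.write(record, output_handle, "fasta")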

modified 8 months ago • written 8 months ago by Bastien Hervé (2.7k)

Yes, you've correctly guessed the requirement. Thanks for the answer. But is there any other one-liner command, using csplit, awk, etc.? I am a newbie to Linux and could not understand the above command.

written 8 months ago by blooming.daisy333 (20)
1

It's not a Unix command, it's Python code.

written 8 months ago by WouterDeCoster (35k)