Question: fastq file split in chromosome
blooming.daisy333 (90) wrote, 2.2 years ago:

I'm a newbie to NGS and Linux. Can anyone kindly tell me the easiest way to split a FASTA file containing chromosomes and scaffolds into individual files, i.e. one chromosome per file? Thanks.

next-gen • 993 views
modified 2.2 years ago by Bastien Hervé (4.6k) • written 2.2 years ago by blooming.daisy333 (90)

How do you know which scaffold belongs to which chromosome? If you don't, you'll have to align all your scaffolds to a reference genome. If that information is in your scaffold headers, you can split your file with a Python script, for example.
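For the case where the chromosome is recorded in each scaffold header, a minimal sketch of such a Python script could look like the one below. It assumes a hypothetical header layout in which a `chromosome=<label>` tag appears somewhere on the header line; that tag, the regular expression, the function name `split_by_chromosome`, and the `chr<label>.fasta` / `unplaced.fasta` output names are all illustrative assumptions and would need adapting to the real headers.

```python
import re
from pathlib import Path

# Hypothetical header layout: ">scaffold_12 chromosome=3 length=48211".
# Adjust this pattern to whatever your headers actually contain.
CHROM_PATTERN = re.compile(r"chromosome=([\w.-]+)")


def split_by_chromosome(fasta_path, out_dir, pattern=CHROM_PATTERN):
    """Collect scaffolds into one file per chromosome label found in the header.

    Scaffolds whose header does not match the pattern go to unplaced.fasta.
    Returns a dict mapping each output file name to its number of records.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    out = None
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                match = pattern.search(line)
                name = f"chr{match.group(1)}.fasta" if match else "unplaced.fasta"
                if out is not None:
                    out.close()
                # Truncate a file the first time it is used, append afterwards
                mode = "a" if name in counts else "w"
                counts[name] = counts.get(name, 0) + 1
                out = open(out_dir / name, mode)
            # Lines before the first header (if any) are skipped
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return counts
```

Calling `split_by_chromosome("scaffolds.fasta", "by_chromosome")` (both names are placeholders) would write the grouped files into the `by_chromosome` directory and return the per-file record counts, which gives a quick check that no scaffold was lost.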

modified 2.2 years ago • written 2.2 years ago by Bastien Hervé (4.6k)

Sorry, it was about a FASTA file containing a genome sequence, not FASTQ. I need each chromosome in its own separate file rather than the whole genome in one FASTA file. Any help, please?

written 2.2 years ago by blooming.daisy333 (90)

Ah, I get it: you have a reference genome in FASTA format and you want to write each sequence to a separate file, right?

modified 2.2 years ago • written 2.2 years ago by Bastien Hervé (4.6k)

Please follow up on your previous threads and mark answers as accepted when appropriate.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

written 2.2 years ago by WouterDeCoster (44k)
Bastien Hervé (4.6k), Karolinska Institutet, Sweden, wrote, 2.2 years ago:

A modification of Pierre's answer from the thread "Splitting A Fasta File":

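awk '/^>/ {if (F) close(F); F = substr($0, 2) ".fasta"; print > F; next} {print >> F}' ref_genome.fasta

Just replace ref_genome.fasta with your file. Each output file is named after the full header line (minus the leading ">"), and close(F) keeps awk from running out of open file handles when the genome has many scaffolds.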
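For readers who find the one-liner hard to follow, here is a minimal standard-library Python sketch of the same idea: start a new output file at every header line and copy the following sequence lines into it. Unlike the awk command, it names each file after the first word of the header only and replaces characters that are awkward in file names; the function name `split_fasta`, the `unnamed` fallback, and that naming rule are assumptions made for illustration, not part of the original answer.

```python
import re
from pathlib import Path


def split_fasta(fasta_path, out_dir="."):
    """Write each FASTA record to <out_dir>/<record id>.fasta.

    The record id is the header text up to the first whitespace, with any
    character outside letters, digits, '.', '_' and '-' replaced by '_'.
    Returns the list of files written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    # A header starts a new record: close the previous file
                    if out is not None:
                        out.close()
                    fields = line[1:].split()
                    record_id = fields[0] if fields else "unnamed"
                    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", record_id)
                    path = out_dir / f"{safe_id}.fasta"
                    out = open(path, "w")
                    written.append(path)
                # Copy header and sequence lines unchanged
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

A call such as `split_fasta("ref_genome.fasta", "chromosomes")` would place one file per sequence in a `chromosomes` directory (a placeholder name). Note that two records sharing the same id would overwrite each other in this sketch.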

modified 2.2 years ago • written 2.2 years ago by Bastien Hervé (4.6k)
Bastien Hervé (4.6k) wrote, 2.2 years ago:

I think you could have found this answer elsewhere on Biostars, for example in this thread:

How to split fasta into seperate files by chromosome (in the header)

If I understand your goal correctly:

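from Bio import SeqIO

# Write every record of the genome to its own file, named after the record ID
# (the header text up to the first whitespace).
for record in SeqIO.parse("ref_genome.fasta", "fasta"):
    # "w" rather than "a", so re-running the script does not duplicate sequences
    with open(record.id + ".fasta", "w") as output_handle:
        SeqIO.write(record, output_handle, "fasta")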
modified 2.2 years ago • written 2.2 years ago by Bastien Hervé (4.6k)

Yes, you've correctly guessed the requirement. Thanks for the answer. But is there any other one-liner command, like csplit, awk, etc.? I am a newbie to Linux and could not understand the above command.

written 2.2 years ago by blooming.daisy333 (90)

It's not a Unix command, it's Python code. Save it to a file and run it with the python interpreter; it requires the Biopython package.

written 2.2 years ago by WouterDeCoster (44k)