Question: Split Fastq File Into Different Files Only Comprising One Chromosome Each
4
8.2 years ago by
Smitz50
Smitz50 wrote:

Hi all, I need to split a FASTQ file containing 30 chromosomes into 30 separate files, each holding the data for one chromosome. Concretely, I need to split a file like this into separate ones:

@chr1
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnatgctgggtgatctttagtcnnnnnnnnnn
nnnnnnnnnnnnnnnnatggggtcatgtacacacacacattggatannnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnatgctgggtgatctttagtcnnnnnnnnnn
...
@chr2
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnatgctgggtgatctttagtcnnnnnnnnnn
nnnnnnnnnnnnnnnnatggggtcatgtacacacacacattggatannnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnatgctgggtgatctttagtcnnnnnnnnnn
...

I tried:

awk '/^@chr1$/,/^+$/' consensus.fastq | perl -pe "s/@/>/ ; s/\+//" > chr1.fasta

but that gives me this:

@chr1
nnnnnagtnnnnnnnnnnnnnnnnnnnnnttgcnnnnnnnnnnnnnnnnnnnnnngcnnn
nnnntgaaannnnnnnnnnnntcnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn

Has anybody run into this kind of problem before? Could somebody give me some advice? Thanks a lot!

Nath

fastq sequence split file parsing • 4.9k views
modified 4.6 years ago by Biostar ♦♦ 20 • written 8.2 years ago by Smitz50
6
8.2 years ago by
iw9oel_ad6.0k
iw9oel_ad6.0k wrote:

If you have EMBOSS installed

seqretsplit -sformat fastq-sanger -osformat fastq file.fastq

will give you one Fastq record per file. The files will be named after the sequence ID, so chr1.fastq, chr2.fastq etc. Note that you should use -sformat fastq-sanger, fastq-illumina or fastq-solexa, depending on which encoding your file uses. See this question on Fastq format.

To get Fasta output (you don't explicitly ask for it, but your code implies it), simply change the -osformat argument.

seqretsplit -sformat fastq-sanger -osformat fasta file.fastq
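
If EMBOSS is not available, the same one-record-per-file split can be sketched in plain Python. This is only a rough illustration, not what seqretsplit does internally: the function names `read_multiline_fastq` and `split_by_record` are made up for this example, and it assumes every record has a '+' separator line and a quality string as long as its sequence. Reading quality lines until their total length matches the sequence matters because a wrapped quality line can itself start with '@'.

```python
import os

def read_multiline_fastq(handle):
    """Yield (name, sequence, quality) from a FASTQ whose sequence and
    quality strings may be wrapped over several lines."""
    line = handle.readline()
    while line:
        line = line.rstrip("\n")
        if not line:                       # skip blank lines between records
            line = handle.readline()
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@' header, got: %r" % line[:30])
        name = line[1:].split()[0]         # ID is the first word after '@'
        seq_parts = []
        line = handle.readline()
        while line and not line.startswith("+"):
            seq_parts.append(line.strip())
            line = handle.readline()
        seq = "".join(seq_parts)
        # 'line' now holds the '+' separator; collect quality lines until
        # they cover the whole sequence.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            line = handle.readline()
            if not line:
                raise ValueError("truncated quality string for %s" % name)
            chunk = line.strip()
            qual_parts.append(chunk)
            qual_len += len(chunk)
        yield name, seq, "".join(qual_parts)
        line = handle.readline()

def split_by_record(fastq_path, out_dir="."):
    """Write each record to <out_dir>/<name>.fastq and return the paths."""
    written = []
    with open(fastq_path) as handle:
        for name, seq, qual in read_multiline_fastq(handle):
            out_path = os.path.join(out_dir, name + ".fastq")
            with open(out_path, "w") as out:
                out.write("@%s\n%s\n+\n%s\n" % (name, seq, qual))
            written.append(out_path)
    return written
```

Calling `split_by_record("consensus.fastq")` would write chr1.fastq, chr2.fastq and so on into the current directory, with each sequence and quality string unwrapped onto a single line.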
modified 8.2 years ago • written 8.2 years ago by iw9oel_ad6.0k
0
8.2 years ago by
Smitz50
Smitz50 wrote:

Thanks a lot Keith!

I downloaded the software and it worked perfectly!

Cheers, Nath

written 8.2 years ago by Smitz50
3

Nath, you should have appended this as a comment to the previous answer rather than as a separate answer. If Keith's answer is spot on for you, consider checking the tick box on his answer, so that it becomes the 'accepted answer'.

written 8.2 years ago by Daniel Swan13k
1

Isn't it a little harsh to downvote a new user with a rep of 19 for an honest mistake?

written 8.2 years ago by D W150

Ok, thanks Daniel! Sorry, this is the first time I have used this kind of forum. Thanks a lot for your advice! Nath

written 8.2 years ago by Smitz50
0
7.8 years ago by
Apexy0
Apexy0 wrote:

Hi all, I tried using 'seqretsplit' to split a FASTQ file because I do not have enough memory to load the entire file at once. I used:

seqretsplit -sformat fastq-illumina -osformat fastq ../r17_s6_sequence.fq

However, it produced as many output files as there are sequences. Is there a flag to specify how many output files to create, and therefore how many sequences each file contains? Many thanks

written 7.8 years ago by Apexy0

Please create a new question for this since it is not an answer but another question.

written 7.8 years ago by Istvan Albert ♦♦ 79k
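
For the record, one way to get a fixed number of sequences per output file without holding the whole input in memory is to stream it in blocks of lines. The Python below is a minimal sketch under a strong assumption: every record is exactly four lines (header, sequence, '+', quality), which is typical for raw read files but not for wrapped files like the consensus FASTQ in the original question. The function name `split_fastq_in_chunks` and the `prefix` naming scheme are invented for this example.

```python
from itertools import islice

def split_fastq_in_chunks(fastq_path, records_per_file, prefix="chunk"):
    """Write consecutive groups of `records_per_file` four-line FASTQ
    records to prefix_0001.fastq, prefix_0002.fastq, ... and return the
    list of paths written."""
    if records_per_file < 1:
        raise ValueError("records_per_file must be at least 1")
    lines_per_file = 4 * records_per_file
    written = []
    with open(fastq_path) as handle:
        while True:
            # Only one chunk of lines is held in memory at a time.
            block = list(islice(handle, lines_per_file))
            if not block:
                break
            if len(block) % 4 != 0:
                # Also triggered by a trailing blank line in the input.
                raise ValueError("input does not end on a record boundary")
            out_path = "%s_%04d.fastq" % (prefix, len(written) + 1)
            with open(out_path, "w") as out:
                out.writelines(block)
            written.append(out_path)
    return written
```

For example, `split_fastq_in_chunks("../r17_s6_sequence.fq", 1000000, prefix="r17_s6")` would write r17_s6_0001.fastq, r17_s6_0002.fastq and so on, each holding up to one million reads.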