Sequence extract from a big fasta file
9.1 years ago
vahapel ▴ 210

Dear All,

After the sequencing reaction and extensive filtering, we converted our ".fastq" files into ".fasta" files; each fasta file contains approximately 67 million reads. Is there a script for extracting the first 30 million reads and then, separately, the remaining 37 million reads, in sequential order?

Thank you for all your help!

next-gen-sequencing rna-seq • 2.3k views
9.1 years ago
5heikki 11k

Assuming no linebreaks in sequences, this is as simple as:

head -n x file > output

Where x is the number of sequences times 2 (one line for the header, one for the sequence). Similarly, you can get the last sequences with tail, again counting two lines per sequence.
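To make the arithmetic concrete, here is a minimal sketch on a toy file. The file names and the record count `n` are placeholders chosen for illustration (for the question above, n would be 30000000), and it assumes every record occupies exactly two lines. Using `tail -n +K` starts output at line K, so the second part can be taken without knowing the total number of reads.

```shell
# Toy input: 5 records, each on exactly two lines (header + sequence).
# File names and counts are placeholders for illustration only.
printf '>r1\nAAAA\n>r2\nCCCC\n>r3\nGGGG\n>r4\nTTTT\n>r5\nACGT\n' > reads.fasta

n=2                  # records wanted in the first part
lines=$((n * 2))     # two lines per record

# First n records.
head -n "$lines" reads.fasta > first_part.fasta

# Everything after the first n records: start at line lines+1.
tail -n +"$((lines + 1))" reads.fasta > second_part.fasta

# Sanity check: count header lines in each part.
grep -c '^>' first_part.fasta second_part.fasta
```

If sequences are wrapped over several lines, the line counts no longer map to record counts and this approach would cut records in the wrong place.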


Thank you so much, 5heikki. It seems a very practical way to do this!

9.1 years ago

split fasta file
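As one possible illustration of splitting a FASTA file into fixed-size pieces (not necessarily what the scripts referred to above do), the sketch below uses awk to start a new output file every `size` records. The input name, the `chunk_` prefix and the chunk size are hypothetical, and it assumes the file begins with a header line. Because a record is counted at each `>` line, wrapped sequences stay together with their header.

```shell
# Toy input: 5 records, some with sequences wrapped over two lines.
# File names, prefix and chunk size are placeholders for illustration only.
printf '>c1\nAAAA\nCC\n>c2\nGGGG\n>c3\nTT\nTT\n>c4\nACGT\n>c5\nA\n' > contigs.fasta

# Start a new output file every `size` records; a record begins at each '>' line.
awk -v size=2 -v prefix=chunk_ '
    /^>/ {
        if (count % size == 0) {
            if (out) close(out)                         # avoid too many open files
            out = sprintf("%s%03d.fasta", prefix, ++part)
        }
        count++
    }
    { print > out }                                     # write every line to the current chunk
' contigs.fasta
```

Concatenating the chunks in name order gives back the original file, which is a quick way to check that nothing was lost.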


Hi Geek_y, the "split fasta file" scripts will be useful for my project. Thank you for your help!


Hi,

I am a novice trying to use R to split a FASTA file with 300 000 contigs into 6 files of fewer than 50 000 contigs each. I have seen many options, but could anyone recommend something that I could use in R? Thank you, A
