Question: How to extract the first and last N bases from a read in a fastq file?
12 weeks ago, alhamidi.reem wrote:

How can I extract the first and last N bases from a read in a fastq file?

I have used the following command to extract the last 1000 bases of a read from a fastq file, but I would also like to incorporate the first 1000 bases into the command:

grep -A 4 "read_name_identifier" filename.fq | sed -n '2~4p' | grep -oE '.{1000}$'

Also, how can I use the new command for the first and last N bases in a Perl script, as I have >450 reads in the fastq file?

Many thanks,

Any help will be appreciated.


If you want to use Perl (or Python), I would suggest parsing the file properly with the corresponding Bio module (BioPerl or Biopython). Extracting this information then becomes fairly trivial and much more robust.

written 12 weeks ago by jrj.healey
12 weeks ago, genomax (United States) wrote:

Here is one way to get the first 1000 bases (using part of your own solution):

grep -A 4 "read_name_identifier" filename.fq | sed -n '2~4p' | cut -c 1-1000

OR

grep -A 4 "read_name_identifier" filename.fq | sed -n '2~4p' | sed 's/.//1001g'
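
Both commands above keep only the first 1000 bases. To get the first and the last N bases in one pass, a minimal sketch using awk's substr could look like the following. The function name first_last is only illustrative, and reads shorter than 2N will give overlapping pieces.

```shell
# Hypothetical helper: print the first and last N bases of each
# sequence line read from stdin, separated by a space.
first_last() {
    awk -v n="$1" '{
        start = length($0) - n + 1
        if (start < 1) start = 1   # read shorter than N: return the whole read
        print substr($0, 1, n), substr($0, start)
    }'
}

# Usage with the pipeline from the question (GNU sed assumed for 2~4p):
# grep -A 4 "read_name_identifier" filename.fq | sed -n '2~4p' | first_last 1000
```

The two pieces come out on one line, so they can be split again with cut -d ' ' -f 1 or -f 2 if needed.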

I am not sure why you refer to Perl in your question, since no Perl is involved.
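
For the >450 reads mentioned in the question, a script is not strictly needed either. As a rough sketch, awk can walk through every record of the file, assuming the usual layout of exactly four lines per record (unwrapped sequences). The function name fastq_ends and the output file name are made up for illustration.

```shell
# Hypothetical helper: for every record in a 4-line-per-record FASTQ file,
# print the read name, the first N bases and the last N bases (tab-separated).
# Arguments: $1 = FASTQ file, $2 = N
fastq_ends() {
    awk -v n="$2" 'BEGIN { OFS = "\t" }
        NR % 4 == 1 { name = substr($0, 2) }   # header line, leading "@" dropped
        NR % 4 == 2 {
            start = length($0) - n + 1
            if (start < 1) start = 1           # read shorter than N: whole read
            print name, substr($0, 1, n), substr($0, start)
        }' "$1"
}

# Usage:
# fastq_ends filename.fq 1000 > read_ends.tsv
```

If the sequences are wrapped over several lines, this line-counting approach breaks, and a proper FASTQ parser is the safer choice.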

Hint: for questions like this, search Stack Overflow for solutions.
