Question: How To Look For Known Fusion In Fastq File
7.4 years ago by
United States
Angel210 wrote:


I have an internal dataset for NCI-H660 with 8M mapped pairs (HiSeq, 50 bp paired-end) and an external dataset with 4M mapped pairs (50 bp paired-end, generated on a GAII).

Questions:

  1. I observe the TMPRSS2-ERG fusion in the external dataset but not in the internal HiSeq data. What could be the reasons? I ran tophat2 fusion with the same parameters on both datasets.

  2. How can I investigate the FASTQ file to see whether this fusion is present? The sequence of the ERG-TMPRSS2 fusion is as mentioned here:

  3. Does this mean we need to generate more data internally to find the same fusion? I use the following thresholds, which are the minimum possible:

tophat-fusion-post -p $np --skip-read-dist --num-fusion-reads 1 --num-fusion-pairs 1 --num-fusion-both 2 $index

Any help will be greatly appreciated!! Thanks.

7.4 years ago by
United States
swbarnes29.4k wrote:

Use grep to search your fastq for a specific sequence.

Something like

grep -A 2 -B 1 GGAATAACCTGCCGCG myfastq.fastq > junctions.fastq

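-A 2 means "also print the 2 lines after the line that matches the sequence". -B 1 means "also print the one line before the line that matches the sequence". Together with the matching line itself, this gives you the full 4 lines of each fastq entry. If you don't need that, you can omit those two options. Note that grep prints a "--" line between non-adjacent groups of matches, so the output is not a clean fastq as is; with GNU grep you can add --no-group-separator to suppress those lines. Check the reverse complement of that sequence too, since a read can come from either strand.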
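
To check both orientations in one pass, something along these lines may help. It is only a sketch: the function names revcomp and count_junction_reads are made up for illustration, it assumes a plain 4-line-per-record fastq (no wrapped sequence lines), and it relies on rev, tr, awk and grep being available.

```shell
# Reverse-complement a DNA sequence given as the first argument.
# (revcomp is a hypothetical helper name, not part of any tool.)
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Count reads whose sequence line contains the junction sequence
# in either orientation. Assumes 4 lines per fastq record, so the
# sequence is always line 2 of each record.
count_junction_reads() {
    local seq="$1" fastq="$2" rc
    rc=$(revcomp "$seq")
    # Keep only the sequence lines, then count lines matching either pattern.
    awk 'NR % 4 == 2' "$fastq" | grep -c -e "$seq" -e "$rc"
}

# Example call (myfastq.fastq is the placeholder name used above):
# count_junction_reads GGAATAACCTGCCGCG myfastq.fastq
```

Restricting the search to the sequence lines avoids accidental hits in read names or quality strings. For a gzipped file, the same idea should work by decompressing to a pipe first and reading from standard input instead of a file.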

If your fastq is gzipped, use zgrep instead of grep. If you have a .bam file, you can search it like this:

samtools view mybam.bam | grep GGAATAACCTGCCGCG - > junctions.sam

samtools view reads the .bam, converts it to plain-text SAM, and feeds that one line at a time to grep, which writes only the lines that contain your sequence to junctions.sam.
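
A slightly stricter variant is sketched below. It looks only at the read sequence column (field 10 in SAM) and checks both orientations, which matters because reads aligned to the reverse strand are stored reverse-complemented in a .bam. The function name filter_sam_by_junction is hypothetical, and the sketch assumes tab-separated SAM records arrive on standard input.

```shell
# Print SAM records whose SEQ field (column 10) contains the junction
# sequence or its reverse complement. Reads SAM text from stdin.
# (filter_sam_by_junction is a hypothetical helper name.)
filter_sam_by_junction() {
    local seq="$1" rc
    rc=$(printf '%s\n' "$seq" | rev | tr 'ACGTacgt' 'TGCAtgca')
    # index() does a plain substring search, so no regex escaping is needed.
    awk -F '\t' -v s="$seq" -v r="$rc" 'index($10, s) || index($10, r)'
}

# Example call (mybam.bam is the placeholder name used above):
# samtools view mybam.bam | filter_sam_by_junction GGAATAACCTGCCGCG > junctions.sam
```

Matching on column 10 alone keeps a chance hit in a read name or an optional tag from showing up as a false positive.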
