align interleaved fastq file
1
0
Entering edit mode
6 months ago
amitpande74 ▴ 20

Hi,

I have interleaved fastq files from RNAseq paired end reads, after extracting a transposable element from reads. It looks like this:

@V300012057L3C001R0011341817/1
CTTTACACTCTTTCCCTACACGACGCTCTTCCACGCTCTTCCGATCTTCGGGGGGTGTTCTTCTCGGGCATGCTAGTTGTGGTTTGTCCAAACTCATCGA
+
FFCD>FFE?FFEF15FFE@FEF3FEF?FFFFDFFEFFFFF=C2CBFFF7FF3<FCAFE=>F>FF>FEDFCBFFFDFFFF9FFCD:F<>FFCFFFE<FECD
@V300012057L3C001R0011341817/2
GTTCGCCTTCCGCCGCGTGGAGGAGCTGCACAGCAACACCGAGCTGGGCATCGTGGAGTACCAGCACGCCTTCAAGGCCCCCATCGCCTTCGCCAGATCT
+
D&;=FDDEC8,9A/EBC5?B:E@?>EEF/<BE@A<BAD=@E7BC@E@2C<CDE?E:6E;F8D2FD@0?BECE;EB9'6F<@CA)B=CDA@C@CE9FBA<B
@V300012057L3C001R037112077/1
CGATCACGCTCTTCCGATCTGTGCATTTGTGTGCCGGTTACCATGCTAGTTGTGGTTTGTCCAAACTCATCGAGCTCGAGATCTGGCGAAGGCGATGGGG
+
@A:FFFDCEFEFEBEF5F1FF:BFF>E@@BFDE>E=B@@GCF@CEDE:EEAFDFE?AFFA?FFEECEEDAC-CFEFFB?@A;DDF=F:??:?E8F>EBC?
@V300012057L3C001R037112077/2
CCATGTTCGCCTTCCGCCGCGTGGAGGAGCTGCACAGCAACACCGAGCTGGGCATCGTGGAGTACCAGCACGCCTTCAAGACCCCCATCGCCTTCGCCAG
+
DFFFFDEFFFFEFFFFFFFE@@FFCFF?FFF;FFFFFFFFFCFFFCFFEFF:FFFFDEFF@FCFFFFEFFFFFF>F@FFFDCFFFFFFEFFFFFFDFFEF
@V300012057L3C001R037167604/1
TAAATGTCAGGAATTGTGAAAAAGTGAGTTTAAATGTATTTGGCTAAGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCAGATCGGAAGAGCGT
+
AGFFFGFFFFFFFFFFFFFFFGFFFFFFFFGGFGGFFGGFFFDFFFFFFFFFFEFGFGFFFGFGFAFGFFFFFFFFFGFFFFFGGFF>FFFFFFFFFFFF
@V300012057L3C001R037167604/2
GTCGGAAGTTTACATACACCTTAGTATTTGGTAGCATTGCCTTTACACTCTTTCCCTACACGACGCTCTTCCGATCTGATCCCTATACAGTTGAAGTCGG

Is their any nice way to align this to the human genome ? Thanks in advance.

interleaved fastq align • 606 views
ADD COMMENT
2
Entering edit mode
6 months ago
GenoMax 108k

bbmap.sh the aligner that is part of BBMap suite can use interleaved data files for direct alignment. Take a look at in-line help for options. A guide is also available. Have samtools available in $PATH to directly produce a BAM file.

bbmap.sh in=your.fastq ref=human.fa out=aligned.bam ambig=random

Otherwise you can easily split the file into two reads using reformat.sh from BBMap suite as follows and use with any other aligner of your choice.

reformat.sh in=your.fastq out1=R1.fastq out2=R2.fastq
ADD COMMENT
0
Entering edit mode

Did it but they dont align. Tried HISAT2, STAR and even bowtie2. The task was to extract ACGCTCTTCCGATCTNNNNNNNNNNNNNNNNNNNNNCAT[G]CTAGTTGTGGTTTGTC from the reads. I tried this :

paste <(cat RNA3_L3_519_1.fastq | paste - - - - )  <(cat RNA3_L3_519_2.fastq | paste - - - - )  |  grep CAT[G]CTAGTTGTGGTTTGTC | tr "\t" "\n" > interleaved.fastq

Looks like it didnt work.

Is there a way to extract only the reads which have the entire motif ? i.e

ACGCTCTTCCGATCTNNNNNNNNNNNNNNNNNNNNNCAT[G]CTAGTTGTGGTTTGTC

where N happens to be the barcode.

Thanks.

ADD REPLY
0
Entering edit mode

tried doing with seqkit

seqkit locate  --degenerate --ignore-case --pattern-file sb.fa RNA3_L3_519_1.fastq 

but now the fastq is beyond retireval.

ADD REPLY
0
Entering edit mode

Is there a way to extract only the reads which have the entire motif

This is a completely different problem than what you had originally asked.

An aligner is not the correct tool for this purpose. Take a look at seqkit grep. Something like the following may also work (untested you will need to try it out):

bbduk.sh -Xmx4g in1=RNA3_L3_519_1.fastq  in2=RNA3_L3_519_2.fastq  out=stdout.fq literal=ACGCTCTTCCGATCT restrictleft=15 | bbduk.sh -Xmx4g in=stdin.fq out1=filtred_R1.fq out2=filtered_R2.fq literal=CTAGTTGTGGTTTGTC restrictright=15
ADD REPLY
0
Entering edit mode

I tried pulling out the heads from the fastq files:

    seqkit locate  --degenerate --ignore-case --pattern-file sb.fa RNA3_L3_519_1.fastq 

Then, tried filtering and extracting :

filterbyname.sh in=RNA3_L3_519_1.fastq out=RNA3_1.fq names=IDs_1.txt include=t fixjunk

and then finally,

  repair.sh in1=RNA3_1.fq in2=RNA3_2.fq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq

but still the reads didnt align. Let me try what you have sent and shall keep you updated. Thanks for your input.

regards.

ADD REPLY

Login before adding your answer.

Traffic: 2057 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6