Looking to count the number of specific repeats
1
0
Entering edit mode
7.4 years ago
aazar955 • 0

So I have RNAseq fastq files and I want to look through them and find the number of reads that map to this TERRA repeat: TTAGGG(repeating)

Is there a way I can use a grep command?

I was trying to create my own reference and do an alignment like this:

bwa mem -t 15 ~/RNAseq2/refs/TERRA.fa ~/RNAseq2/raws/EarlyPassA.R1.fq.gz \ | samblaster \ | sambamba view -t 15 -h -f bam -S /dev/stdin \ | sambamba sort -t 15 -m 10G -o ~/RNAseq2/outT/EarlyPassA.R1.bam /dev/stdin

But I'm still having trouble.

Any suggestions or sample code?

RNA-Seq • 1.3k views
ADD COMMENT
0
Entering edit mode
7.4 years ago

How about removing samblaster and all sorting and just:

bwa mem -t 15 ~/RNAseq2/refs/TERRA.fa ~/RNAseq2/raws/EarlyPassA.R1.fq.gz | samtools view -c -F 260 -

The number printed to the screen will be the number aligning to the repeat. Note that this is somewhat inflated since you're excluding the entirety of the rest of the genome (i.e., it's better to use something like TEtranscript, but if you just need an vague quick number then this might suffice).

ADD COMMENT
0
Entering edit mode

I got these errors:

[bam_header_read] EOF marker is absent. The input is probably truncated. [E::bwa_idx_load] fail to locate the index files [bam_header_read] invalid BAM binary header (this is not a BAM file). [main_samview] fail to read the header from "-".

ADD REPLY
0
Entering edit mode

You either mistyped the command in your original post or my modification of it. If you figure out what's wrong with the bwa part, then the samtools part will work itself out.

ADD REPLY

Login before adding your answer.

Traffic: 2407 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6