Question: How can I map reads only of a certain length?
0
gravatar for a.rex
2.3 years ago by
a.rex240
a.rex240 wrote:

I have recently done an ATAC-seq experiment -

I wish to selectively map reads of <100bp from a fragment distribution of fragments from 20bp to 450bp following illumina next-seq PE sequencing. This way I can look at reads that most likely originate from nucleosome-free regions...

How can I achieve this using an aligner such as BWA or bowtie? Can I use a command prior or even following mapping that looks only fragments of this length?

Thank you

sequencing bwa atac-seq • 654 views
ADD COMMENTlink modified 2.3 years ago by Pierre Lindenbaum131k • written 2.3 years ago by a.rex240

are you talking about the read length or about the fragment length (distance between a read and its mate ) ??

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Pierre Lindenbaum131k

Apologies, I mean fragment length.

ADD REPLYlink written 2.3 years ago by a.rex240
1

You can filter the alignments after wards to keep required fragment length (Bam : Extract Only Alignment With A Specific Length ).

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by genomax91k
1
gravatar for ATpoint
2.3 years ago by
ATpoint40k
Germany
ATpoint40k wrote:

You first align you data with default settings using BWA against your reference genome. Be sure to have trimmed the Nextera adapter from your reads. If you have not done that, you might use skewer to do so:

skewer -x CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -y CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -m pe -n -q 30 -Q 25 -o ${BASENAME} ${BASENAME}_1.fastq.gz ${BASENAME}_2.fastq.gz

Once you have your BAM file, use this code snipped to get the reads you want:

samtools view -h $1 | awk -v LEN=$2 '{if ($9 <= LEN && $9 >= -(LEN) && $9 != 0 || $1 ~ /^@/) print $0}' | samtools view -bh -o $3 -

Save it as a bash script, then use:

./script.sh in.bam 100 out.bam

This will give you reads (fragments) with an insert size below 100bp. If you want further size selection, just modify the if-condition according to your needs. You can speed things up by adding the multithreading option -@ to samtools and using mawk instead of awk.

EDIT: There is also a python package called NucleoATAC to call NFRs and histone positions directly from ATAC-seq data. If always found the documentation quiet difficult to digest, but maybe you may find it helpful. That having said, I gave up using it because I simply did not really understand what exactly it does and what all these output files mean, so I ended up defining NFRs as peaks called from a BAM file with fragments < 100bp. These peaks were required to overlap with peaks called from the full BAM file. This then (what a surprise) typically gives you peaks in the center of the full-BAM peaks, just the width is smaller.

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by ATpoint40k

Thank you ATpoint - I agree, I have also found NucleoATAC to be a bit difficult to comprehend. I have used your code, and after having plotted the NFRs, it does seem very similar but narrower than the full bam file. I suppose you could compare reads <100bp with that of mononucleosome reads (147-200bp)? This would then give you where 'open chromatin' and where the nucleosomes are?

ADD REPLYlink written 2.3 years ago by a.rex240
1
gravatar for Pierre Lindenbaum
2.3 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() && !record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && record.getReferenceName().equals(record.getMateReferenceName()) && Math.abs(record. getInferredInsertSize())< 100;' input.bam
ADD COMMENTlink written 2.3 years ago by Pierre Lindenbaum131k

I presume if I change to '....> 140;' this will give me reads greater than 140?

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by a.rex240
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2038 users visited in the last hour