Hello! I've recently started working with NGS data and I got stuck and was wondering if anyone could help me. Before I started I used some sequencing data from a colleague of mine to get my pipeline up and running and so far that works.
What I am doing is the following:
Create index files of hg19 using BWA index
Use BWA mem to align my fastq.gz files
Use Samtools to create BAM files
Use lofreq to call the variants
This all works completely fine using the genomic data from my colleague but my approach was a little different. I sequenced cDNA and I was wondering if I could use a fasta file created by myself of the gene of interest which I downloaded from UCSC. When I try to create index files from that fasta using BWA index, it does so very rapidly (the cDNA is only 700bp), but afterwards it gives me Segmentation fault (core dumped). I think this has to do with my custom fasta and accompanying index files. Any thoughts?
Thanks in advance!
The custom fasta I want to use:
> [HPRT cDNA NGS full sequence.xdna] - 1289 bp
TTACCTCACTGCTTTCCGGAGCGGTAGCACCTCCTCCGCCGGCTTCCTCCTCAGACCGCTTTTTGCCGCGAGCCGACCGG
TCCCGTCATGCCGACCCGCAGTCCCAGCGTCGTGATTAGCGATGATGAACCAGGTTATGACCTAGATTTGTTTTGTATAC
CTAATCATTATGCCGAGGATTTGGAAAAAGTGTTTATTCCTCATGGACTGATTATGGACAGGACTGAAAGACTTGCTCGA
GATGTCATGAAGGAGATGGGAGGCCATCACATTGTGGCCCTCTGTGTGCTCAAGGGGGGCTATAAGTTCTTTGCTGACCT
GCTGGATTACATTAAAGCACTGAATAGAAATAGTGATAGATCCATTCCTATGACTGTAGATTTTATCAGACTGAAGAGCT
ACTGTAATGATCAGTCAACGGGGGACATAAAAGTTATTGGTGGAGATGATCTCTCAACTTTAACTGGAAAGAATGTCTTG
ATTGTTGAAGATATAATTGACACTGGTAAAACAATGCAAACTTTGCTTTCCCTGGTTAAGCAGTACAGCCCCAAAATGGT
TAAGGTTGCAAGCTTGCTGGTGAAAAGGACCTCTCGAAGTGTTGGATACAGGCCAGACTTTGTTGGATTTGAAATTCCAG
ACAAGTTTGTTGTTGGATATGCCCTTGACTATAATGAGTACTTCAGGGATTTGAATCACGTTTGTGTCATTAGTGAAACT
GGAAAAGCCAAATACAAAGCCTAAGATGAGCGCAAGTTGAATCTGCAAATACGAGGAGTCCTGTTGATGTTGCCAGTAAA
ATTAGCAGGTGTTCTAGTCCTGTGGCCATCTGCCTAGTAAAGCTTTTTGCATGAACCTTCTATGAATGTTACTGTTTTAT
TTTTAGAAATGTCAGTTGCTGCGTCCCCAGACTTTTGATTTGCACTATGAGCCTATAGGCCAGCCTACCCTCTGGTAGAT
TGTCGCTTATCTTGTAAGAAAAACAAATCTCTTAAATTACCACTTTTAAATAATAATACTGAGATTGTATCTGTAAGAAG
GATTTAAAGAGAAGCTATATTAGTTTTTTAATTGGTATTTTAATTTTTATATATTCAGGAGAGAAAGATGTGATTGATAT
TGTTAATTTAGACGAGTCTGAAGCTCTCGATTTCCTATCAGTAACAGCATCTAAGAGGTTTTGCTCAGTGGAATAAACAT
GTTTCAGCAGTGTTGGCTGTATTTTCCCACTTTCAGTAAATCGTTGTCAACAGTTCCTTTTAAATGCAAATAAATAAATT
CTAAAAATT
The code:
bwa index fastafile.fa
bwa mem fastafile.fa sample1Fw.fastq.gz sample1Rv.fastq.gz > sample1.sam
The error:
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 66226 sequences (10000126 bp)...
./bwascript-test.sh: line 2: 26915 Segmentation fault (core dumped)
line 2 is the bwa mem I typed above.
I don't think in principle there should be any problem doing this. I've mapped reads to short bespoke references in the past. As ATpoint says, without a bit more direct information we can't be of much help.
For some initial troubleshooting steps, perhaps try decimating your data. Seg faults typically imply not enough memory, so perhaps reducing the number of reads so you have less mapping and less to discard might help pinpoint the issue. It seems unlikely that its memory based though (but if you could share some hardware specs that would also help).
Perhaps consider trying bowtie or minimap and see if they also choke on your data.
Yeah I thought of memory as well, but it also gave me the error when I ran with 20 reads :0
Does the command run when not inside your script?
Nope same problem sadly! Thanks for the help so far.
Can you post the full output of
This should work without any issues unless your index is not being made properly.
Normally you should see something like this being printed to screen:
Seems to work so far. Its the next step that gives the problem. If I use hg19 reference that works but aligning to genomic sequence for cDNA is unlogical. So I assume there's something wrong with my input for reference?
What does that mean?
As long as your cDNA fasta sequence is ok, it should work. Example I posted above had about a kb of sequence in the reference. I can align a set of fastq reads to this index without any issues using
bwa mem
.