Finding out if a query sequence is present in a read library / bam file (blast against these?)
1
0
Entering edit mode
20 months ago
chronotope ▴ 10

Hi everyone! I have a 16S amplicon dataset (a set of R1/R2 reads), and a mapped .bam file that I am guessing is those reads mapped (but what they are mapped against I am a bit confused about). I would like to know if I could somehow "blast" a query sequence against these reads (or the mapped file?) to see if that sequence - or a sequence sufficiently similar to it - is present, in other words, I would like to see if a target organism's 16S is present in the dataset. What are your ways to go about this? Sounded like a simple thing to do, but I am having hard times with google. If you could suggest some ways to go about my question, I would highly appreciate. Ideally bash or R :)

Cheers a

reads amplicon blast mapped sequence • 1.5k views
ADD COMMENT
1
Entering edit mode
20 months ago

I wrote FastqSW http://lindenb.github.io/jvarkit/FastqSW.html to search for sequences in a fastq file,( mostly for fun / not fully tested)

$ java -jar dist/fastqsw.jar -s 'TCGTGACCTCTCAGGTTAGACCTACAAAATCTTCGATCAAT' \
    --min-identity 0.9 --save-align jeter.txt --min-length 40 --pair-logical OR \
    src/test/resources/S1.R1.fq.gz src/test/resources/S1.R2.fq.gz


(...)

@RF11_137_644_0:0:0_2:0:0_23/1
GCTCCCTCGTGACCTCTCAGGTTAGACCTACAAATCTTCGATCAATAGCATTGCGACTTGTTTCATTCTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@RF11_137_644_0:0:0_2:0:0_23/2
GTACATTTCACCAGATGCAGAAGCATTCAGTAAATACATGCTGTCAAAGTCTCCAGAAGATATTGGACCA
+
2222222222222222222222222222222222222222222222222222222222222222222222

$ tail -6 jeter.txt 
#RF11_137_644_0:0:0_2:0:0_23/1 distance:0.46 score:189.0 similarity:0.54 length:41 identicals:40 similars:40 identity:0.975609756097561
    1                                      41
  7 TCGTGACCTCTCAGGTTAGACCTAC-AAATCTTCGATCAAT 46
    ||||||||||||||||||||||||| |||||||||||||||
  1 TCGTGACCTCTCAGGTTAGACCTACAAAATCTTCGATCAAT 41
ADD COMMENT
0
Entering edit mode

thanks! I will try it out today

ADD REPLY
0
Entering edit mode

Hey! I tried the script, but I am getting this error:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. [WARN][FastqSW]two files for input. Assuming paired-end input [SEVERE][FastqSW]null java.lang.NullPointerException at htsjdk.samtools.util.IOUtil.hasGzipFileExtension(IOUtil.java:778) at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:694) at htsjdk.samtools.util.IOUtil.openFileForBufferedReading(IOUtil.java:1007) at htsjdk.samtools.util.IOUtil.openFileForBufferedReading(IOUtil.java:1002) at htsjdk.samtools.fastq.FastqReader.<init>(FastqReader.java:77) at htsjdk.samtools.fastq.FastqReader.<init>(FastqReader.java:68) at com.github.lindenb.jvarkit.fastq.FastqPairedReaderFactory.open(FastqPairedReaderFactory.java:75) at com.github.lindenb.jvarkit.fastq.FastqPairedReaderFactory.open(FastqPairedReaderFactory.java:69) at com.github.lindenb.jvarkit.fastq.FastqPairedReaderFactory.open(FastqPairedReaderFactory.java:62) at com.github.lindenb.jvarkit.jcommander.FastqLauncher.doWork(FastqLauncher.java:88) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:796) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:959) at com.github.lindenb.jvarkit.tools.fastqsw.FastqSW.main(FastqSW.java:307) [INFO][Launcher]fastqsw Exited with failure (-1)

I am looking into the solutions online but so far nothing really works...

Cheers! a

ADD REPLY
1
Entering edit mode

what was the command line please ?

ADD REPLY
0
Entering edit mode

actually there was indeed a syntax error in the command, now it looks like this:

java -jar dist/fastqsw.jar -s 'TGAAGGTCTTCGGATTGTAAAGCTCTGTCTTTTGGGACGATAATGACGGTACCAAAGGAGGAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGCGTAAAGGGTGCGTAGGCGGATATTTAAGTGGGATGTGAAANNCCCGGGCTTAACTTGGGNGCTGCATTCCAAACTGGATATCTAGAGTGCGGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGAGATTAGGAAGAACCCCAGTGGCGAAGGCGACTTTCTGGACCGTAACTGACGCTGAGGCACGAAAGCGTGGGGAGCAAACAGGATTAGAAACCCCGGTAGTC' \ --min-identity 0.95 --save-align test.txt --min-length 40 --pair-logical OR \ /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175318_S79_L001_R1_001.fastq.gz \ /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175318_S79_L001_R2_001.fastq.gz

I now get the following, new error:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. [WARN][FastqSW]two files for input. Assuming paired-end input

However, it does continue to run if I understand correct. In the above example, I used identity 0.95 (genus-ish), and it doesn't go on after the [WARN]etc, but at 0.9 it actually prints some data, I guess that means it doesn't really find similar organisms (well, sequences) in my dataset?

ADD REPLY
0
Entering edit mode

these are just warning/info.

but at 0.9 it actually prints some data, I guess that means it doesn't really find similar organisms (well, sequences) in my dataset?

yeah, you'll have to play with the parameters, with a positive control to find the correct set of arguments.

ADD REPLY
0
Entering edit mode

Thanks for the headsup about the positive control! Any way I can add another pair of reads to the bam file, containing my positive control (I guess just overlapping flagments of my query sequence that is ~340bp) that you can suggest? BAM to FASTA, add sequences, then FASTA to BAM?.

Thank you so much already!

ADD REPLY
0
Entering edit mode

Hi, I am having trouble making a loop for this command - when I run a singular instance from the ../jvarkit directory, it runs smooth, but when I try to specify the whole path for the .jar file and run it from elsewhere (so that I can actually loop over multiple directories with cd) it gives me the following error:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. [SEVERE][FastqSW]There was an error in the command line arguments. Please check your parameters : Expected one or zero argument but got 5 : [test241.txt, test241.txt, /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R1_001.fastq.gz, test241.txt, /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R2_001.fastq.gz] com.github.lindenb.jvarkit.lang.JvarkitException$CommandLineError: There was an error in the command line arguments. Please check your parameters : Expected one or zero argument but got 5 : [test241.txt, test241.txt, /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R1_001.fastq.gz, test241.txt, /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R2_001.fastq.gz] at com.github.lindenb.jvarkit.util.jcommander.Launcher.oneFileOrNull(Launcher.java:654) at com.github.lindenb.jvarkit.jcommander.FastqLauncher.doWork(FastqLauncher.java:98) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:796) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:959) at com.github.lindenb.jvarkit.tools.fastqsw.FastqSW.main(FastqSW.java:307)

Something goes wrong with the arguments, although the only thing I changed versus a working command is the full path to the .jar file. Could you advice me on this maybe?

Cheers! a

ADD REPLY
0
Entering edit mode

Here is the command I run (still not a looped one, just wanted to test calling the .jar from elsewhere and that already breaks)

java -jar /home/irrussional/install/jvarkit/jvarkit/dist/fastqsw.jar -s 'TGAAGGTCTTCGGATTGTAAAGCTCTGTCTTTTGGGACGATAATGACGGTACCAAAGGAGGAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGCGTAAAGGGTGCGTAGGCGGATATTTAAGTGGGATGTGAAANNCCCGGGCTTAACTTGGGNGCTGCATTCCAAACTGGATATCTAGAGTGCGGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGAGATTAGGAAGAACCCCAGTGGCGAAGGCGACTTTCTGGACCGTAACTGACGCTGAGGCACGAAAGCGTGGGGAGCAAACAGGATTAGAAACCCCGGTAGTC' \ --min-identity 0.97 --save-align test241.txt --min-length 40 --pair-logical OR \ /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R1_001.fastq.gz \ /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R2_001.fastq.gz

ADD REPLY

Login before adding your answer.

Traffic: 1861 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6