Finding out if a query sequence is present in a read library / bam file (blast against these?)
1
0
Entering edit mode
6 weeks ago
chronotope ▴ 10

Hi everyone! I have a 16S amplicon dataset (a set of R1/R2 reads), and a mapped .bam file that I am guessing is those reads mapped (but what they are mapped against I am a bit confused about). I would like to know if I could somehow "blast" a query sequence against these reads (or the mapped file?) to see if that sequence - or a sequence sufficiently similar to it - is present, in other words, I would like to see if a target organism's 16S is present in the dataset. What are your ways to go about this? Sounded like a simple thing to do, but I am having hard times with google. If you could suggest some ways to go about my question, I would highly appreciate. Ideally bash or R :)

Cheers a

reads amplicon blast mapped sequence • 767 views
1
Entering edit mode
6 weeks ago

I wrote FastqSW http://lindenb.github.io/jvarkit/FastqSW.html to search for sequences in a fastq file,( mostly for fun / not fully tested)

java -jar dist/fastqsw.jar -s 'TCGTGACCTCTCAGGTTAGACCTACAAAATCTTCGATCAAT' \ --min-identity 0.9 --save-align jeter.txt --min-length 40 --pair-logical OR \ src/test/resources/S1.R1.fq.gz src/test/resources/S1.R2.fq.gz (...) @RF11_137_644_0:0:0_2:0:0_23/1 GCTCCCTCGTGACCTCTCAGGTTAGACCTACAAATCTTCGATCAATAGCATTGCGACTTGTTTCATTCTC + 2222222222222222222222222222222222222222222222222222222222222222222222 @RF11_137_644_0:0:0_2:0:0_23/2 GTACATTTCACCAGATGCAGAAGCATTCAGTAAATACATGCTGTCAAAGTCTCCAGAAGATATTGGACCA + 2222222222222222222222222222222222222222222222222222222222222222222222 tail -6 jeter.txt
#RF11_137_644_0:0:0_2:0:0_23/1 distance:0.46 score:189.0 similarity:0.54 length:41 identicals:40 similars:40 identity:0.975609756097561
1                                      41
7 TCGTGACCTCTCAGGTTAGACCTAC-AAATCTTCGATCAAT 46
||||||||||||||||||||||||| |||||||||||||||
1 TCGTGACCTCTCAGGTTAGACCTACAAAATCTTCGATCAAT 41

0
Entering edit mode

thanks! I will try it out today

0
Entering edit mode

Hey! I tried the script, but I am getting this error:

I am looking into the solutions online but so far nothing really works...

Cheers! a

1
Entering edit mode

what was the command line please ?

0
Entering edit mode

actually there was indeed a syntax error in the command, now it looks like this:

java -jar dist/fastqsw.jar -s 'TGAAGGTCTTCGGATTGTAAAGCTCTGTCTTTTGGGACGATAATGACGGTACCAAAGGAGGAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGCGTAAAGGGTGCGTAGGCGGATATTTAAGTGGGATGTGAAANNCCCGGGCTTAACTTGGGNGCTGCATTCCAAACTGGATATCTAGAGTGCGGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGAGATTAGGAAGAACCCCAGTGGCGAAGGCGACTTTCTGGACCGTAACTGACGCTGAGGCACGAAAGCGTGGGGAGCAAACAGGATTAGAAACCCCGGTAGTC' \ --min-identity 0.95 --save-align test.txt --min-length 40 --pair-logical OR \ /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175318_S79_L001_R1_001.fastq.gz \ /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175318_S79_L001_R2_001.fastq.gz

I now get the following, new error:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. [WARN][FastqSW]two files for input. Assuming paired-end input

However, it does continue to run if I understand correct. In the above example, I used identity 0.95 (genus-ish), and it doesn't go on after the [WARN]etc, but at 0.9 it actually prints some data, I guess that means it doesn't really find similar organisms (well, sequences) in my dataset?

0
Entering edit mode

these are just warning/info.

but at 0.9 it actually prints some data, I guess that means it doesn't really find similar organisms (well, sequences) in my dataset?

yeah, you'll have to play with the parameters, with a positive control to find the correct set of arguments.

0
Entering edit mode

Thanks for the headsup about the positive control! Any way I can add another pair of reads to the bam file, containing my positive control (I guess just overlapping flagments of my query sequence that is ~340bp) that you can suggest? BAM to FASTA, add sequences, then FASTA to BAM?.

0
Entering edit mode

Hi, I am having trouble making a loop for this command - when I run a singular instance from the ../jvarkit directory, it runs smooth, but when I try to specify the whole path for the .jar file and run it from elsewhere (so that I can actually loop over multiple directories with cd) it gives me the following error:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. [SEVERE][FastqSW]There was an error in the command line arguments. Please check your parameters : Expected one or zero argument but got 5 : [test241.txt, test241.txt, /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R1_001.fastq.gz, test241.txt, /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R2_001.fastq.gz] com.github.lindenb.jvarkit.lang.JvarkitException\$CommandLineError: There was an error in the command line arguments. Please check your parameters : Expected one or zero argument but got 5 : [test241.txt, test241.txt, /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R1_001.fastq.gz, test241.txt, /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R2_001.fastq.gz] at com.github.lindenb.jvarkit.util.jcommander.Launcher.oneFileOrNull(Launcher.java:654) at com.github.lindenb.jvarkit.jcommander.FastqLauncher.doWork(FastqLauncher.java:98) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:796) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:959) at com.github.lindenb.jvarkit.tools.fastqsw.FastqSW.main(FastqSW.java:307)

Something goes wrong with the arguments, although the only thing I changed versus a working command is the full path to the .jar file. Could you advice me on this maybe?

Cheers! a

0
Entering edit mode

Here is the command I run (still not a looped one, just wanted to test calling the .jar from elsewhere and that already breaks)

java -jar /home/irrussional/install/jvarkit/jvarkit/dist/fastqsw.jar -s 'TGAAGGTCTTCGGATTGTAAAGCTCTGTCTTTTGGGACGATAATGACGGTACCAAAGGAGGAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGCGTAAAGGGTGCGTAGGCGGATATTTAAGTGGGATGTGAAANNCCCGGGCTTAACTTGGGNGCTGCATTCCAAACTGGATATCTAGAGTGCGGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGAGATTAGGAAGAACCCCAGTGGCGAAGGCGACTTTCTGGACCGTAACTGACGCTGAGGCACGAAAGCGTGGGGAGCAAACAGGATTAGAAACCCCGGTAGTC' \ --min-identity 0.97 --save-align test241.txt --min-length 40 --pair-logical OR \ /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R1_001.fastq.gz \ /home/irrussional/data/NGS_possibly_locon/000000000-JV5YJ_0_R12348_20211004/demultiplexed/175241/175241_S2_L001_R2_001.fastq.gz