Requirements

To compile and run CASPER, you will need:

  • Jellyfish - a fast, parallel k-mer counter
    CASPER uses k-mer context frequencies when merging reads.
    CASPER has an internal k-mer counting method, but Jellyfish is recommended because it is faster and uses less memory.
    You can still run CASPER without Jellyfish by using the '-j' option.
  • The Boost Library (www.boost.org)
    CASPER uses a Boost map to associate each k-mer context with its count.
    The Boost library is required to compile CASPER.
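
As a rough illustration of what a map from k-mer to count looks like, the following Python sketch counts k-mers naively with a dictionary. It is not CASPER's code and does not reflect how Jellyfish works internally; the function name count_kmers and the sample reads are assumptions made for this example.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every length-k substring across all reads (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        # A read shorter than k contributes no k-mers (the range is empty).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

# Toy input, not real sequencing data.
counts = count_kmers(["ACGTAC", "CGTACG"], k=3)
print(counts["CGT"])
```

A plain in-memory map like this is simple but grows with the number of distinct k-mers, which is one reason a dedicated counter such as Jellyfish is recommended above.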

Compilation

  • Download the source archive from the PROGRAM section of the homepage
  • Extract, compile (run make from the extracted source directory), and run
    $ tar -xzvf casper_version.tar.gz
    $ make
    $ ./casper forward.fastq reverse.fastq

Command

Run CASPER with the following command-line syntax.

Usage: casper forward.fastq reverse.fastq [Options]

[Mandatory]
The forward-read FASTQ file comes first.
The reverse-read FASTQ file follows the forward file.

[Options]
-t <int> The number of threads for parallel processing
(default=64, limited to the maximum the system allows)
-k <int> The size of k-mers used to represent contexts around mismatching bases.
(default=17)
-d <int> Threshold for the difference between quality scores. Context-based mismatch resolution is used when the two quality scores differ by less than 'd'. A smaller value places more trust in the quality scores than in the k-mer context.
(default=19)
-g <float> Threshold for the mismatch ratio of the best overlap region. CASPER gives up merging and leaves the two reads unmerged if the mismatch ratio in the overlap is greater than 'g'. If all read pairs are expected to overlap, keep 'g' at the default or set it higher. If you want CASPER to be more sensitive in leaving non-overlapping pairs unmerged (true negatives), set 'g' lower than the default (0.27 or lower is recommended).
(default=0.5)
-w <int> The minimum length (in bp) of the overlap between forward and reverse reads.
(default=10bp)
-o <str> Prefix of the output files (default=casper)
By default, the merged output is written to 'casper.fastq'.
-j Use the internal naive k-mer counting method instead of Jellyfish. By default (without this option), Jellyfish is used for k-mer counting because it is faster.
-l Also write the unmerged reads to output files: prefix_for_left.fastq for forward reads and prefix_rev_left.fastq for reverse reads.
-h Help for usage information
-v Version information
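
The interplay of the '-w' and '-g' thresholds described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two overlap strings are taken to be already aligned and of equal length (with the reverse read already reverse-complemented), and the names mismatch_ratio and should_merge are hypothetical, not part of CASPER.

```python
def mismatch_ratio(fwd_overlap, rev_overlap):
    """Fraction of positions at which the two aligned overlap strings differ."""
    if len(fwd_overlap) != len(rev_overlap) or not fwd_overlap:
        raise ValueError("overlap strings must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(fwd_overlap, rev_overlap))
    return mismatches / len(fwd_overlap)

def should_merge(fwd_overlap, rev_overlap, g=0.5, w=10):
    """Return False when the overlap is shorter than w or too mismatched."""
    if len(fwd_overlap) < w:
        return False  # overlap shorter than the minimum length (-w)
    # Give up only when the ratio is strictly greater than g (-g).
    return mismatch_ratio(fwd_overlap, rev_overlap) <= g

# One mismatch in a 12 bp overlap: ratio is 1/12.
print(should_merge("ACGTACGTACGT", "ACGTACGTACGA"))           # default g=0.5
print(should_merge("ACGTACGTACGT", "ACGTACGTACGA", g=0.05))   # stricter g
```

Lowering g in this sketch makes the function reject more candidate overlaps, which mirrors the recommendation to use a smaller 'g' when some read pairs are not expected to overlap.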

* CASPER does not need the PHRED offset to be specified. Either PHRED+64 or PHRED+33 is fine,
because only the difference between two quality scores is used, not their absolute values.
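
A small Python sketch shows why the offset cancels out: the difference between two quality characters is the same under either encoding. It also applies the '-d' rule as described in the options above. The function names phred_diff and use_kmer_context are assumptions for illustration and are not taken from CASPER's source.

```python
def phred_diff(q1, q2):
    """Absolute difference between two FASTQ quality characters."""
    return abs(ord(q1) - ord(q2))

def use_kmer_context(q1, q2, d=19):
    """True when the scores differ by less than d, i.e. when the k-mer
    context (rather than the quality scores) would resolve the mismatch."""
    return phred_diff(q1, q2) < d

# Quality scores 30 and 20, encoded with each offset.
diff_33 = phred_diff(chr(30 + 33), chr(20 + 33))
diff_64 = phred_diff(chr(30 + 64), chr(20 + 64))
print(diff_33, diff_64)  # both differences equal 10
```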


[Examples]
CASE 1: using Jellyfish, output prefix 'out', k-mer size 19, 6 threads
$ ./casper forward.fastq reverse.fastq -o out -k 19 -t 6
CASE 2: without Jellyfish, give-up threshold 0.27
$ ./casper forward.fastq reverse.fastq -j -g 0.27

P R O G R A M

D A T A S E T s

Simulated Data
A4 1,000,000 reads
A5 1,000,000 reads
S4 1,000,000 reads
S5 1,000,000 reads
N4 1,000,000 reads
Real Data
C1 716,366 reads
C2 1,350,602 reads
PA 673,845 reads