Abstract

We consider the correction of errors in nucleotide sequences produced by next-generation sequencing. Our methodology, named DUDE-Seq, is derived from the general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel, and it provides an effective means of correcting substitution and homopolymer indel errors. Our experimental studies with real and simulated data sets suggest that DUDE-Seq outperforms existing alternatives in error-correction capability and time efficiency, and that it boosts the reliability of downstream analyses. Further, DUDE-Seq is universally applicable across different sequencing platforms and analysis pipelines through a simple update of the noise model.


Algorithm

DUDE-Seq adapts a universal algorithm called the Discrete Universal DEnoiser (DUDE) to the DNA sequence error-correction problem. The semi-stochastic modeling approach of the DUDE framework, in which the clean sequence is treated as an unknown fixed sequence and only the noise is random, naturally fits the setting of DNA sequence denoising.
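To make the idea concrete, the following is a minimal Python sketch of the generic two-pass DUDE rule for substitution errors under Hamming loss. It is an illustration under stated assumptions, not the DUDE-Seq implementation: the function names (`invert`, `symmetric_channel`, `dude_denoise`), the uniform substitution channel, the context size `k=2`, and the toy input are all hypothetical, and homopolymer indel correction is not covered. The input is assumed to contain only the uppercase symbols A, C, G, and T.

```python
from collections import defaultdict

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment with the identity: [M | I] is reduced to [I | M^-1].
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("channel matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def symmetric_channel(delta):
    """Assumed noise model: pi[x][z] = P(observe z | true x), where each base is
    substituted with probability delta, uniformly among the other three."""
    return [[1 - delta if x == z else delta / 3 for z in range(4)]
            for x in range(4)]


def dude_denoise(noisy, pi, k=2):
    """Two-pass DUDE with Hamming loss; the first and last k symbols are kept."""
    size = len(ALPHABET)
    pi_inv = invert(pi)

    def context(i):
        # Two-sided context: k symbols to the left and k to the right of i.
        return noisy[i - k:i], noisy[i + 1:i + k + 1]

    # Pass 1: count which symbols occur in the middle of each context.
    counts = defaultdict(lambda: [0] * size)
    for i in range(k, len(noisy) - k):
        counts[context(i)][INDEX[noisy[i]]] += 1

    # Pass 2: choose the reconstruction with the smallest estimated loss.
    out = list(noisy)
    for i in range(k, len(noisy) - k):
        m = counts[context(i)]
        z = INDEX[noisy[i]]
        # q[x] = (m^T pi^-1)[x]: estimated count of clean symbol x in this context.
        q = [sum(m[a] * pi_inv[a][x] for a in range(size)) for x in range(size)]
        # weight[x] = q[x] * P(z | x). Under Hamming loss the estimated loss of
        # choosing x_hat is sum(weight) - weight[x_hat], so take the largest weight.
        weight = [q[x] * pi[x][z] for x in range(size)]
        out[i] = ALPHABET[max(range(size), key=weight.__getitem__)]
    return "".join(out)


# Toy example: a periodic sequence with a single G -> T substitution.
clean = "ACGT" * 20
noisy = clean[:42] + "T" + clean[43:]
denoised = dude_denoise(noisy, symmetric_channel(0.1), k=2)
```

In this sketch the noise model enters only through the matrix `pi`, so a different sequencing platform would be handled by passing a different channel matrix while leaving the rest unchanged.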


Paper

Byunghan Lee, Taesup Moon*, Sungroh Yoon*, and Tsachy Weissman, "DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing," PLOS ONE, in press.


Code

DUDE-Seq uses the following dependencies: libboost-dev, libgsl0-dev, liblapack-dev, zlib1g-dev.
[code] [web]


Dataset

[P1-P8] [A5] [S5] [Q19-Q31] [Π]
SRA: SRP000570 (SRS002051-SRS002053)
ENA: PRJEB6244 (ERS671332-ERS671344)


Links


Contact Information

If you have any questions, bug reports, or suggestions, please do not hesitate to contact us.

For theoretical aspects:

  • Prof. Sungroh Yoon (sryoon [at] snu.ac.kr)
  • Prof. Taesup Moon (tsmoon [at] skku.edu)
  • Prof. Tsachy Weissman (tsachy [at] stanford.edu)

For other technical inquiries:

  • Byunghan Lee (styxkr [at] snu.ac.kr)