# Abstract

**DUDE-Seq** addresses the correction of errors in nucleotide sequences produced by next-generation sequencing. The methodology is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel, and it provides an effective means of correcting substitution and homopolymer indel errors.
Our experimental studies with real and simulated data sets suggest that DUDE-Seq outperforms existing alternatives in error-correction capability and time efficiency, as well as in boosting the reliability of downstream analyses. Further, DUDE-Seq is universally applicable across different sequencing platforms and analysis pipelines by a simple update of the noise model.

# Algorithm

**DUDE-Seq** adapts a *universal* algorithm called the Discrete Universal DEnoiser (DUDE) to the DNA sequence error correction problem. The semi-stochastic modeling approach of the DUDE framework naturally fits the setting of DNA sequence denoising.

$\small\hat{X}_{i}(z^{n})=\underset{\hat{x}\in\mathcal{X}}{\operatorname{arg\,min}}\,\mathbf{m}^T(z^n,z_{i-k}^{i-1},z_{i+1}^{i+k})\mathbf{\Pi}^{-1}[\lambda_{\hat{x}}\odot\pi_{z_i}]$
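The rule above can be read as a two-pass procedure: first count how often each symbol appears at the center of every two-sided context of length $k$ (the vector $\mathbf{m}$), then pick, at each position, the reconstruction $\hat{x}$ that minimizes the estimated expected loss. The following is a minimal Python sketch of that reading for substitution errors only. It is not the DUDE-Seq implementation: the function names `symmetric_channel` and `dude_denoise`, the Hamming loss, and the symmetric channel matrix are all assumptions made for illustration, and homopolymer indel handling is not covered.

```python
from collections import defaultdict

import numpy as np

ALPHABET = "ACGT"
INDEX = {base: j for j, base in enumerate(ALPHABET)}


def symmetric_channel(delta):
    """Hypothetical noise model: each base is kept with probability
    1 - delta and replaced by each other base with probability delta / 3.
    Entry Pi[x, z] is P(observed = z | true = x)."""
    size = len(ALPHABET)
    Pi = np.full((size, size), delta / (size - 1))
    np.fill_diagonal(Pi, 1.0 - delta)
    return Pi


def dude_denoise(z, k, Pi, Lambda=None):
    """Two-pass DUDE sketch for a noisy string z over ALPHABET.

    Lambda[x, xhat] is the loss of reconstructing x as xhat; Hamming
    loss is assumed when none is given. The first and last k positions
    have no full context and are copied unchanged.
    """
    size = len(ALPHABET)
    if Lambda is None:
        Lambda = 1.0 - np.eye(size)
    Pi_inv = np.linalg.inv(Pi)
    n = len(z)

    # Pass 1: count center symbols for every (left, right) context.
    counts = defaultdict(lambda: np.zeros(size))
    for i in range(k, n - k):
        context = (z[i - k:i], z[i + 1:i + k + 1])
        counts[context][INDEX[z[i]]] += 1

    # Pass 2: apply the arg-min rule position by position.
    out = list(z)
    for i in range(k, n - k):
        context = (z[i - k:i], z[i + 1:i + k + 1])
        m = counts[context]
        q = m @ Pi_inv                      # m^T Pi^{-1}
        zi = INDEX[z[i]]
        # scores[xhat] = sum_x q[x] * Lambda[x, xhat] * Pi[x, zi]
        scores = (q * Pi[:, zi]) @ Lambda
        out[i] = ALPHABET[int(np.argmin(scores))]
    return "".join(out)
```

As a small check under these assumptions, take the clean sequence `"ACGT" * 20`, substitute a `T` for the `C` at index 9, and call `dude_denoise(noisy, 1, symmetric_channel(0.1))`: the context `(A, G)` is then seen 19 times around `C` and once around `T`, so the rule restores the `C`. Changing platforms in this sketch amounts to passing a different `Pi`, which mirrors the "simple update of the noise model" described in the abstract.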

# Paper

Byunghan Lee, Taesup Moon^{*}, Sungroh Yoon^{*}, and Tsachy Weissman,
"DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing," *PLOS ONE*, in press.

# Code

DUDE-Seq uses the following dependencies: **libboost-dev, libgsl0-dev, liblapack-dev, zlib1g-dev**.

[code] [web]

# Dataset

[P1-P8]
[A5]
[S5]
[Q19-Q31]
[Π]

SRA: SRP000570 (SRS002051-SRS002053)

ENA: PRJEB6244 (ERS671332-ERS671344)

# Contact Information

If you have any questions, bug reports, or suggestions, please do not hesitate to contact us.

**For theoretical aspects:**

- Prof. Sungroh Yoon (sryoon [at] snu.ac.kr)
- Prof. Taesup Moon (tsmoon [at] skku.edu)
- Prof. Tsachy Weissman (tsachy [at] stanford.edu)

**For other technical inquiries:**

- Byunghan Lee (styxkr [at] snu.ac.kr)