#### What is NASCUP?

**NASCUP** (Nucleic Acid Sequence Classification by Universal Probability) is a new classification method that captures the probabilistic structure of a sequence family as a compact context-tree model and uses it to efficiently test the proximity and membership of a query sequence.
NASCUP crucially relies on the notion of universal probability from information theory in both its model-building and classification processes, delivering BLAST-like accuracy at orders-of-magnitude lower runtime on large-scale databases.
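To make the model-building and classification processes more concrete, here is a minimal Python sketch. It rests on assumptions that are not taken from NASCUP itself: the universal probability assignment is the Krichevsky-Trofimov (KT) estimator (the estimator used in context-tree weighting and maximizing), context trees have a fixed maximum depth, positions involving non-ACGT symbols are skipped, and the conditional probability of a query is obtained by sequentially updating the KT counts at the leaves of each family's tree. Model selection keeps a subtree whenever the product of its children's probabilities exceeds the parent's KT probability, with no penalty for tree size, in the spirit of the Main Algorithm description. All function names (`count_contexts`, `ctm_prune`, `build_model`, `conditional_log_prob`, `classify`) and the toy families are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
HALF = 0.5  # KT "add-1/2" pseudo-count (assumed estimator)


def kt_log_prob(counts):
    """Log of the KT probability of a symbol-count vector over ALPHABET."""
    n = sum(counts.values())
    log_p = sum(math.lgamma(counts.get(a, 0) + HALF) - math.lgamma(HALF) for a in ALPHABET)
    half_k = len(ALPHABET) / 2
    return log_p - (math.lgamma(n + half_k) - math.lgamma(half_k))


def count_contexts(sequences, depth):
    """counts[context][symbol], where context lists preceding symbols, most recent first."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(depth, len(seq)):
            sym, past = seq[i], seq[i - depth:i][::-1]
            if sym not in ALPHABET or any(c not in ALPHABET for c in past):
                continue  # skip ambiguous bases (assumption)
            for d in range(depth + 1):
                counts[past[:d]][sym] += 1
    return counts


def ctm_prune(counts, depth, context=""):
    """Context-tree maximizing with no model-cost term.
    Returns (log max probability, set of leaf contexts of the chosen tree)."""
    node = counts.get(context, {})
    log_pe = kt_log_prob(node)
    if len(context) == depth or not node:
        return log_pe, {context}
    log_split, leaves = 0.0, set()
    for a in ALPHABET:  # children extend the context one symbol further into the past
        lp, lv = ctm_prune(counts, depth, context + a)
        log_split += lp
        leaves |= lv
    if log_split > log_pe:
        return log_split, leaves
    return log_pe, {context}


def build_model(sequences, depth):
    """Hypothetical model: the chosen tree's leaves with their training counts."""
    counts = count_contexts(sequences, depth)
    _, leaves = ctm_prune(counts, depth)
    return depth, {s: dict(counts.get(s, {})) for s in leaves}


def conditional_log_prob(model, query):
    """log P(query | training data) by sequential KT updates at the leaves."""
    depth, leaf_counts = model
    counts = {s: dict(c) for s, c in leaf_counts.items()}  # copy; updated while reading query
    log_p = 0.0
    for i in range(depth, len(query)):
        sym, past = query[i], query[i - depth:i][::-1]
        if sym not in ALPHABET or any(c not in ALPHABET for c in past):
            continue
        # The leaves form a complete, prefix-free tree, so exactly one matches.
        leaf = next(past[:d] for d in range(depth + 1) if past[:d] in counts)
        node = counts[leaf]
        n = sum(node.values())
        log_p += math.log((node.get(sym, 0) + HALF) / (n + len(ALPHABET) / 2))
        node[sym] = node.get(sym, 0) + 1
    return log_p


def classify(models, query):
    """Pick the family whose model gives the query the highest probability."""
    return max(models, key=lambda fam: conditional_log_prob(models[fam], query))


# Toy usage with made-up families (not NASCUP data).
families = {
    "fam_AT": ["ATATATATATATATATAT", "TATATATATATATA"],
    "fam_GC": ["GCGCGCGCGCGCGCGC", "CGCGCGCGCGCGCG"],
}
models = {name: build_model(seqs, depth=2) for name, seqs in families.items()}
print(classify(models, "ATATATATAT"))  # fam_AT
```

Working in log space avoids underflow on long sequences, and comparing raw log-probabilities is sufficient here because every family scores the same query.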

##### Main Algorithm

NASCUP chooses the context tree (among all possible context trees that may arise from the context graph) that has the maximum universal probability. This model-building method closely resembles the context-tree maximizing data compression algorithm, which is known to achieve optimal compression performance, as well as tree-based decision algorithms. NASCUP, however, does not account for the description complexity of the tree model itself, which is crucial in compression and tree-based decision making but irrelevant in classification. Given a new sequence whose family membership is unknown, NASCUP compares the (conditional) probabilities of the sequence given the context trees of the sequence families. These probabilities are again computed using universal probability assignments, and the family with the highest probability is selected.

#### Release History

##### Version 0.8.1 (October 2015)

- First release of NASCUP
- Download: NASCUP

#### Paper

#### Program

| Program | Version |
|---|---|
| NASCUP (for Linux) | v0.8.1 |

#### Datasets

| Dataset | Size |
|---|---|
| **Functional non-coding RNA** | |
| RF | 1,320 families, 170,881 seqs |
| **Microbial Taxonomy** | |
| *rRNA database* | |
| RD | 134 families, 3,838 seqs |
| GG | 464 families, 23,142 seqs |
| SS | 313 families, 17,625 seqs |
| SL | 107 families, 4,593 seqs |
| *Pyrosequencing data* | |
| AR | 60 families, 44,407 seqs |
| DV | 23 families, 55,466 seqs |
| **Full Greengenes** | |
| BGG | 60,717 to 560,969 seqs |