RC454 (ReadClean454) is a program that takes a set of 454 read and quality files, together with a consensus assembly of those reads, and corrects known 454 error modes such as homopolymer indels and carry forward/incomplete extension (CAFIE). It also corrects any indel that breaks the reading frame, unless that indel occurs in more than 25% of the reads. Because the algorithm corrects errors aggressively, the reads should be aligned to their own assembly rather than to an external reference, which keeps misalignments to a minimum. RC454 uses Mosaik to realign the corrected reads between steps, so Mosaik must be installed to run the script.
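The frame-breaking indel rule described above can be illustrated with a minimal Python sketch. This is not RC454's actual code. The names `IndelObservation`, `breaks_reading_frame`, and `should_correct` are hypothetical, and the sketch assumes that "breaks the reading frame" means an indel length that is not a multiple of 3 and that the 25% figure is measured against the reads covering the site.

```python
from dataclasses import dataclass

# Threshold taken from the description above: indels seen in more than
# 25% of the reads are left alone.
MAX_INDEL_FRACTION = 0.25


@dataclass
class IndelObservation:
    """Hypothetical record of one indel at one position in the assembly."""
    length: int            # number of inserted or deleted bases
    reads_with_indel: int  # reads that carry this indel
    reads_covering: int    # reads that span this position (assumed denominator)


def breaks_reading_frame(length: int) -> bool:
    """Assumption: an indel is frame-breaking if its length is not a multiple of 3."""
    return length % 3 != 0


def should_correct(obs: IndelObservation) -> bool:
    """Return True if the indel would be corrected under the rule sketched here."""
    if obs.reads_covering <= 0:
        return False  # no coverage, nothing to decide
    if not breaks_reading_frame(obs.length):
        return False  # in-frame indels are not covered by this rule
    fraction = obs.reads_with_indel / obs.reads_covering
    return fraction <= MAX_INDEL_FRACTION


if __name__ == "__main__":
    rare = IndelObservation(length=1, reads_with_indel=10, reads_covering=100)
    common = IndelObservation(length=1, reads_with_indel=30, reads_covering=100)
    print(should_correct(rare))
    print(should_correct(common))
```

Under these assumptions, a 1-base indel seen in 10 of 100 reads would be corrected, while the same indel seen in 30 of 100 reads would be kept as a likely real variant.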
Henn MR, Boutwell CL, Charlebois P, Lennon NJ, Power KA, Macalalad AR, Berlin AM, Malboeuf CM, Ryan EM, Gnerre S, Zody MC, Erlich RL, Green LM, Berical A, Wang Y, Casali M, Streeck H, Bloom AK, Dudek T, Tully D, Newman R, Axten KL, Gladden AD, Battis L, Kemper M, Zeng Q, Shea TP, Gujja S, Zedlack C, Gasser O, Brander C, Hess C, Gunthard HF, Brumme ZL, Brumme CJ, Bazner S, Rychert J, Tinsley JP, Mayer KH, Rosenberg E, Pereyra F, Levin JZ, Young SK, Jessen H, Altfeld M, Birren BW, Walker BD, Allen TM (2012) Whole Genome Deep Sequencing of HIV-1 Reveals the Impact of Early Minor Variants Upon Immune Recognition During Acute Infection. PLoS Pathogens 8(3): e1002529.
Support for this tool was provided by the National Institute of Allergy and Infectious Diseases through the Microbial Sequencing Center and the Genome Sequencing Center for Infectious Diseases, by grants from the Bill & Melinda Gates Foundation (awarded to the Ragon Institute) and the NIAID-DAIDS (awarded to the Ragon Institute), and through internal support at the Broad Institute.