Cleaver

Software for finding differences among restriction digests of orthologous DNA sequences.

Introduction

Restriction endonuclease digestion is one of the oldest methods for manipulating DNA in vitro and it is still useful in many situations. Restriction endonucleases (specifically type II restriction enzymes) are enzymes that recognise a specific short DNA sequence, bind to the DNA at that sequence and cut the phosphodiester bonds in the backbone of the DNA molecule at that point. Molecular biologists use restriction enzymes to break DNA molecules at predictable points, either to rearrange them into new sequences by re-joining them, or to get basic sequence information about the DNA molecule being digested.
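As a rough illustration of what "recognise and cut" means in software terms, the sketch below locates every occurrence of a recognition site in a sequence and reports where the top strand would be cut. It is not Cleaver's own code. The function name find_cut_positions and the example sequence are made up for illustration, and the sketch assumes a palindromic site with no ambiguity codes, so only one strand needs to be searched. EcoRI (site GAATTC, cut after the first G) is used as the example enzyme.

```python
def find_cut_positions(sequence, site, cut_offset):
    """Return 0-based positions at which the top strand is cut.

    cut_offset is the number of bases from the start of the recognition
    site to the cut (1 for EcoRI, which cuts G^AATTC).
    Assumes a palindromic site without ambiguity codes.
    """
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        # Step forward by one base so overlapping sites are also found.
        start = sequence.find(site, start + 1)
    return positions


# Hypothetical example sequence containing two EcoRI sites.
example = "TTGAATTCAAAGGGAATTCTT"
print(find_cut_positions(example, "GAATTC", 1))  # prints [3, 14]
```

A non-palindromic recognition site would also need a search with its reverse complement, which this sketch leaves out.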

There are many software packages that produce maps of the positions on a DNA sequence where restriction endonucleases will cut. The main focus of existing packages is to provide maps of restriction endonuclease recognition sites on single DNA sequences, rather than comparing the patterns of recognition sites on many sequences. The goal of Cleaver is to allow comparisons of the cut patterns of endonucleases in several DNA sequences at once, and to provide options for finding restriction patterns that are specific to DNA fragments isolated from given taxonomic groups. Cleaver also makes restriction maps, gives lists of restriction fragment sizes and helps to search for the most informative endonucleases to use in T-RFLP analysis, among other tasks.
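The idea of comparing fragment sizes across several sequences can be sketched in a few lines. This is an illustrative example, not code taken from Cleaver. The function names, the two sequence labels and the sequences themselves are hypothetical, and the digest is assumed to be complete, on linear DNA, with a palindromic site.

```python
def cut_positions(sequence, site, cut_offset):
    """0-based top-strand cut positions for one recognition site."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    return positions


def fragment_sizes(sequence, site, cut_offset):
    """Fragment lengths, in order, after a complete digest of linear DNA."""
    bounds = [0] + cut_positions(sequence, site, cut_offset) + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical orthologous sequences; species_B has lost the second site.
sequences = {
    "species_A": "TTGAATTCAAAGGGAATTCTT",
    "species_B": "TTGAATTCAAAGGGAGTTCTT",
}

# Digest every sequence with EcoRI (G^AATTC) and compare the patterns.
patterns = {name: fragment_sizes(seq, "GAATTC", 1)
            for name, seq in sequences.items()}
for name in sorted(patterns):
    print(name, patterns[name])
# species_A [3, 11, 7]
# species_B [3, 18]
```

The single base difference between the two sequences removes one recognition site, which changes the fragment pattern from three fragments to two.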

Identification of biological materials by short DNA sequences contained in them has recently become very popular (Hebert et al., 2002; Tautz et al., 2003). A major international project, the Consortium for the Barcode of Life (http://www.barcodinglife.org/), is attempting to sequence a single mitochondrial DNA region from as many organisms as possible. This is an example of a useful resource for sequence-based DNA identification, but it requires access to sequencing facilities to obtain identifications. For more discussion of sequence-based DNA identification and some relevant software, see http://dnaid.sourceforge.net/. It is also possible to identify organisms from characteristic fragment sizes produced after restriction digestion of DNA fragments, which are generally generated by the polymerase chain reaction (PCR). Examples of procedures for identifying organisms by the sizes of restriction fragments produced by digesting PCR products are given by Wolf et al. (1999); Pfeiffer et al. (2004); Muraji et al. (2004); Chakraborty et al. (2005).

Another procedure that can benefit from having taxon-specific information on restriction digestion patterns is the targeted digestion of PCR products from a restricted taxonomic group, but not from other taxonomic groups. This is useful for removing one component of complex pools of DNA derived from numerous sources. An example is the use of restriction endonucleases in analysis of PCR products derived from dietary samples like the stomach contents of animals (Blankenship & Yayanos, 2005). A restriction endonuclease can be used to cut DNA derived from a predator, but not DNA derived from prey items, prior to the PCR step so that the predator DNA does not amplify. This is very useful if the prey DNA is the main fraction of interest and it allows the use of 'universal' PCR primers that target as many taxa as possible. Dietary DNA analysis using 'universal' primers is appealing because it allows a more general overview of animal diet than approaches using species-specific or group-specific PCR primers (Jarman et al., 2004). Cleaver has an option to search for endonucleases that cut a DNA fragment from some taxa, but not others.
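A search of this kind can be sketched as follows. This is a minimal illustration and not a description of how Cleaver performs the search. The function names, the group labels and the sequences are hypothetical, and the sketch assumes that a useful enzyme must cut every sequence in the target group and no sequence in any other group. The three enzymes listed all have palindromic sites, so a single-strand search is enough.

```python
# Recognition sites for three well-known type II enzymes.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def cuts(sequence, site):
    """True if the recognition site occurs at least once in the sequence."""
    return site.upper() in sequence.upper()


def discriminating_enzymes(groups, target, enzymes):
    """Names of enzymes that cut all target sequences and no others."""
    hits = []
    for name in sorted(enzymes):
        site = enzymes[name]
        cuts_all_target = all(cuts(seq, site) for seq in groups[target])
        cuts_any_other = any(cuts(seq, site)
                             for group, seqs in groups.items()
                             if group != target
                             for seq in seqs)
        if cuts_all_target and not cuts_any_other:
            hits.append(name)
    return hits


# Hypothetical sequences assigned to two user-defined groups.
groups = {
    "predator": ["ACGGATCCTTAGC", "TTGGATCCAAGCTTA"],
    "prey": ["ACGTTAAGCTTGC", "CCGAATTCGGTTA"],
}
print(discriminating_enzymes(groups, "predator", ENZYMES))  # prints ['BamHI']
```

HindIII is rejected here because it cuts one predator sequence and one prey sequence, so it would not separate the two groups.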

Usage Overview

To use Cleaver, the user provides a set of sequences from an orthologous DNA region derived from a range of species of interest. The DNA can be stored either as a FASTA file or an INSDSeqXML file, which is a sequence format with lots of additional metadata including useful information like the full taxonomic affiliation of the organism from which a sequence was derived. It is used by GenBank, the EMBL Nucleotide Sequence Database and the DNA Database of Japan. For more details on this, see the Cleaver manual, available online at http://cleaver.sourceforge.net/Cleaver_Manual.html.
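For readers unfamiliar with FASTA, the sketch below shows one simple way to read such a file into a dictionary of sequences. It is written for current Python rather than Python 2.3 and is not Cleaver's own parser. The function name read_fasta and the record names are hypothetical, and the whole header line after '>' is used as the sequence name.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            records[name].append(line.upper())
    return {header: "".join(parts) for header, parts in records.items()}


# Hypothetical two-record FASTA text.
fasta_text = """>species_A 16S fragment
TTGAATTCAAAGG
GAATTCTT
>species_B 16S fragment
TTGAATTCAAAGGGAGTTCTT
"""
sequences = read_fasta(fasta_text.splitlines())
for header, seq in sequences.items():
    print(header, len(seq))
```

To read from a file instead, an open file object can be passed in place of the list of lines.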

The interface for Cleaver has two sections. The top one displays a list of restriction endonucleases and their important features, namely the sequence which they recognise and cut, the length of the recognition site, what other enzymes also cut at this site, and what other enzymes produce cut sites compatible with the cut site produced by the enzyme. The bottom section displays a list of sequences, their length and any 'group' that the user wants to assign them to. For INSDSeq XML files, groups can be assigned to all sequences based on the taxonomic information provided for them in the INSDSeq XML file. For sequences taken from FASTA files, groups must be assigned manually. Other fields from INSDSeq XML files may also be displayed.
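The kind of information shown in the enzyme list can be derived from a simple table of names and recognition sequences. The sketch below is only an illustration under that assumption and does not reflect Cleaver's internal data structures. The function name enzyme_table is hypothetical, and the small enzyme dictionary is an example set in which HpaII and MspI share the site CCGG. Compatible cut ends are not covered.

```python
from collections import defaultdict

# Small example set of enzymes and their recognition sequences.
ENZYMES = {"EcoRI": "GAATTC", "HpaII": "CCGG", "MspI": "CCGG", "BamHI": "GGATCC"}


def enzyme_table(enzymes):
    """Rows of (name, site, site length, other enzymes with the same site)."""
    by_site = defaultdict(list)
    for name, site in enzymes.items():
        by_site[site].append(name)
    rows = []
    for name in sorted(enzymes):
        site = enzymes[name]
        same_site = sorted(other for other in by_site[site] if other != name)
        rows.append((name, site, len(site), same_site))
    return rows


for row in enzyme_table(ENZYMES):
    print(row)
# ('BamHI', 'GGATCC', 6, [])
# ('EcoRI', 'GAATTC', 6, [])
# ('HpaII', 'CCGG', 4, ['MspI'])
# ('MspI', 'CCGG', 4, ['HpaII'])
```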

Implementation

Cleaver is written for Python 2.3 and uses the Qt 3.3 graphics library (Trolltech) through the PyQt wrapper (Riverbank Computing). Cleaver is primarily developed on a Linux platform, but versions converted to run on MacOSX and Win32 platforms have been produced.

The software is released under the GNU General Public License version 2. This means that anyone is free to copy the software and to modify it or re-distribute it as long as the copyright notice is also included and credit for the original version is given.

Bug reports, suggestions or requests for features can be directed to simon.jarman@aad.gov.au.

All files for different versions of Cleaver (Cleaver_b04 (beta version 4) being the current version at 26th May 2006) can be downloaded from Cleaver's SourceForge project page:

http://sourceforge.net/projects/cleaver

Windows

An installer package is available that allows simple point-and-click installation of Cleaver. The different versions of Cleaver for Windows will all be called something like 'Cleaver_Win32_b03.1_setup.zip'. Download this package, then:

Unzip the archive.
Double click on the file 'Cleaver_Win32_b03.1_setup.exe' and the installer program will lead you through the installation process.

This package was created using py2exe to convert the script version into an .exe and its libraries. The installer was provided by Inno Setup. This package has no requirements for other packages to be pre-installed. The package contains example files. Cleaver must be installed at the base of your C: drive (not in the 'Program Files' folder) to work correctly. The default values for the installer will put it in the right place.

MacOSX

A standalone package is available for MacOSX. This was created on a PPC Mac, so it may not work on newer Intel-based machines (although the author would be interested to have someone try). The different versions of Cleaver for MacOSX will all be called something like 'Cleaver_MacOSX_b04.zip'. Download this package, then:

Unzip the archive.
Place the folder wherever you like.
To launch Cleaver, double-click on the file 'Cleaver_MacOSX_b04'. If desired, you can create an alias for this file and place it in the 'Applications' folder, so that you can run Cleaver from there. However, the original file must remain within the folder it came in for the program to run properly.

Linux

Cleaver is primarily developed under Linux, so the Python script for Cleaver is likely to be the least buggy and best maintained version. If you have access to a Linux-type system, this is probably the best option. To run the script on Linux, you must have the following installed:

Python 2.3, which is included with most popular Linux distributions.

Qt 3.3, which is included as part of the KDE desktop. Other desktop environments may require it to be installed.

PyQt, which can be downloaded from Riverbank Computing.

The Linux version of Cleaver can then be run by:

Extracting the downloaded archive and saving the resulting .py file in a handy location
Either opening it from your favourite Python IDE; or opening a terminal and typing:

python /home/simon/Cleaver_Linux.py

(or something similar, depending on the location of the file)

Binary packages will be produced for Linux once the author works out how to do it.

References

Blankenship LE, Yayanos AA (2005) Universal primers and PCR of gut contents to study marine invertebrate diets. Molecular Ecology 14: 891-899.

Chakraborty A, Aranishi F, Iwatsuki Y (2005) Molecular identification of hairtail species (Pisces: Trichiuridae) based on PCR-RFLP analysis of the mitochondrial 16S rRNA gene. Journal of Applied Genetics 46: 381-385.

Hebert PDN, Cywinska A, Ball SL, deWaard JR (2002) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London B 270: 313-321.

Jarman SN, Deagle BE, Gales NJ (2004) Group-specific polymerase chain reaction for DNA-based analysis of species diversity and identity in dietary samples. Molecular Ecology 13: 1313-1322.

Muraji M, Kawasaki K, Shimizu T, Noda T (2004) Discrimination among Japanese species of the Orius flower bugs (Heteroptera: Anthocoridae) based on PCR-RFLP of the nuclear and mitochondrial DNAs. Japan Agricultural Research Quarterly 38: 91-95.

Pfeiffer I, Burger J, Brenig B (2004) Diagnostic polymorphisms in the mitochondrial cytochrome b gene allow discrimination between cattle, sheep, goat, roe buck and deer by PCR-RFLP. BMC Genetics 5: 30.

Tautz D, Arctander P, Minelli A, Thomas RH, Vogler AP (2003) A plea for DNA taxonomy. Trends in Ecology and Evolution 18: 70-74.

Wolf C, Rentsch I, Hubner P (1999) PCR-RFLP analysis of mitochondrial DNA: a reliable method for species identification. Journal of Agricultural and Food Chemistry 47: 1350-1355.

Last updated 26th May 2006