Background
Suppression subtractive hybridization is a popular technique for gene discovery from non-model organisms without an annotated genome sequence, such as cowpea (Vigna unguiculata (L. [...] and (ii) to select clones for sequencing based on the calculation of enrichment ratios with associated statistics. Enrichment ratio 3 values for each clone showed that 62% of the forward library and 34% of the reverse library clones were significantly differentially expressed under drought stress (adjusted p value < 0.05). Enrichment ratio 2 calculations showed that > 88% of the clones in both libraries were derived from rare transcripts in the original tester samples, thus supporting the notion that suppression subtractive hybridization enriches for rare transcripts. A set of 118 clones was chosen for sequencing, and drought-induced cowpea genes were identified, the most interesting encoding a late embryogenesis abundant Lea5 protein, a glutathione S-transferase, a thaumatin, a universal stress protein, and a wound-induced protein. A lipid transfer protein and several components of photosynthesis were down-regulated by the drought stress. Reverse transcriptase quantitative PCR confirmed the enrichment ratio values for the selected cowpea genes. SSHdb, a web-accessible database, was developed to manage the clone sequences and combine the SSHscreen data with sequence annotations derived from BLAST and Blast2GO. The self-BLAST function within SSHdb grouped redundant clones together and illustrated that the SSHscreen plots are a useful tool for choosing anonymous clones for sequencing, since redundant clones cluster together on the enrichment ratio plots.

Conclusions
We developed the SSHscreen-SSHdb software pipeline, which greatly facilitates gene discovery using suppression subtractive hybridization by improving the selection of clones for sequencing after screening the library on a small number of microarrays.
Annotation of the sequence information and collaboration were further enhanced through the web-based SSHdb database, and we illustrated this through the identification of drought-responsive genes from cowpea, which can now be investigated in gene function studies. SSH is a popular and powerful gene discovery tool, and therefore this pipeline will have application for gene discovery in any biological system, particularly non-model organisms. SSHscreen 2.0.1 and a link to SSHdb are available from http://microarray.up.ac.za/SSHscreen.

Background
A range of techniques are available for gene discovery. Expressed sequence tag (EST) sequencing of cloned cDNAs is a common approach, with the advantage that if full-length cDNAs are cloned they can be directly employed for further gene function experiments [1]. Cloned cDNAs can be arrayed on high-density microarrays and used for expression profiling [2]. Next generation sequencing, such as 454 technology, has been employed for sequencing cDNA libraries [3], and the term RNA-Seq has been coined for this approach when it is applied at deep enough coverage to compare transcript counts between two or more biological states [4]. Previous methods, such as serial analysis of gene expression (SAGE), are also based on counting short sequence tags [5]. Although these methods provide outstanding quantitative analysis, they are labour-intensive and currently very costly. Additionally, they are most effective if an annotated genome sequence is available. Many research laboratories that are investigating non-model crops without genome sequence resources, or that have research questions that do not require a full genome analysis, have the option of applying different “RNA fingerprinting” techniques for gene discovery.
Examples of these techniques are differential display RT-PCR (DD-RT-PCR), RNA fingerprinting by arbitrarily primed PCR (RAP-PCR) and cDNA amplified fragment length polymorphism (cDNA-AFLP), in which cDNA subpopulations are amplified and visualized on polyacrylamide gels, after which differentially expressed transcripts are isolated from the gel for sequencing [6-8]. These methods have limitations such as bias arising from the choice of initial primer sets, problems with reproducibility, generation of false positives, and reliance on time-consuming polyacrylamide gel electrophoresis and gel extraction to obtain sequence information. Another limitation of the above methods is the difficulty of capturing low-abundance clones. A third option for gene discovery is the use of PCR-based cDNA subtractive hybridization methods. These methods exclude cDNA sequences that are common to two or more samples and thus enrich for target sequences of interest, which are subsequently cloned. These methods include representational difference analysis (RDA) and.