GLiDe User Documentation

GLiDe is a web-based tool created for the genome-scale design of sgRNA libraries for CRISPRi screening in prokaryotic organisms. It includes an integrated database covering 1,426 prokaryotic organisms and also allows users to upload their own reference files. Additional information can be found in two papers (Nat Commun 2018 and bioRxiv 2023). Please cite these papers, or any subsequent peer-reviewed publication, if this program is useful to your work.

In the following sections, we present a step-by-step user manual for GLiDe, along with guidance on parameter configuration. Furthermore, you can readily access essential information on the software's web pages by simply hovering your mouse cursor over the input boxes.

Input: Default Mode

In default mode, users need to choose an organism from the built-in database, after which a form will be displayed for inputting parameters.

An example of organism selection and input form

In this form, users are required to select the target chromosome and plasmid (they can press Ctrl or Command to select multiple fragments) and configure the desired parameters. A description of all available parameters is provided below:

  1. Design Target: This specifies the target for the designed sgRNA library, which can be chosen from either the coding sequence (CDS) or RNA coding genes (RNA).
  2. Off-target Threshold: The penalty-score threshold used for quality control of sgRNAs. GLiDe uses the seed region principle and penalty scoring metrics (see our paper for details) to evaluate off-targets. Because mismatches are generally better tolerated at the 5′ end than at the 3′ end, three regions are defined according to their proximity to the PAM sequence. A site is identified as an off-target when its penalty score is less than the user-defined threshold.
    Schematic diagram of the calculation of penalty scores

  3. GC Limits: The upper and lower bounds of the GC content of each sgRNA (as a percentage). Various studies have reported that very low or very high GC content (<20% or >80%) leads to low target cleavage efficiency.
  4. Spacer Length: The length of the designed sgRNA. When efficacy is a major concern, the length must be ≥ 12.
  5. Target Strand: The strand of the genome to be targeted, either the template or the non-template strand. For common CRISPRi systems, the non-template strand is preferred for effective gene silencing. The alternative choice is offered for applications that do not have a strand preference.
  6. NC Design (optional): When activated, GLiDe generates sgRNAs that do not have any significant targets across the genome. These sgRNAs can be used as negative controls to assess the impact of experimental procedures. The user can define the total number of negative control sgRNAs.
  7. Email report (optional): If "yes" is selected, the result is sent to the email address provided.
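
The GC Limits and Spacer Length checks, and the general idea of a position-weighted penalty score, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not GLiDe's implementation: the function names, the region lengths, and the per-region penalty values are hypothetical placeholders chosen for the example, and the actual scoring metrics are those described in the paper.

```python
def gc_percent(spacer: str) -> float:
    """Return the GC content of a spacer as a percentage."""
    if not spacer:
        raise ValueError("spacer must not be empty")
    s = spacer.upper()
    return 100.0 * sum(base in "GC" for base in s) / len(s)


def passes_basic_filters(spacer, gc_min=20.0, gc_max=80.0, min_length=12):
    """Apply the spacer-length and GC-limit checks (defaults are illustrative)."""
    if len(spacer) < min_length:
        return False
    return gc_min <= gc_percent(spacer) <= gc_max


# HYPOTHETICAL values: (region length counted from the PAM-proximal 3' end,
# penalty per mismatch). Everything beyond these two regions is the distal
# (5') region, where mismatches are tolerated best and so cost the least.
REGION_PENALTIES = [(8, 8.0), (5, 4.5)]
DISTAL_PENALTY = 2.5


def _penalty_at(distance_from_pam: int) -> float:
    """Look up the penalty for a mismatch at a given distance from the PAM."""
    boundary = 0
    for length, penalty in REGION_PENALTIES:
        boundary += length
        if distance_from_pam < boundary:
            return penalty
    return DISTAL_PENALTY


def penalty_score(spacer: str, candidate: str) -> float:
    """Sum the mismatch penalties between a spacer and a candidate site."""
    if len(spacer) != len(candidate):
        raise ValueError("spacer and candidate must have the same length")
    score = 0.0
    # Walk from the PAM-proximal (3') end towards the 5' end.
    pairs = zip(reversed(spacer.upper()), reversed(candidate.upper()))
    for distance, (a, b) in enumerate(pairs):
        if a != b:
            score += _penalty_at(distance)
    return score


def is_off_target(spacer: str, candidate: str, threshold: float) -> bool:
    """A candidate site (other than the intended target) counts as an
    off-target when its penalty score is below the threshold."""
    return penalty_score(spacer, candidate) < threshold
```

With these placeholder values, a single mismatch next to the PAM costs more than a single mismatch at the 5′ end, so a higher threshold rejects more sgRNAs and gives a stricter library.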

Input: Optional Mode

A second mode, the optional mode, is provided for organisms that are not in the database and for users who want to design an sgRNA library for pre-selected regions. In addition to the parameters described above, two standard files are required:

  1. Sequence File:
    • The sequence file should be a single-contig genome file in FASTA format (for example, a .fna or .fasta file). In addition to the four bases (A, C, G and T), GLiDe also accepts the mixed-base symbols (R, Y, M, K, S, W, H, B, V, D and N). Other characters, such as I (hypoxanthine) or U (uracil), are not accepted. Both upper and lower case are acceptable.
    • The header of each sequence should begin with the ">" symbol and include the accession number and the name, separated by a space. An example header is: ">NZ_LR881938.1 Escherichia coli str. K-12 substr. MG1655 strain K-12 chromosome MG1655, complete sequence". If the annotation file is in .ptt or .rnt format, the sequence file should contain only a single sequence contig with its corresponding header. However, if the annotation file is in .gff or .gff3 format, the sequence file can contain multiple sequence contigs, and the accession numbers in their headers should match those in the annotation file.
  2. Annotation File:
    • GLiDe accepts annotation files in the General Feature Format (GFF/GFF3), as well as in two older formats: the protein table file (PTT) and the RNA table file (RNT). When using the PTT or RNT format, make sure that the "Design Target" parameter matches the uploaded file type (choose "CDS" for PTT format and "RNA" for RNT format).
    • The annotation file can be customized for tailored library design, allowing users to delete specific sequences from standard files in order to design an sgRNA library for selected regions.
    • If a user only has the sequence file and lacks an annotation file, they can obtain one through a standard genome annotation pipeline such as the Prokaryotic Genome Annotation Pipeline (PGAP).
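
Before uploading, the file requirements above can be checked locally. The following Python sketch is only an illustration of those rules, not GLiDe's own validation code: the function names are hypothetical, and the GFF check assumes the standard tab-separated layout in which the first column holds the sequence ID.

```python
# The four bases plus the mixed-base symbols that GLiDe accepts.
ALLOWED_BASES = set("ACGT" + "RYMKSWHBVDN")


def _make_record(header, chunks):
    """Split a header into accession and name, and validate the sequence."""
    accession, _, name = header.partition(" ")
    if not accession or not name.strip():
        raise ValueError(f"header needs an accession and a name: {header!r}")
    sequence = "".join(chunks).upper()  # upper and lower case are both accepted
    bad = set(sequence) - ALLOWED_BASES
    if bad:
        raise ValueError(f"{accession}: unsupported characters {sorted(bad)}")
    return accession, name.strip(), sequence


def parse_fasta(text):
    """Parse FASTA text into a list of (accession, name, sequence) tuples."""
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def gff_seqids(gff_text):
    """Collect the sequence IDs (column 1) used by GFF feature lines."""
    ids = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; stop reading features
        if line.strip() and not line.startswith("#"):
            ids.add(line.split("\t")[0])
    return ids


def unmatched_accessions(fasta_text, gff_text):
    """Return GFF sequence IDs that have no matching FASTA header accession."""
    accessions = {accession for accession, _, _ in parse_fasta(fasta_text)}
    return gff_seqids(gff_text) - accessions
```

An empty set returned by `unmatched_accessions` means that every contig referenced in the annotation file has a matching header in the sequence file.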

Output 1: Guide Library List

The main output of GLiDe is a list containing all sgRNAs (in Excel format). In the list, sgRNAs are grouped by their targeted genes and labeled with their start positions.

An example of the final sgRNA library list

  • For each gene, sgRNAs are ranked by their distance to the start codon, so that those closer to the transcription start site are at the top. Genes are sorted in their natural order.
  • Negative control sgRNAs (if designed) are placed at the end of the list.
  • This table is saved only temporarily on the server; users must download the Excel file to inspect it.
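
The ordering described above can be illustrated with a short Python sketch. It is not GLiDe's code: the `Guide` record and its field names are hypothetical (they are not the column names of the Excel file), "natural order" is assumed to mean genomic order, and the distance is simplified to the offset between the sgRNA start position and the gene's start codon on the gene's own strand.

```python
from dataclasses import dataclass


@dataclass
class Guide:
    gene: str
    gene_start: int   # leftmost genomic coordinate of the gene
    gene_end: int     # rightmost genomic coordinate of the gene
    gene_strand: str  # "+" or "-"
    position: int     # start position of the sgRNA on the genome


def distance_to_start_codon(guide: Guide) -> int:
    """Offset of the sgRNA from the start codon, following the gene's strand."""
    if guide.gene_strand == "+":
        return guide.position - guide.gene_start
    # On the minus strand the start codon sits at the rightmost coordinate.
    return guide.gene_end - guide.position


def order_library(guides):
    """Sort genes by genomic position, then sgRNAs by distance to the start codon."""
    return sorted(
        guides,
        key=lambda g: (g.gene_start, g.gene, distance_to_start_codon(g)),
    )


if __name__ == "__main__":
    library = [
        Guide("geneB", 500, 800, "-", 600),
        Guide("geneA", 100, 400, "+", 250),
        Guide("geneB", 500, 800, "-", 780),
        Guide("geneA", 100, 400, "+", 120),
    ]
    for guide in order_library(library):
        print(guide.gene, guide.position, distance_to_start_codon(guide))
```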

Output 2: Genome Map

The second output of GLiDe is the genome map. It is displayed on an interactive graphical page built with the D3GB genome browser (J Comput Biol 2017).

An example of the interactive page

  • The whole genome and all sgRNAs are displayed on a zoomable page. Users can use the scroll bar at the top to zoom in on a region of interest, or search for a specific locus.
  • Genes and designed sgRNAs are displayed at their locations in the genome. All genes and sgRNAs are clickable; clicking one opens a new window with more detailed information, such as the sequence, the position, and (for sgRNAs) the targeted gene.
  • Users can send the URL of the genome map page to collaborators or bookmark it. We have not deleted any results over the last two years, and we try to keep each output for at least a year. For long-term archival, consider downloading the page and/or the Excel version of the guide library list.

Output 3: Off-target Log List

This list contains the sgRNAs predicted to have potential off-target hits. All sgRNAs that were not selected are listed together with their predicted off-target sequences. This table is also saved only temporarily on the server; users must download the Excel file to inspect it.

An example of the off-target log list