check_frame - Check sequencing reads of clones to ensure the open reading frame is in the correct frame and to identify the encoded protein sequence.
check_frame checks sequencing reads of cloned genes to ensure that the open reading frame is in the correct frame and to identify the encoded protein sequence. These clones consist of the cloning vector, a linker, and then the coding sequence of the protein to be synthesized. The coding sequence must be in the correct translation frame relative to the translation start site encoded in the vector and/or linker sequence. Furthermore, the resulting in-frame translation must be checked to ensure that it encodes the intended protein.
check_frame requires two input files: a FASTA file of sequencing reads of the clones, and a FASTA file of reference protein sequences of the intended products of the cloning experiments. The output is a tab-separated-values (TSV) file providing either a) the defline of the sequencing read and the defline of the matching protein sequence(s) from the reference sequence database; b) a statement that either the coding sequence is out of frame or the protein's sequence could not be matched; or c) a statement that the linker (amino-acid) sequence could not be located in the sequencing read.
check_frame works by first locating the amino-acid translation of the end of the linker in each sequencing read, permitting amino-acid substitutions representing up to two nucleotide substitutions (a configurable parameter, described below). It then searches the reference protein sequence database for the amino-acid sequence that immediately follows the linker's matched translation, in the same translation frame.
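The linker search described above can be sketched in Python. This is a minimal illustration, not check_frame's actual implementation: the names `codon_cost` and `find_linker` are hypothetical, and the sketch assumes the standard genetic code, forward-strand reads only, and that an amino-acid substitution is scored as the fewest nucleotide changes needed to turn the observed codon into some codon for the expected linker residue.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the clones use the standard translation table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_cost(codon, aa):
    """Fewest nucleotide changes that turn `codon` into a codon for `aa`."""
    return min(
        sum(a != b for a, b in zip(codon, target))
        for target, target_aa in CODON_TABLE.items()
        if target_aa == aa
    )


def find_linker(read, linker="STSLYKKAGSA", max_subs=2):
    """Locate the linker translation in any of the three forward frames.

    Returns (frame, nucleotide_subs, downstream_peptide) for the match
    with the fewest substitutions, or None if no window fits the budget.
    """
    read = read.upper()
    n = len(linker)
    best = None
    for frame in range(3):
        codons = [read[i:i + 3] for i in range(frame, len(read) - 2, 3)]
        for start in range(len(codons) - n + 1):
            cost = 0
            for codon, aa in zip(codons[start:start + n], linker):
                cost += codon_cost(codon, aa)
                if cost > max_subs:
                    break  # this window already exceeds the budget
            else:
                if best is None or cost < best[0]:
                    best = (cost, frame, codons[start + n:])
    if best is None:
        return None
    cost, frame, downstream = best
    # Codons containing ambiguous bases are translated as 'X'.
    peptide = "".join(CODON_TABLE.get(c, "X") for c in downstream)
    return frame, cost, peptide
```

The peptide returned after the linker is what would then be compared against the reference protein database; a read for which `find_linker` returns `None` corresponds to the "linker could not be located" outcome.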
check_frame can be run either from the command line or using its configuration interface. If no arguments are supplied, the graphical user interface is used automatically.
Download and Installation
Download the appropriate package from edwardslab.bmcb.georgetown.edu. For Windows, use the automated installer check_frame.win32.exe or the zip file check_frame.win32.zip; for Linux, select the appropriate tgz file, check_frame.linux-x86_64.tgz. The software is completely self-contained; unpack or install it anywhere. Example sequencing reads and reference protein files are also included.
check_frame [ options ]
These options can either be set using the configuration user-interface or on the command-line.
FASTA format file containing clone sequencing reads. Required.
FASTA format file containing reference protein sequences. Required.
Filename for tab-separated-values (TSV) program output. If not supplied, output goes to standard output, which is only useful when running from the command line.
The amino-acid sequence of the linker, immediately prior to the start of the coding sequence. Default: STSLYKKAGSA.
Permit amino-acid substitutions in the linker translation representing at most nucleotide-subs nucleotide substitutions. Default: 2.
Initial Prot. Amino-Acids
Length of the initial protein sequence to attempt to match against the reference protein sequence database. Default: 10.
Permit amino-acid substitutions in the initial protein sequence match representing at most nucleotide-subs nucleotide substitutions. Default: 1.
Output the alignment results and other debugging information to standard error. Default: False.
Example of command-line use:
check_frame.exe -s seqreads.fasta -p sequences.fasta -o out.tsv
Result:
RAJG913T1Fplate1  gi|61660213|ref|YP_214104.1| Structural protein [Murid herpesvirus 1]
RAJG914T1Fplate1  gi|61660175|ref|YP_214066.1| hypothetical protein MuHV1_gpM53 [Murid herpesvirus 1]
RAJG915T1Fplate1  gi|61660175|ref|YP_214066.1| hypothetical protein MuHV1_gpM53 [Murid herpesvirus 1]
RAJG977T1Fplate1  gi|61660161|ref|YP_214052.1| hypothetical protein MuHV1_gpm40 [Murid herpesvirus 1]
RAJG978T1Fplate1  gi|61660161|ref|YP_214052.1| hypothetical protein MuHV1_gpm40 [Murid herpesvirus 1]
...
RAJG976T1Fplate1  Not in frame or not matched.
RAJG961T1Fplate1  Linker sequence not matched.
RAJG949T1Fplate1  Linker sequence not matched.
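For larger plates it can help to tally the output rather than read it by eye. The following Python sketch is one way to do that. The function name `summarize` is hypothetical, and the sketch assumes that the columns in the output file are separated by tabs, that the first column is the read defline, and that the two failure messages are worded exactly as in the example result.

```python
from collections import Counter

# Failure messages as they appear in the example result (assumed to be
# the exact wording check_frame writes to the TSV file).
FAILURES = {
    "Not in frame or not matched.",
    "Linker sequence not matched.",
}


def summarize(tsv_text):
    """Split check_frame TSV output into matched reads and failure counts.

    Returns (matched, problems): `matched` maps each read defline to the
    list of matching protein deflines, and `problems` counts how often
    each failure message occurs.
    """
    matched = {}
    problems = Counter()
    for line in tsv_text.splitlines():
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        read_id, results = fields[0], fields[1:]
        if results[0] in FAILURES:
            problems[results[0]] += 1
        else:
            matched[read_id] = results
    return matched, problems


if __name__ == "__main__":
    import sys

    # Usage (file name taken from the example above): python summarize.py out.tsv
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            matched, problems = summarize(handle.read())
        print(f"{len(matched)} reads matched a reference protein")
        for message, count in problems.items():
            print(f"{count} reads: {message}")
```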