CheckFrame

Name

check_frame - Check sequencing reads of clones to ensure the open reading frame is in the correct frame and to identify the encoded protein sequence.

Description

check_frame checks sequencing reads of cloned genes to ensure that the coding sequence is in the correct reading frame and to identify the encoded protein sequence. Each clone consists of the cloning vector, a linker, and then the coding sequence of the protein to be synthesized. The coding sequence must be in the correct translation frame relative to the translation start site encoded in the vector and/or linker sequence. The resulting in-frame translation must then be checked to confirm that it encodes the intended protein.

check_frame requires two input files: a FASTA file of sequencing reads of the clones, and a FASTA file of reference protein sequences of the intended products of the cloning experiments. The output is a tab-separated-values (TSV) file that reports, for each sequencing read, one of the following: a) the defline of the sequencing read and the defline of the matching protein sequence(s) from the reference sequence database; b) a statement that either the coding sequence is out of frame or the protein's sequence could not be matched; or c) a statement that the linker (amino-acid) sequence could not be located in the sequencing read.

check_frame works in two steps. It first locates the amino-acid translation of the end of the linker in each sequencing read, permitting amino-acid substitutions that represent up to two nucleotide substitutions (a configurable parameter; see Options). It then searches the reference protein sequence database for the amino-acid sequence that immediately follows the matched linker translation, in the same translation frame.
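The linker-location step can be illustrated with a minimal Python sketch. This is not check_frame's actual implementation: the function names (codon_cost, find_linker_end, translate) are hypothetical, the standard genetic code is assumed, and the cost of a mismatch is assumed to be the fewest nucleotide substitutions needed to turn the observed codon into any codon for the expected linker residue.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (assumption:
# the clones use the standard code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_cost(codon, aa):
    """Fewest nucleotide substitutions turning `codon` into a codon for `aa`."""
    return min(
        sum(x != y for x, y in zip(codon, target))
        for target, encoded in CODON_TABLE.items()
        if encoded == aa
    )


def find_linker_end(read, linker, max_subs=2):
    """Return the read index just past the first linker match, or None.

    Every nucleotide offset is tried, so all three forward frames are covered.
    """
    read = read.upper()
    span = 3 * len(linker)
    for start in range(len(read) - span + 1):
        cost = 0
        for j, aa in enumerate(linker):
            cost += codon_cost(read[start + 3 * j:start + 3 * j + 3], aa)
            if cost > max_subs:
                break  # over budget at this offset, try the next one
        else:
            return start + span
    return None


def translate(dna, n_codons):
    """Translate up to `n_codons` codons from the start of `dna`."""
    dna = dna.upper()
    stop = min(len(dna) - 2, 3 * n_codons)
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, stop, 3))
```

Under these assumptions, translate(read[end:], 10), with end taken from find_linker_end, gives the initial protein sequence that would be looked up in the reference database.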

check_frame can be run either from the command-line or through its graphical configuration interface. If no arguments are supplied, the graphical user interface is used automatically.

Download and Installation

Download the appropriate package from edwardslab.bmcb.georgetown.edu. For Windows, use the automated installer check_frame.win32.exe or the zip file check_frame.win32.zip; for Linux, select the appropriate tgz file, check_frame.linux-i686.tgz or check_frame.linux-x86_64.tgz. The software is completely self-contained and can be unpacked or installed anywhere. Example sequencing reads and reference protein files are included.

Synopsis

check_frame [ options ]

Options

These options can either be set using the configuration user-interface or on the command-line.

Sequencing Reads
-s sequencing-reads.fasta
--seq sequencing-reads.fasta

FASTA format file containing clone sequencing reads. Required.

Protein Sequences
-p protein-seqs.fasta
--prot protein-seqs.fasta

FASTA format file containing reference protein sequences. Required.

Output
-o output.tsv
--out output.tsv

Filename for the tab-separated-values (TSV) program output. If not supplied, output goes to standard output, which is only useful when running from the command-line.

Advanced

In-Frame Linker
-l linker-aa-seq
--linker linker-aa-seq

The amino-acid sequence of the linker, immediately prior to the start of the coding sequence. Default: STSLYKKAGSA.

DNA Mutations
-k nucleotide-subs
--linkmut nucleotide-subs

Permit amino-acid substitutions in the linker translation representing at most nucleotide-subs nucleotide substitutions. Default: 2.

Initial Prot. Amino-Acids
-l length
--len length

Length of the initial protein sequence to attempt to match against the reference protein sequence database. Default: 10.

Prot. Mutations
-K nucleotide-subs
--protmut nucleotide-subs

Permit amino-acid substitutions in the initial protein sequence match representing at most nucleotide-subs nucleotide substitutions. Default: 1.
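One possible reading of the protein-matching budget is sketched below in Python. It is an illustration only, not check_frame's own code: the names aa_cost and matching_proteins are hypothetical, the standard genetic code is assumed, the cost of an amino-acid substitution is assumed to be the fewest nucleotide substitutions between any pair of their codons, and the initial sequence is assumed to be allowed to match anywhere in a reference protein.

```python
from itertools import product

# Standard genetic code in TCAG order, grouped by amino acid.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))


def aa_cost(a, b):
    """Fewest nucleotide substitutions turning some codon of `a` into one of `b`."""
    if a == b:
        return 0
    if a not in CODONS_FOR or b not in CODONS_FOR:
        return 3  # non-standard residue: assume the worst case
    return min(
        sum(x != y for x, y in zip(ca, cb))
        for ca in CODONS_FOR[a]
        for cb in CODONS_FOR[b]
    )


def matching_proteins(prefix, references, max_subs=1):
    """Return deflines of references containing `prefix` within the budget.

    `references` maps defline -> protein sequence.
    """
    hits = []
    for defline, seq in references.items():
        for start in range(len(seq) - len(prefix) + 1):
            window = seq[start:start + len(prefix)]
            if sum(aa_cost(a, b) for a, b in zip(prefix, window)) <= max_subs:
                hits.append(defline)
                break  # one hit per reference is enough
    return hits
```

With the default budget of 1, a lysine-to-arginine change (AAA to AGA) in the translated read would still match, while a change that needs two nucleotide substitutions would not.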

Debug Output
-d
--debug

Output the alignment results and other debugging information to standard error. Default: False.
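For scripted use, the options above can be assembled into a command line from Python. The sketch below is an assumption-laden convenience wrapper, not part of check_frame: build_command and run_check_frame are hypothetical names, the executable is assumed to be reachable as check_frame (pass exe="check_frame.exe" or a full path otherwise), and only the long option names are used because the short form -l is listed above for both the linker and the length.

```python
import subprocess


def build_command(reads, proteins, out=None, linker=None, linkmut=None,
                  length=None, protmut=None, debug=False, exe="check_frame"):
    """Build the argument list for a check_frame run from the documented options."""
    cmd = [exe, "--seq", reads, "--prot", proteins]
    optional = [
        ("--out", out),
        ("--linker", linker),
        ("--linkmut", linkmut),
        ("--len", length),
        ("--protmut", protmut),
    ]
    for flag, value in optional:
        if value is not None:  # omitted options fall back to the tool's defaults
            cmd += [flag, str(value)]
    if debug:
        cmd.append("--debug")
    return cmd


def run_check_frame(*args, **kwargs):
    """Run check_frame and return whatever it writes to standard output."""
    result = subprocess.run(build_command(*args, **kwargs),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Calling build_command("seqreads.fasta", "sequences.fasta", out="out.tsv") yields the long-option equivalent of the command shown in the Example section.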

Example

Example of command-line use:
check_frame.exe -s seqreads.fasta -p sequences.fasta -o out.tsv
Result:
RAJG913T1Fplate1        gi|61660213|ref|YP_214104.1| Structural protein [Murid herpesvirus 1]
RAJG914T1Fplate1        gi|61660175|ref|YP_214066.1| hypothetical protein MuHV1_gpM53 [Murid herpesvirus 1]
RAJG915T1Fplate1        gi|61660175|ref|YP_214066.1| hypothetical protein MuHV1_gpM53 [Murid herpesvirus 1]
RAJG977T1Fplate1        gi|61660161|ref|YP_214052.1| hypothetical protein MuHV1_gpm40 [Murid herpesvirus 1]
RAJG978T1Fplate1        gi|61660161|ref|YP_214052.1| hypothetical protein MuHV1_gpm40 [Murid herpesvirus 1]
...
RAJG976T1Fplate1        Not in frame or not matched.
RAJG961T1Fplate1        Linker sequence not matched.
RAJG949T1Fplate1        Linker sequence not matched.
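Output like the above can be sorted into matched and failed reads with a short Python sketch. It assumes the two status messages appear exactly as shown in this example, that the first column is the read defline, and that multiple matching proteins, if any, appear as additional tab-separated columns (the source does not specify this). The name parse_results and the failure labels are hypothetical.

```python
import csv
import io

# Status messages as they appear in the example output above.
FAILURES = {
    "Not in frame or not matched.": "frame_or_match",
    "Linker sequence not matched.": "linker",
}


def parse_results(handle):
    """Split check_frame TSV output into matched reads and failed reads."""
    matched, failed = {}, {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        fields = [f.strip() for f in row[1:] if f.strip()]
        if not row or not fields:
            continue  # skip blank or single-column lines
        if fields[0] in FAILURES:
            failed[row[0]] = FAILURES[fields[0]]
        else:
            matched[row[0]] = fields  # one or more protein deflines
    return matched, failed


sample = (
    "RAJG913T1Fplate1\tgi|61660213|ref|YP_214104.1| Structural protein [Murid herpesvirus 1]\n"
    "RAJG976T1Fplate1\tNot in frame or not matched.\n"
    "RAJG961T1Fplate1\tLinker sequence not matched.\n"
)
matched, failed = parse_results(io.StringIO(sample))
```

For a real run, pass an open file instead of the StringIO sample, for example open("out.tsv", newline="").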

Author

Nathan Edwards