PyNISTPL
Introduction
PyNISTPL is a Python module that interfaces to the NIST MS Search Engine libraries to provide easy-to-use command-line searching and construction of peptide spectrum libraries.
PyNISTPL includes Python scripts, runnable from the Windows or Cygwin command prompt, that read many common peptide MS/MS spectral formats, including mzXML, mzData, and MGF.
Installation
There are three installation options currently available.
- Binary distribution
- Requires no Python installation. Executables and NIST DLLs (by permission) are contained in a single zip file, with no external dependencies (other than the NIST spectral libraries). Use this unless you want to modify the Python scripts.
- Python package distribution
- Requires a Python 2.4 installation. Python scripts and NIST DLLs are supplied via Windows installers. The Python scripts can be modified in their installation directory. Various support packages must also be installed. The binary interface to the NIST DLLs cannot be modified. Use this if you want to write new Python scripts that interface to the NIST DLLs.
- Source distribution
- Requires a Python installation. Presumes that the NIST library, which must be obtained from NIST, is available for compilation. Requires Visual C++ for compilation. Various support packages must also be installed. The binary interface to the NIST DLLs can be modified at will. Use this if you want to write a new Python API to the NIST DLL.
Binary distribution
- Download the pymssearch.zip file from edwardslab.bmcb.georgetown.edu.
- Unzip the pymssearch.zip file wherever you like.
- Note that the Python modules assume the peptide libraries are in the default location: C:\NIST_PEPLIB\LIBS.
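As a quick sanity check for the default library location noted above, a short Python sketch can confirm that the directory exists before any searches are attempted. The function names here are hypothetical helpers, not part of PyNISTPL, and the assumption that each library shows up as an entry directly under the library directory is a guess about the NIST layout.

```python
import os

# Default library location stated in the installation notes.
DEFAULT_LIB_DIR = r"C:\NIST_PEPLIB\LIBS"


def library_dir_ok(path=DEFAULT_LIB_DIR):
    """Return True if `path` exists and is a directory."""
    return os.path.isdir(path)


def list_libraries(path=DEFAULT_LIB_DIR):
    """Return the sorted names of the entries in the library directory.

    Assumption: each spectrum library appears as an entry (file or folder)
    directly under the library directory. The real layout is defined by NIST.
    """
    if not library_dir_ok(path):
        return []
    return sorted(os.listdir(path))


if __name__ == "__main__":
    if library_dir_ok():
        print("Found library directory: %s" % DEFAULT_LIB_DIR)
        for name in list_libraries():
            print("  %s" % name)
    else:
        print("Missing library directory: %s" % DEFAULT_LIB_DIR)
```

If the libraries live elsewhere, pass that directory explicitly, for example library_dir_ok(r"D:\peplib").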
Python package distribution
- If necessary, download and install the latest version of the ActiveState ActivePython (Python version 2.4 only!) package for Windows. The default installation location (C:\Python24) will require the least tweaking of paths.
- Download and install the Python elementtree package from effbot.org.
- Download and install the Python cjson package from edwardslab.bmcb.georgetown.edu.
- Download and install the Python PyMSIO package from edwardslab.bmcb.georgetown.edu.
- Download and install the Python PyNISTPL package from edwardslab.bmcb.georgetown.edu.
- Scripts are installed in C:\Python24\Scripts.
- Note that the Python modules assume the peptide libraries are in the default location: C:\NIST_PEPLIB\LIBS.
Source distribution
- If necessary, download and install the latest version of the ActiveState ActivePython (Python version 2.4 only!) package for Windows. The default installation location (C:\Python24) will require the least tweaking of paths.
- Download and install the Python elementtree package from effbot.org.
- Download and install the Python cjson package from edwardslab.bmcb.georgetown.edu.
- Download the source for the PyMSIO package from edwardslab.bmcb.georgetown.edu.
- Unzip the PyMSIO-0.9.zip file, and run the following command inside the PyMSIO-0.9 folder (adjust paths as necessary):
C:\Python24\python.exe setup.py install
- Obtain the file NIST_PEP_SRCH_ENGINE.zip from NIST, and unzip to C:\.
- Download the source for the PyNISTPL package from edwardslab.bmcb.georgetown.edu.
- Unzip the pynistpeplib-0.95.zip file, and run the following command inside the pynistpeplib-0.95 folder (adjust paths as necessary) from the Visual Studio Command Prompt:
C:\Python24\python.exe setup.py install
- Copy the NIST DLLs NISTDL32.dll and ctNt66.dll from C:\NIST_PEPLIB\SRCH_ENGINE\CODE to C:\Python24.
- Note that the Python modules assume the peptide libraries are in the default location: C:\NIST_PEPLIB\LIBS.
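The DLL copy step in the list above can also be scripted. The following is a minimal sketch, assuming the two DLL names and the source and destination folders given in that step; copy_nist_dlls is a hypothetical helper, not something shipped with PyNISTPL.

```python
import os
import shutil

# DLL names and folders as given in the installation steps above.
NIST_DLLS = ("NISTDL32.dll", "ctNt66.dll")
SRC_DIR = r"C:\NIST_PEPLIB\SRCH_ENGINE\CODE"
DST_DIR = r"C:\Python24"


def copy_nist_dlls(src_dir=SRC_DIR, dst_dir=DST_DIR, names=NIST_DLLS):
    """Copy each named DLL from src_dir to dst_dir.

    Returns the list of names that were not found in src_dir, so the
    caller can report an incomplete NIST installation.
    """
    missing = []
    for name in names:
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            shutil.copy(src, os.path.join(dst_dir, name))
        else:
            missing.append(name)
    return missing
```

Calling copy_nist_dlls() with no arguments uses the default paths; a non-empty return value lists the DLLs that still need to be located by hand.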
Usage
PyNISTPL provides three scripts that can be run from the Windows or Cygwin command line. pymssearch searches a peptide spectrum library, such as the human peptide spectrum library from NIST; pymakelib makes a spectrum library from a set of input spectra; and pylibspectra outputs the number of spectra in a given spectrum library.
- pymssearch [ options ] spectrum-files
Options:
- -L library, --lib library
- Name, or full path, of spectrum library to search. Required.
- -o file, --output file
- Send output to designated file. If the filename ends in .gz, then output is automatically gzipped. Default: - (stdout).
- -s sim, --similarity_threshold sim
- Minimum similarity of hits to output. Default: All.
- -n hits, --number hits
- Maximum number of spectral hits to output. Default: All.
- -q, --quiet
- Only output query spectra with at least 1 hit. Default: False.
- -N peaks, --npeaks peaks
- Lower bound on number of peaks for query spectra. Default: 0.
- -t tol, --precursor_tolerance tol
- Precursor mass tolerance, in Daltons, for matching library spectra. Default: 2 Da.
- -f tol, --fragment_tolerance tol
- Fragment tolerance, in Daltons. Default: 2 Da.
- -E, --EOMSSA_weighting
- Use OMSSA E-value to weight match factor. Default: False.
- -e, --output_eomssa
- Display OMSSA E-value. Default: False.
- -Q, --QTOF_weighting
- Use QTOF weighting in match factor. Default: False.
- -R, --NumRep_weighting
- Use number of replicates weighting in match factor. Default: False.
- -T, --output_tf
- Display Query and Lib T/F scores. Default: False.
- -P, --no_precursor_filter
- Do not restrict library spectra by precursor mass when matching. Default: False (the precursor filter is applied).
- --qformat fmt
- Format for query spectrum output. To see valid fields, use --qformat "-". Default: ">%(base)s %(scan)d %(title)s\n".
- --rformat fmt
- Format for reference spectrum output. To see valid fields, use --rformat "-". Default: "%(rank)d. %(peptide)s/%(charge)d ID %(libid)d Similarity %(sim).3f RevSim %(revsim).3f Probability %(prob).4f Mods %(Mods)s Spec %(Spec)s\n".
- -W work-directory, --workdir work-directory
- Search engine working directory. According to the NIST documentation, concurrent search engine processes should have different work directories. Default: C:\NIST_PEPLIB.
- -h, --help
- Help on command line options.
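When pymssearch is driven from another Python program, the options above map naturally onto an argument list for the standard subprocess module. The sketch below covers only a handful of the documented flags. The helper names, the default script path (taken from the package-distribution install location), and the assumption that the script can be launched directly by that path are illustrative and not part of PyNISTPL.

```python
import subprocess

# Assumed launcher path; adjust to wherever the script or executable lives.
PYMSSEARCH = r"C:\Python24\Scripts\pymssearch"


def pymssearch_args(library, spectrum_files, output=None, similarity=None,
                    max_hits=None, quiet=False, workdir=None,
                    script=PYMSSEARCH):
    """Build an argument list for pymssearch from the documented options."""
    args = [script, "-L", library]          # -L is required
    if output is not None:
        args += ["-o", output]              # a name ending in .gz is gzipped
    if similarity is not None:
        args += ["-s", str(similarity)]     # minimum similarity of hits
    if max_hits is not None:
        args += ["-n", str(max_hits)]       # maximum number of hits
    if quiet:
        args.append("-q")                   # only queries with >= 1 hit
    if workdir is not None:
        args += ["-W", workdir]             # search engine work directory
    args += list(spectrum_files)
    return args


def run_search(args):
    """Run the search and return the process exit status."""
    return subprocess.call(args)
```

For example, pymssearch_args("human", ["spectra.mzXML.gz"], output="hits.out.gz", max_hits=5) yields a list that can be passed to run_search. Passing a list rather than a single command string avoids quoting problems with paths that contain spaces.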
- pymakelib [ options ] spectrum-files
Options:
- -L library, --lib library
- Name, or full path, of the spectrum library to create or add to. If not supplied, and there is only one spectrum file on the command line, then the spectrum library name is inferred from the spectrum filename.
- -N name, --name name
- Spectra in the library are identified by their input spectrum filename and their index within the file. This option sets the name that is used to identify the set of spectra.
- -W work-directory, --workdir work-directory
- Search engine working directory. According to the NIST documentation, concurrent search engine processes should have different work directories. Default: C:\NIST_PEPLIB.
- -v, --verbose
- Provide feedback on whether or not the library is created or already exists, and the number of spectra contained in it before and after the input spectra are added.
- -h, --help
- Help on command line options.
- pylibspectra [ options ]
Options:
- -L library, --lib library
- Name, or full path, of the spectrum library whose spectra are to be counted.
- -W work-directory, --workdir work-directory
- Search engine working directory. According to the NIST documentation, concurrent search engine processes should have different work directories. Default: C:\NIST_PEPLIB.
- -h, --help
- Help on command line options.
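All three scripts accept -W, and the note about concurrent processes suggests giving each simultaneous run its own directory. One way to arrange that is sketched below using the standard tempfile module. Both helpers are hypothetical, and whether a freshly created empty directory is acceptable to the NIST search engine as a work directory is an assumption that should be checked against the NIST documentation.

```python
import os
import tempfile


def make_workdirs(count, parent=None):
    """Create `count` fresh, distinct directories and return their paths.

    Assumption: an empty directory is a usable search engine work directory.
    """
    return [tempfile.mkdtemp(prefix="nist_work_", dir=parent)
            for _ in range(count)]


def with_workdir(args, workdir):
    """Return a copy of an argument list with '-W workdir' added.

    args[0] is taken to be the script name; the option is inserted
    directly after it so that the spectrum files stay at the end.
    """
    return [args[0], "-W", workdir] + list(args[1:])


if __name__ == "__main__":
    base = ["pymssearch", "-L", "human"]
    jobs = [base + ["run1.mgf"], base + ["run2.mgf"]]
    for job, workdir in zip(jobs, make_workdirs(len(jobs))):
        print(" ".join(with_workdir(job, workdir)))
```

mkdtemp guarantees that each returned path is newly created, so two jobs started from the same list never share a directory.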
Examples
Search a gzip'ed mzXML spectrum file of MS/MS spectra against the NIST human peptide spectrum library:
C:\> \Python24\Scripts\pymssearch -L human spectra.mzXML.gz > spectra_v_human.out
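The default --qformat and --rformat strings are ordinary Python %-style templates filled from a dictionary of fields, so output written with the defaults can be read back with a matching regular expression. The sketch below renders one hit line and parses it again. The field values in SAMPLE_HIT are invented purely for illustration (in particular, what the Mods and Spec fields contain in real output is not documented here), and parse_hit is a hypothetical helper.

```python
import re

# Default reference-hit format, as documented for --rformat.
RFORMAT = ("%(rank)d. %(peptide)s/%(charge)d ID %(libid)d "
           "Similarity %(sim).3f RevSim %(revsim).3f "
           "Probability %(prob).4f Mods %(Mods)s Spec %(Spec)s\n")

# Invented values, only to show how the template is filled in.
SAMPLE_HIT = {"rank": 1, "peptide": "ELVISLIVESK", "charge": 2,
              "libid": 12345, "sim": 0.75, "revsim": 0.5, "prob": 0.95,
              "Mods": "0", "Spec": "Consensus"}

# Regular expression mirroring the default template field by field.
HIT_RE = re.compile(
    r"^(?P<rank>\d+)\. (?P<peptide>\S+)/(?P<charge>\d+) ID (?P<libid>\d+) "
    r"Similarity (?P<sim>\S+) RevSim (?P<revsim>\S+) "
    r"Probability (?P<prob>\S+) Mods (?P<Mods>.*?) Spec (?P<Spec>.*)$")


def parse_hit(line):
    """Parse one default-format hit line into a dict, or return None."""
    match = HIT_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    hit = match.groupdict()
    for key in ("rank", "charge", "libid"):
        hit[key] = int(hit[key])
    for key in ("sim", "revsim", "prob"):
        hit[key] = float(hit[key])
    return hit


if __name__ == "__main__":
    line = RFORMAT % SAMPLE_HIT
    print(line.rstrip())
    print(parse_hit(line)["peptide"])
```

Lines that do not match, such as the query lines beginning with ">", come back as None, which makes it easy to walk an output file and group hits under the preceding query line. A custom --rformat needs a correspondingly adjusted expression.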
Compute all-vs-all spectral similarity for spectra in MGF format:
C:\> \Python24\Scripts\pymakelib -L myspectra -v spectra.mgf
C:\> \Python24\Scripts\pylibspectra -L myspectra
C:\> \Python24\Scripts\pymssearch -L myspectra spectra.mgf > spectra_v_myspectra.out
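The three all-vs-all commands above can be chained from a small Python driver. This is a sketch under stated assumptions: the helper names are hypothetical, the scripts are assumed to be launchable from C:\Python24\Scripts by name, the documented -o option is used in place of shell redirection, and stopping on a non-zero exit status assumes the scripts report failure that way.

```python
import ntpath
import subprocess

# Assumed script location (package-distribution default).
SCRIPTS_DIR = r"C:\Python24\Scripts"


def all_vs_all_commands(library, spectra_file, scripts_dir=SCRIPTS_DIR):
    """Return the three argument lists for an all-vs-all comparison."""
    # "spectra.mgf" -> "spectra"; the output name follows the examples above.
    stem = ntpath.basename(spectra_file).split(".")[0]
    output = "%s_v_%s.out" % (stem, library)
    return [
        # 1. Build (or extend) the library from the spectra.
        [ntpath.join(scripts_dir, "pymakelib"), "-L", library, "-v",
         spectra_file],
        # 2. Report how many spectra the library now holds.
        [ntpath.join(scripts_dir, "pylibspectra"), "-L", library],
        # 3. Search the same spectra against the new library.
        [ntpath.join(scripts_dir, "pymssearch"), "-L", library,
         "-o", output, spectra_file],
    ]


def run_all(commands):
    """Run the commands in order; stop at the first non-zero exit status."""
    for command in commands:
        status = subprocess.call(command)
        if status != 0:
            return status
    return 0
```

run_all(all_vs_all_commands("myspectra", "spectra.mgf")) then performs the same sequence as the three commands, writing the hits to spectra_v_myspectra.out.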