Morpheus

Frequently Asked Questions (FAQ)

Where did the name "Morpheus" come from?
From the movie The Matrix, of course.

How can I generate mzML data for Morpheus?
The msconvert tool of ProteoWizard is your best bet, since it works with pretty much any vendor data, although most mzML should work regardless of what generated it. However, if you are using Thermo .raw files or Agilent .d directories, there are versions of Morpheus that support those formats directly, which saves a conversion step. Expect small differences in the results, because those versions use the vendor's charge state assignment and deisotoping routines instead of the generic ones in Morpheus.

Where can I get FASTA proteome databases?
They are available from many sources but I recommend UniProt, particularly the complete and reference proteome sets. After you filter to the appropriate proteins, click the orange 'Download' button near the top-right of the page. Under FASTA, click 'Download' for either the canonical sequence data or canonical and isoform sequence data. The former will be faster and result in simpler protein groups, but you might miss out on identifying an interesting isoform. I usually err on the side of databases that are too large rather than too small so my suggestion would be canonical and isoform sequence data.

What does the 'Create Target–Decoy Database On The Fly' checkbox do?
Target–decoy database searching is an empirical way to estimate how many false positives you have identified, enabling calculation of false discovery rate (FDR): false positives / (true positives + false positives). In other words, out of all the identifications you report, what fraction is incorrect. Target–decoy FDR is ubiquitous in mass spectrometry–based proteomics, and Morpheus is designed around the assumption that all users will employ this strategy. There are two ways to do so. The easiest is to supply a normal, target-only FASTA database and let Morpheus create the decoys on the fly. Alternatively, you can use an existing concatenated target–decoy FASTA database, provided the decoy entries contain the text 'DECOY_' somewhere in the identifiers. When you load your FASTA, Morpheus does a quick survey to determine whether it contains decoys. If it does not, Morpheus automatically checks the 'Create Target–Decoy Database On The Fly' checkbox. If it does contain decoys, Morpheus unchecks it. You should only override this automatic selection if you really know what you're doing. Note: If your FASTA database does contain decoys, but they aren't recognized as such by Morpheus, you will probably end up with a very low number of identifications. This is because Morpheus will make decoys out of decoys, which often entails reversing the sequence twice, thus generating decoys that are actually the normal forward target sequence.
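To make the decoy logic concrete, here is a minimal Python sketch. It is not Morpheus code: it assumes decoys are made by plain sequence reversal (the answer above only says reversal is common), and it uses the widely used estimate in which the number of decoy hits stands in for the number of false positives among the target hits. The function names are made up for illustration.

```python
DECOY_TAG = "DECOY_"  # marker text the FAQ says must appear in decoy identifiers


def contains_decoys(fasta_text: str) -> bool:
    """Return True if any FASTA header line contains the decoy marker."""
    return any(
        line.startswith(">") and DECOY_TAG in line
        for line in fasta_text.splitlines()
    )


def make_decoy(sequence: str) -> str:
    """Assumed decoy scheme: plain reversal of the protein sequence."""
    return sequence[::-1]


def estimate_fdr(target_hits: int, decoy_hits: int) -> float:
    """Estimate FDR among target hits, treating decoy hits as the
    estimate of false positives (an assumption, not Morpheus's code)."""
    if target_hits == 0:
        return 0.0
    return decoy_hits / target_hits


if __name__ == "__main__":
    target = "MKTAYIAK"
    decoy = make_decoy(target)
    # Reversing an unrecognized decoy a second time restores the target,
    # which is the failure mode described in the note above.
    print(make_decoy(decoy) == target)  # True
```

Under this scheme, a decoy that is not flagged as a decoy gets reversed again and collides with the real target sequence, so the "decoys" no longer behave like random matches.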

What does the 'Initiator Methionine Behavior' dropdown do?
Most proteins begin with a methionine residue. However, it is often cleaved off co-translationally. More details about this at UniProt. For each protein that starts with a methionine, Morpheus allows you to automatically search for the protein with it retained, cleaved off, or both ("variable").
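The three choices can be pictured as follows. This is only a sketch: the option names "retain", "cleave", and "variable" are illustrative labels, not necessarily the exact text in the dropdown, and the function is hypothetical.

```python
def initiator_methionine_forms(sequence: str, behavior: str) -> list[str]:
    """Return the protein forms to search for a given behavior.

    behavior: "retain", "cleave", or "variable" (illustrative names).
    Proteins that do not start with M are returned unchanged.
    """
    if behavior not in ("retain", "cleave", "variable"):
        raise ValueError(f"unknown behavior: {behavior!r}")
    if not sequence.startswith("M") or len(sequence) < 2:
        return [sequence]
    if behavior == "retain":
        return [sequence]
    if behavior == "cleave":
        return [sequence[1:]]
    # "variable": search both the retained and the cleaved form
    return [sequence, sequence[1:]]


if __name__ == "__main__":
    print(initiator_methionine_forms("MKTAYIAK", "variable"))
    # ['MKTAYIAK', 'KTAYIAK']
```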

What precursor and product mass tolerances should I use?
I suggest very conservative (that is, wide) mass tolerances, i.e. 3 standard deviations or more. The reason is that a spurious precursor or product match caused by an excessively large mass tolerance can usually be overcome and still produce the correct match, while missing a precursor or product because of an excessively small mass tolerance can be very harmful, often resulting in an incorrect match. This is easiest to think about in terms of precursor mass tolerance. If the correct match has a +6 ppm error and your tolerance is ±5 ppm, you will definitely get an incorrect match. By searching ±20 ppm, for example, you expand your search space by about 4 times. However, if the correct peptide can't be matched among that number of candidates, the spectral quality is probably not sufficient to produce a reliable match regardless of the search settings. Overall, for product mass tolerance I recommend at least ±10 ppm or ±0.01 Da for all high-resolution instruments (orbitraps, ICRs, and TOFs). Of course, you should experiment with your data and see what produces the best results. Precursor mass tolerance is trickier; I recommend at least ±1.1 Da for reasons discussed in the next question.
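The +6 ppm example can be checked with a few lines of Python. This is a generic illustration of ppm arithmetic, not code from Morpheus; the function names and the 1500 Da example mass are arbitrary.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_ppm(observed_mass: float, theoretical_mass: float,
               tolerance_ppm: float) -> bool:
    """True if the observed mass lies within +/- tolerance_ppm."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm


def ppm_to_da(mass: float, tolerance_ppm: float) -> float:
    """Width in Da of a ppm tolerance at a given mass."""
    return mass * tolerance_ppm / 1e6


if __name__ == "__main__":
    theoretical = 1500.0                  # arbitrary example mass in Da
    observed = theoretical * (1 + 6e-6)   # correct peptide with a +6 ppm error
    print(within_ppm(observed, theoretical, 5.0))   # False: correct match is missed
    print(within_ppm(observed, theoretical, 20.0))  # True: correct match is kept
```

At 1500 Da, a ±20 ppm window is only about ±0.03 Da wide, which is why widening from ±5 ppm costs relatively little.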

What does "Precursor Monoisotopic Peak Correction" do?
A well-known issue in proteomics is "off-by-one" errors in precursor (and less commonly, product) masses. At higher masses, the monoisotopic peak becomes lower in abundance and thus more difficult to locate robustly. As a result, the determined precursor mass can often be off by integer multiples of roughly 1 Da (more accurately, ¹³C − ¹²C = 1.00335 Da). To account for this while keeping a narrow mass tolerance, this option allows you to search in narrow windows (e.g. ±20 ppm) around multiples of the expected mass error. You should test out this option, though for the most part I find it simpler to just search a wide window, e.g. ±2.1 Da, and the results are similar.
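The idea of narrow windows around multiples of 1.00335 Da can be sketched like this. The set of offsets tried (0, 1, and 2 here) and the function name are assumptions for illustration; the offsets Morpheus actually considers are not stated in this FAQ.

```python
from typing import Optional

C13_C12_DIFF = 1.00335  # Da, the 13C - 12C mass difference quoted above


def find_isotope_offset(observed_mass: float, theoretical_mass: float,
                        tolerance_ppm: float = 20.0,
                        offsets: tuple = (0, 1, 2)) -> Optional[int]:
    """Return the first offset k for which the observed mass, shifted down
    by k isotope spacings, matches the theoretical mass within the ppm
    tolerance. Return None if no offset matches."""
    for k in offsets:
        corrected = observed_mass - k * C13_C12_DIFF
        error_ppm = (corrected - theoretical_mass) / theoretical_mass * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            return k
    return None


if __name__ == "__main__":
    theoretical = 2500.0
    # Precursor picked one isotope peak too high, plus a 2 ppm measurement error
    observed = theoretical + C13_C12_DIFF + 0.005
    print(find_isotope_offset(observed, theoretical))  # 1
```

A mass that is wrong by, say, 0.5 Da falls between the windows and is rejected, which a single wide ±2.1 Da window would not do.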

What does "Maximum Variable Modification Isoforms per Peptide" control?
Imagine a peptide with 10 separate residues that could each carry a single modification. Considering every combination would give 2¹⁰ = 1024 isoforms. Depending on the variable modifications you allow, a single peptide can sometimes have thousands or more isoforms, which take up a large share of the overall search time. To limit this, this setting allows you to consider only a certain number of isoforms for each peptide sequence. The default setting of 1024 corresponds to searching all isoforms with 0..9 modifications (assuming all on different residues), as 2⁰ + 2¹ + ... + 2⁹ = 1 + 2 + ... + 512 = 1023.
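The combinatorial growth, and the effect of a cap, can be seen in a short sketch. The enumeration order (fewest modifications first) is an assumption made for illustration; this FAQ does not say in which order Morpheus generates isoforms.

```python
from itertools import combinations, islice


def modification_isoforms(site_count: int, max_isoforms: int) -> list:
    """Enumerate isoforms of a peptide with site_count modifiable residues,
    each either modified or not, stopping after max_isoforms.

    Each isoform is a tuple of the modified site indices; the empty tuple
    is the unmodified peptide.
    """
    def all_isoforms():
        for modified_count in range(site_count + 1):
            yield from combinations(range(site_count), modified_count)

    return list(islice(all_isoforms(), max_isoforms))


if __name__ == "__main__":
    print(len(modification_isoforms(10, 10**6)))  # 1024, i.e. 2**10
    print(len(modification_isoforms(12, 1024)))   # 1024, capped from 4096
```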

I get an error/exception that mentions out of memory. How can I fix this?
There are several things you can try, listed in order of increasing difficulty:

If you're running a .NET version of Morpheus on a 64-bit operating system, make sure you have .NET 4.5 or higher. These versions are able to store larger objects in memory.
Re-run the program with the 'Minimize Memory Usage' checkbox checked. This will prevent the program from remembering which peptides it has already searched, therefore reducing the memory usage. It will cause it to run slower, however.
Close other programs open on your computer while you run Morpheus.
Try searching a smaller database. For example, are you searching nr or all of SwissProt? Limit it to only the organism(s) of interest. Or are you searching UniProt (SwissProt + TrEMBL) for a specific organism? Maybe you only need to search SwissProt (manually annotated proteins).
Try on a different computer with more memory.
Add physical memory to your computer. Remember if you're running a 32-bit operating system, 4 GB is the most you can use.

What does the "Maximum FDR" setting control?
This dictates the peptide–spectrum match (PSM), distinct peptide, and protein group numbers shown in the log file. It does not filter the corresponding outputs, which always contain all hits. It is also the filter that determines which peptides are used for protein identifications.

What does the "Consider Modified Forms as Unique Peptides" checkbox do?
When the software calculates unique peptides, it has to determine if peptides with the same sequence but different modifications/modification patterns are unique or not, e.g. SAMPLER versus SAM(oxidation of M)PLER. For the most part, you probably want to keep this unchecked, except possibly if you are looking specifically for PTMs and want to know how many isoforms you detected. However, note the software does not perform statistically rigorous localization of modifications, so e.g. PEPT(phosphorylation of T)IDES and PEPTIDES(phosphorylation of S) would be counted as two unique peptides with this option enabled, but the MS/MS spectra may not have evidence for both isoforms.
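The difference between the two settings amounts to whether modifications are stripped before peptides are compared. The sketch below assumes modifications are written in parentheses as in the examples above, which may not be the exact notation in the Morpheus outputs; the function names are hypothetical.

```python
import re


def base_sequence(peptide: str) -> str:
    """Strip parenthesized modification annotations from a peptide string."""
    return re.sub(r"\([^)]*\)", "", peptide)


def count_unique_peptides(peptides: list, modified_forms_are_unique: bool) -> int:
    """Count unique peptides with or without regard to modifications."""
    if modified_forms_are_unique:
        return len(set(peptides))
    return len({base_sequence(p) for p in peptides})


if __name__ == "__main__":
    peptides = [
        "SAMPLER",
        "SAM(oxidation of M)PLER",
        "PEPT(phosphorylation of T)IDES",
        "PEPTIDES(phosphorylation of S)",
    ]
    print(count_unique_peptides(peptides, modified_forms_are_unique=True))   # 4
    print(count_unique_peptides(peptides, modified_forms_are_unique=False))  # 2
```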

How should I view the PSM, distinct peptide, and protein group TSV outputs?
Use Microsoft Excel or most other spreadsheet programs. To see only the confident identifications, I recommend using the filter function to show only rows with column Q-Value (%) <= 1 and column Target? = TRUE, so you don't bother looking at decoys.
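If you prefer a script to a spreadsheet, the same filter can be applied with Python's standard csv module. This sketch assumes the column headers are exactly "Q-Value (%)" and "Target?" as named above, and that the target flag reads TRUE or True; the "Example ID" column in the sample data is made up.

```python
import csv
import io


def confident_targets(tsv_file, max_q_value_percent: float = 1.0) -> list:
    """Return rows that are targets with a q-value at or below the threshold.

    tsv_file is any open text file (or file-like object) holding a
    tab-separated table with a header row.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return [
        row for row in reader
        if float(row["Q-Value (%)"]) <= max_q_value_percent
        and row["Target?"].strip().upper() == "TRUE"
    ]


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real output file
    sample = (
        "Example ID\tTarget?\tQ-Value (%)\n"
        "1\tTRUE\t0.5\n"
        "2\tFALSE\t0.8\n"
        "3\tTRUE\t2.0\n"
    )
    for row in confident_targets(io.StringIO(sample)):
        print(row["Example ID"])  # prints 1
```

For a real file, pass an open file handle instead, for example open(path, newline="") with the path to your PSMs output.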

Can I add/edit modifications, proteases, amino acids, etc.?
Yes. Nearly everything is user-configurable via tab-separated values (TSV) files located in the same directory as the executable. They should be straightforward to edit in Excel or other spreadsheet programs.

How can I compare the number of spectra and unique peptides matched to each protein across multiple analyses?
There is an auxiliary utility program called Morpheus Protein Summarizer that does this. Download the Windows executable here. The input is protein groups TSV files from Morpheus. The output is PSMs.tsv and unique_peptides.tsv, where each column is a dataset, each row is a protein, and the values are how many PSMs and unique peptides matched to that protein, respectively. Note that protein groups must be separated into their constituent proteins as protein groups may not be identical between datasets.

How can I compare the number of spectral counts for each unique peptide across multiple analyses?
There is an auxiliary utility program called Morpheus Peptide Summarizer that does this. Download the Windows executable here. The input is PSMs TSV files from Morpheus. The output is peptides.csv, where each column is a dataset, each row is a unique base peptide sequence, and the values are how many spectra were matched to that peptide sequence.
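The shape of that output table can be illustrated with a small sketch. This is not the Peptide Summarizer's code: it starts from lists of base peptide sequences (one entry per PSM) rather than from the TSV files, because the relevant column names are not given in this FAQ, and the function and dataset names are hypothetical.

```python
from collections import Counter


def spectral_count_table(datasets: dict) -> list:
    """Build a table of spectral counts per base peptide sequence.

    datasets maps a dataset name to a list of base peptide sequences,
    one entry per PSM. The result is a list of rows: a header row, then
    one row per peptide with its count in each dataset (0 if absent).
    """
    names = list(datasets)
    counts = {name: Counter(sequences) for name, sequences in datasets.items()}
    peptides = sorted(set().union(*(c.keys() for c in counts.values())))
    table = [["Peptide"] + names]
    for peptide in peptides:
        table.append([peptide] + [counts[name][peptide] for name in names])
    return table


if __name__ == "__main__":
    example = {
        "run1": ["PEPTIDE", "PEPTIDE", "SAMPLER"],
        "run2": ["SAMPLER"],
    }
    for row in spectral_count_table(example):
        print(row)
```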

Can I use Morpheus on Linux and Mac OS X?
Yes. Morpheus was developed with the Microsoft .NET Framework, which is a Windows-only technology. However, Mono is an open-source project to provide a cross-platform implementation of .NET. There is a Linux version of Morpheus that was compiled on Ubuntu 14.04.2 64-bit with Mono. You may be able to use this binary directly on Linux and Mac OS X. If not, you may be able to recompile it from the source code.

When I try to run Morpheus on Linux or Mac OS X, why do I get an error saying "cannot execute binary file" or "Microsoft Windows applications are not supported on OS X"?
Open a terminal, and instead of running Morpheus directly, run it with mono. For example, instead of running ./Morpheus.exe, run mono Morpheus.exe.

Can I compile/build Morpheus from source code?
Yes. On Windows, Morpheus is available as a Microsoft Visual Studio 2015 solution. You can download Microsoft Visual Studio Community 2015 for free. On Linux and Mac OS X, Morpheus is available as a MonoDevelop/Xamarin solution. You can download the source code through the Git version control system using the repository URL https://github.com/cwenger/Morpheus.git, or download the latest version of the code directly from https://github.com/cwenger/Morpheus/archive/master.zip. All solutions should build without any modification.