ORF Name: The dropdown list contains all available ORF names; some ORFs are not available. Refer to Support Info for further details. The corresponding Standard Gene Name is displayed in the top panel on the right, and the gene name is hyperlinked to its SGD page.
Search type - Motif: the input pattern is matched as a regular expression, using the same syntax and semantics as Perl (see examples below). The motif must have a fixed length.
Common examples:
[ABC] - matches A, B, or C;
[^ABC] - matches anything except A, B, or C;
. - matches any single amino acid.
Please refer to this link and the detailed supporting information in the Support Info tab for further details.
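For readers who want to try a motif pattern offline, here is a minimal Python sketch. Python's re module uses the same Perl-style syntax for the constructs listed above ([ABC], [^ABC], .). The function name find_motif, the 1-based positions, and the choice to report overlapping matches are assumptions for illustration, not CoSMoS.c.'s own code.

```python
import re


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each match of `pattern`.

    Wrapping the pattern in a zero-width lookahead, (?=(...)), lets
    overlapping matches be reported too (an assumption about overlaps).
    """
    matches = [(m.start() + 1, m.group(1))
               for m in re.finditer(f"(?=({pattern}))", sequence)]
    # Motifs must be fixed in length: reject patterns whose matches differ.
    if len({len(text) for _, text in matches}) > 1:
        raise ValueError("motif must match a fixed number of residues")
    return matches


# R, then any residue, then S or T.
print(find_motif("MRSSKRASPRT", "R.[ST]"))  # [(2, 'RSS'), (6, 'RAS')]
```

A pattern such as A+ can match runs of different lengths, so the sketch rejects it, mirroring the fixed-length requirement.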
Search type - Position: enter the position(s) of the first amino acid of each motif and select the appropriate 'motif length' (1 to 10). Position is the amino acid position in the protein. Set 'motif length' to 1 to check individual positions; in this case, insertion will be 0 for all positions.
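A minimal sketch of how start positions and 'motif length' select residues, assuming 1-based positions on the ungapped protein sequence as described above. motif_windows is a hypothetical helper, not part of CoSMoS.c.

```python
def motif_windows(sequence: str, starts: list[int], motif_length: int) -> dict[int, str]:
    """Map each 1-based start position to the residues it covers.

    `sequence` is the ungapped protein, so positions count amino acids only.
    """
    if not 1 <= motif_length <= 10:
        raise ValueError("motif length must be between 1 and 10")
    windows = {}
    for start in starts:
        end = start + motif_length - 1  # last covered position, inclusive
        if start < 1 or end > len(sequence):
            raise ValueError(f"motif starting at {start} falls outside the protein")
        windows[start] = sequence[start - 1:end]
    return windows


print(motif_windows("MRSSKRASPRT", [2, 6], 3))  # {2: 'RSS', 6: 'RAS'}
print(motif_windows("MRSSKRASPRT", [4], 1))     # {4: 'S'}
```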
Reference Strain: the strain of interest, used as the reference for the PhyloZOOM algorithm. Refer to Support Info for further details.
Gap penalty: when applied, lowers the score if there are non-standard amino acids at the alignment position (target site).
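The exact penalty formula is given in the supporting information, not here. As an illustration only, the sketch below assumes a simple multiplicative form: the score is scaled by the fraction of strains that carry one of the 20 standard amino acids at the target site, so gaps ('-') and other non-standard symbols pull it toward 0. apply_gap_penalty is a hypothetical name.

```python
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def apply_gap_penalty(score: float, column: str) -> float:
    """Scale a conservation score by the fraction of strains with a standard
    amino acid in this alignment column (hypothetical form of the penalty)."""
    standard_fraction = sum(aa in STANDARD for aa in column) / len(column)
    return score * standard_fraction


# Four strains at the target site: three residues and one gap.
print(apply_gap_penalty(0.5, "SS-T"))  # 0.375
```

Because the factor lies between 0 and 1, a penalized score stays on the same 0 (relaxed) to 1 (conserved) scale.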
Sequence Map is plotted in the second panel on the right, with matched motif(s) or position(s) highlighted in yellow. '-' represents a gap in the multiple sequence alignment (MSA), indicating that at least one other strain among the 1012 yeast strains has an insertion at this position. MSA is performed with Clustal Omega 1.2.4. For further information about MSA, please refer to 'detailed supporting information' under Support Info.
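Because '-' columns are counted in alignment coordinates but not in protein coordinates, the two numbering schemes (the proteinposition and MSAposition values described under Conservation Score interpretation) drift apart after every gap. A minimal sketch of the conversion for one strain's aligned sequence, assuming 1-based numbering; position_maps is a hypothetical helper.

```python
def position_maps(aligned: str) -> tuple[dict[int, int], dict[int, int]]:
    """Build 1-based lookups between protein and alignment positions.

    `aligned` is one strain's row of the MSA, with '-' marking gaps.
    Returns (protein -> alignment, alignment -> protein); gap columns have
    no protein position, so they are absent from the second map.
    """
    protein_to_msa: dict[int, int] = {}
    protein_pos = 0
    for msa_pos, symbol in enumerate(aligned, start=1):
        if symbol != "-":
            protein_pos += 1  # only residues advance the protein position
            protein_to_msa[protein_pos] = msa_pos
    msa_to_protein = {m: p for p, m in protein_to_msa.items()}
    return protein_to_msa, msa_to_protein


p2m, m2p = position_maps("MR--SSK")
print(p2m)         # {1: 1, 2: 2, 3: 5, 4: 6, 5: 7}
print(m2p.get(3))  # None: alignment column 3 is a gap in this strain
```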
Selected Site: displays an interactive table of statistics for the user-selected amino acid position(s). Click a yellow-highlighted amino acid in the Sequence Map above to check the Symbol Frequency and Conservation Score for that position. Click between two amino acids to show both together for comparison.
Click Symbol Frequency or Conservation Score to view the full table of the corresponding statistics.
All scores are normalized so that conserved -> 1 and relaxed -> 0.
Click here to download Strain Name details.
Amino acids are displayed by their one-letter codes. Click here to download the code table.
Symbol Frequency for selected site
Conservation Scores for selected site
Refer to Support Info tab for details about each algorithm.
Conservation Score interpretation:
Insertion: the percentage of amino acid insertion events within the motif among all strains in the database.
positions: proteinposition is the position of the amino acid in the protein (not counting '-', the gap symbol in the multiple sequence alignment). MSAposition is the position of the amino acid in the multiple sequence alignment (counting '-').
SubMatrix: likelihood of the observed substitutions, based on BLOSUM62.
ShannonE: Shannon entropy, i.e. the diversity of amino acids at the target site.
StereochemE: diversity of the stereochemical properties of the amino acids at the target site.
JSDivergence: Jensen-Shannon divergence, comparing the diversity at the target site with the BLOSUM62 background frequencies (relaxed).
PhyloZOOM: diversity weighted by evolutionary distance to the reference strain.
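ShannonE and StereochemE both measure diversity at a single alignment column. A minimal sketch, assuming natural-log entropy divided by its maximum possible value and subtracted from 1, so that, as stated above, conserved -> 1 and relaxed -> 0. The bin counts, the stereochemical groups, and the single shared bin for gaps and non-standard symbols are all assumptions for illustration, not CoSMoS.c.'s definitions (see the detailed supporting information for those).

```python
import math
from collections import Counter

STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

# Illustrative stereochemical groups (an assumption, not necessarily the
# grouping CoSMoS.c. uses).
GROUPS = {
    **dict.fromkeys("AVLIMC", "aliphatic"),
    **dict.fromkeys("FWYH", "aromatic"),
    **dict.fromkeys("STNQ", "polar"),
    **dict.fromkeys("KR", "positive"),
    **dict.fromkeys("DE", "negative"),
    **dict.fromkeys("GP", "special"),
}


def normalized_entropy(labels: list[str], n_bins: int) -> float:
    """Return 1 - H / ln(n_bins): 1.0 when every label agrees (conserved),
    0.0 when labels are spread evenly over all bins (relaxed)."""
    total = len(labels)
    h = -sum((c / total) * math.log(c / total) for c in Counter(labels).values())
    return 1 - h / math.log(n_bins)


def shannon_score(column: str) -> float:
    # 20 standard amino acids plus one shared bin for gaps/non-standard symbols.
    return normalized_entropy([aa if aa in STANDARD else "other" for aa in column], 21)


def stereochem_score(column: str) -> float:
    # 6 property groups plus one shared bin for gaps/non-standard symbols.
    return normalized_entropy([GROUPS.get(aa, "other") for aa in column], 7)


column = "SSSTTA"  # one alignment column across six strains
print(round(shannon_score(column), 3), round(stereochem_score(column), 3))  # 0.668 0.768
```

The property-based score is higher here because S and T fall into the same group, so substitutions between them count as less diverse.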
Click here to download detailed supporting information
If you use CoSMoS.c., please cite this paper:
Shuang Li, Henrik G. Dohlman
Evolutionary conservation of sequence motifs at sites of protein modification
Journal of Biological Chemistry 2023
For source code, please refer to this GitHub repository.
For questions or to report problems, please email:
shuang9@email.unc.edu; hdohlman@med.unc.edu.