YeastMotifConserv

CoSMoS.c. - Conserved Sequence Motif in Saccharomyces cerevisiae

This algorithm can be used to search for and identify motifs conserved in the 1002 yeast genome project.

Single Gene
Paralogs

ORF Name: select an ORF from the dropdown list, which includes all available ORF names. Some ORFs are not included; refer to Support Info for further details. The corresponding Standard Gene Name is displayed in the top panel on the right and is hyperlinked to its SGD page.

Search type - Motif: the input pattern is matched as a regular expression using the same syntax and semantics as Perl (see examples below). The motif must be fixed in length.

Common examples:

[ABC] - matches A or B or C;

[^ABC] - matches anything except A, B, or C;

. - matches anything once.

Please refer to this link and the detailed supporting information in Support Info tab for further details.
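The pattern syntax above can be tried outside the tool. The following is a minimal sketch in Python, whose `re` module handles these character classes the same way Perl does. The function name `find_motif` and the example sequence are hypothetical, and reporting overlapping matches is an assumption made here, not a documented behavior of CoSMoS.c.

```python
import re

def find_motif(sequence, pattern):
    """Return (start_position, matched_text) pairs using 1-based positions."""
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

# Hypothetical protein fragment, for illustration only.
seq = "MSPAKSTPVRAA"

# S or T, then P, then any residue, then K or R (fixed length of 4).
hits = find_motif(seq, "[ST]P.[KR]")

# S followed by anything except P (fixed length of 2).
hits_not_p = find_motif(seq, "S[^P]")
```

Here `hits` contains the matches starting at positions 2 and 7, and `hits_not_p` contains the single match starting at position 6.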

Reference Strain: the strain you are interested in, used as the reference for the PhyloZOOM algorithm. Refer to Support Info for further details.

Gap penalty: when applied, decreases the score if there are non-standard amino acids at the alignment position (target site).

Sequence Map is plotted in the second panel on the right, with matched motif(s) or position(s) highlighted in yellow. '-' represents a gap in the multiple sequence alignment (MSA) results, indicating that at least one other strain among the 1012 yeast strains has an insertion at this position. MSA is performed with Clustal Omega 1.2.4. For further information about MSA, please refer to 'detailed supporting information' under Support Info.

Selected Site: displays an interactive table showing stats for user-selected amino acid position(s). Click on the yellow highlighted amino acids in the Sequence Map above to check Symbol Frequency and Conservation Score for an individual position. Click in between two amino acids to show both together for comparison.

Click on Symbol Frequency and Conservation Score for the full table of corresponding stats.
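As an illustration of what a Symbol Frequency table summarizes, the sketch below counts how often each symbol occurs at one alignment column. It is a minimal Python example under stated assumptions: the function name `symbol_frequency` and the eight-strain column are hypothetical, and counting the gap symbol '-' alongside amino acids is an assumption, not a description of the tool's output.

```python
from collections import Counter

def symbol_frequency(column):
    """Return the fraction of strains carrying each symbol at one MSA column."""
    counts = Counter(column)
    total = len(column)
    return {symbol: count / total for symbol, count in counts.items()}

# Hypothetical column: one character per strain at the selected site.
column = "SSSSTS-S"
freq = symbol_frequency(column)
```

For this column, `freq` assigns 0.75 to S and 0.125 each to T and the gap symbol.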

All scores are normalized to: conserved -> 1; relaxed -> 0.

Click here to download Strain Name details.

Download Strain Name sheet

Amino acids will be displayed by their one-letter code. Click here to download the code table.

Download amino acid code table

Symbol Frequency for selected site

Conservation Scores for selected site

Download Symbol Frequency table

Refer to Support Info tab for details about each algorithm.

Download Conservation Scores Table

Conservation Score interpretation:

Insertion: the percentage of amino acid insertion events within the motif among all strains in the database.

positions: proteinposition is the position of the amino acid in the protein (not counting '-', the gap symbol in the multiple sequence alignment). MSAposition is the position of the amino acid in the multiple sequence alignment (counting '-').

SubMatrix: likelihood of the existing substitutions based on BLOSUM62.

ShannonE: diversity of the target site.

StereochemE: stereochemical property of the diversity.

JSDivergence: how similar the diversity at the target site is to the BLOSUM62 background frequencies (the relaxed case).

PhyloZOOM: diversity weighted by evolutionary distance to reference strain.
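The distinction between proteinposition and MSAposition in the list above can be sketched in a few lines of Python. This is only an illustration: the function name `msa_to_protein_positions` and the aligned string are hypothetical, and both position types are assumed to be 1-based.

```python
def msa_to_protein_positions(aligned):
    """Map each 1-based MSA column to a 1-based protein position (None at gaps)."""
    mapping = {}
    residue_index = 0
    for column, symbol in enumerate(aligned, start=1):
        if symbol == "-":
            # A gap occupies an MSA column but is not a residue of this protein.
            mapping[column] = None
        else:
            residue_index += 1
            mapping[column] = residue_index
    return mapping

# Hypothetical aligned sequence with one gap at MSA column 3.
aligned = "MS-PAK"
mapping = msa_to_protein_positions(aligned)
```

In this example the residue P sits at MSAposition 4 but proteinposition 3, because the gap at column 3 is counted only in the alignment.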

Click here to download detailed supporting information

Download SupportingInfo

If you use CoSMoS.c., please cite this paper:

Shuang Li, Henrik G. Dohlman
Evolutionary conservation of sequence motifs at sites of protein modification
Journal of Biological Chemistry 2023

For source code, please refer to this github repository.

For questions or to report problems, please email:

shuang9@email.unc.edu; hdohlman@med.unc.edu.

Paralog ORF Names: input a pair of paralog ORF names separated by '_'. The order of the two ORFs matters for the subsequent analysis; refer to the paragraphs below about Search Types for details. All available pairs of ORF names are included in the dropdown list. Some paralog pairs are not included; refer to Support Info for further details. The corresponding Standard Gene Names are displayed in the top panel on the right and are hyperlinked to their SGD pages.

Search type - Motif: the input pattern is matched as a regular expression using the same syntax and semantics as Perl (see examples below). The motif must be fixed in length. For Paralogs analysis, the motif(s) are matched against both ORFs. Additionally, for any motif match that exists in only one ORF, its corresponding sites in the other ORF are also included for analysis. The order of the input ORF names does not affect Motif search results.

Common examples:

[ABC] - matches A or B or C;

[^ABC] - matches anything except A, B, or C;

. - matches anything once.

Please refer to this link and the detailed supporting information in Support Info tab for further details.

Search type - Position: input the position(s) of the first amino acid of each motif and select the correct 'motif length' (between 1 and 10). Position is defined as the amino acid position in the protein. Set 'motif length' to 1 to check individual positions; in this case, Insertion will be 0 for all positions. For Paralogs analysis, the position(s) are searched against the first input ORF, and the corresponding position(s) in the second ORF are then included for analysis. The order of the ORF names therefore affects Position search results.

Gap penalty: when applied, decreases the score if there are non-standard amino acids at the alignment position (target site).

Sequence Map is plotted in the second panel on the right. Paralog sequences are plotted against each other with Needleman–Wunsch global alignment. '-' represents a gap in the alignment. ORF names are noted at the beginning of the sequences. Amino acid positions in the proteins (not counting '-') are noted above or below each sequence. Matched motif(s) or position(s) are highlighted in yellow. The Needleman–Wunsch alignment is performed with pairwiseAlignment from the Biostrings package in R.
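To illustrate how a position in the first ORF relates to a position in the second ORF through a pairwise alignment, here is a minimal Python sketch. It assumes the two aligned strings (with '-' for gaps) are already available, for example from a global alignment; it does not perform the alignment itself. The function name `map_positions` and the example sequences are hypothetical.

```python
def map_positions(aligned_first, aligned_second, positions):
    """Map 1-based protein positions in the first sequence to the second.

    Returns None for a position that is aligned to a gap in the second sequence.
    """
    wanted = set(positions)
    result = {}
    i = j = 0  # residues seen so far in the first and second protein
    for a, b in zip(aligned_first, aligned_second):
        if a != "-":
            i += 1
        if b != "-":
            j += 1
        if a != "-" and i in wanted:
            result[i] = j if b != "-" else None
    return result

# Hypothetical pairwise alignment of two paralog fragments.
first = "MSPAK-TR"
second = "MS-AKGTR"
mapped = map_positions(first, second, [3, 4, 6])
```

In this example, position 3 of the first sequence has no counterpart (it faces a gap), position 4 maps to position 3, and position 6 maps to position 6. Swapping the two sequences changes the result, which is consistent with the order of ORF names mattering for a Position search.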

Click on Symbol Frequency and Conservation Score for the full table of corresponding stats. Results are arranged so that, for each matched position, the two paralog sites are juxtaposed for easier comparison.

All scores are normalized to: conserved -> 1; relaxed -> 0.

Amino acids will be displayed by their one-letter code. Click here to download the code table.

Download amino acid code table

Symbol Frequency for selected site

Conservation Scores for selected site

Download Symbol Frequency table

Refer to Support Info tab for details about each algorithm.

Download Conservation Scores Table

Conservation Score interpretation:

Insertion: the percentage of amino acid insertion events within the motif among all strains in the database.

SubMatrix: likelihood of the existing substitutions based on BLOSUM62.

ShannonE: diversity of the target site.

StereochemE: stereochemical property of the diversity.

JSDivergence: how similar the diversity at the target site is to the BLOSUM62 background frequencies (the relaxed case).

PhyloZOOM: diversity weighted by evolutionary distance to reference strain.
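As a rough illustration of an entropy-based score such as ShannonE on the "conserved -> 1; relaxed -> 0" scale, the Python sketch below computes the Shannon entropy of one alignment column and rescales it. The normalization used here (one minus entropy divided by log2 of 20) and the choice to skip gap symbols are assumptions for illustration; the exact formula used by the tool is described in Support Info and may differ. The function name `shannon_conservation` is hypothetical.

```python
import math
from collections import Counter

def shannon_conservation(column, alphabet_size=20):
    """Score one MSA column: 1.0 if fully conserved, lower if more diverse."""
    # Assumption: gap symbols are ignored when counting residues.
    counts = Counter(symbol for symbol in column if symbol != "-")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no residues")
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Assumption: normalize by the maximum entropy of a 20-letter alphabet.
    return 1.0 - entropy / math.log2(alphabet_size)

conserved = shannon_conservation("SSSSSSSS")  # one symbol only
mixed = shannon_conservation("SSSSTTTT")      # two symbols, equally frequent
```

A column with a single symbol scores 1, and a column split evenly between two symbols scores lower, since its entropy is one bit.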

Click here to download detailed supporting information

Download SupportingInfo
