Difference between revisions of "HMM PD00128"
Latest revision as of 05:19, 17 November 2015
Symbol: CDC25_NTD
Name: CDC25, N-Terminal Domain
Why build the HMM?
The CDC25_NTD HMM profile is related to the Pfam profile M-inducer_phosp (PF06617), which represents the regulatory N-terminal domain of CDC25s. The Pfam profile detects the CDC25_NTD domain in deuterostome CDC25s and in the basal eumetazoan Nematostella vectensis. However, the Pfam profile finds no CDC25_NTD domain in ecdysozoan CDC25s or in Amphimedon queenslandica. Using our in-house profile, we are able to detect the CDC25_NTD domain in insect CDC25s (e.g. string and twine in Drosophila melanogaster) and in CDC25s of nematodes of the order Trichocephalida. CDC25_NTD was not detectable in Caenorhabditis species by either HMM profile or by sequence similarity to nematodes that have this domain, suggesting that the domain has been lost or has diverged extremely.
How the HMM is built
We searched for Pfam domains in Drosophila melanogaster string (one of the two fly CDC25s), human CDC25A and Nematostella vectensis CDC25 using the Pfam server. Based on the position of the CDC25 phosphatase domain and the length of the linker region between the phosphatase domain and the CDC25_NTD/M-inducer_phosp domain, we estimated that residues 1-300 of Drosophila melanogaster string may contain the CDC25_NTD/M-inducer_phosp domain. We then ran PSI-BLAST with this region against the NR database on the NCBI BLAST server.
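As a minimal sketch of how the 1-300 query region could be cut out before the PSI-BLAST search, the Python below reads a single-record FASTA string and slices it with 1-based, inclusive coordinates. The function names and the placeholder sequence are illustrative assumptions, not part of the original workflow, which does not say how the region was extracted.

```python
def read_single_fasta(text):
    """Return (header, sequence) for a FASTA string holding one record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    return lines[0][1:], "".join(lines[1:])


def extract_region(sequence, start, end):
    """Return residues start..end (1-based, inclusive).

    If the sequence is shorter than `end`, the slice simply stops at the
    last residue.
    """
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    return sequence[start - 1:end]


def to_fasta(header, sequence, width=60):
    """Format one record as FASTA text with fixed-width sequence lines."""
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Placeholder record: NOT the real string sequence, just 500 residues.
    demo = ">demo_protein\n" + "M" + "A" * 499 + "\n"
    header, seq = read_single_fasta(demo)
    region = extract_region(seq, 1, 300)
    print(to_fasta(header + "_1-300", region))
```

The resulting FASTA text is what would be pasted into, or uploaded to, the BLAST server as the query.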
We downloaded the hit sequences. Then, we i) removed redundant sequences above 70% identity using the CD-HIT program (parameter -c 0.7, other parameters default), ii) aligned the sequences with the MUSCLE program (default parameters), iii) manually curated the alignment by removing low-quality columns, iv) removed sequences whose length fell below the cutoff defined by the shortest 25% of sequences, v) realigned the sequences with the MUSCLE program, vi) manually curated the alignment again, vii) built the HMM profile using the HMMBUILD program (default parameters), and viii) validated the HMM profile by checking whether it could detect the domain in protein phosphatases such as Drosophila melanogaster string and C. elegans CDC25s. We found a hit in Amphimedon queenslandica CDC25 (conditional E-value 0.00018, HMM profile coverage 76% (188/247 aa)). The best hit in C. elegans has a conditional E-value of 0.0049 and an HMM profile coverage of 40% (97/247 aa), which is too weak to be regarded as a complete CDC25_NTD domain.
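The automated parts of the steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the exact procedure used: the file names are hypothetical, the MUSCLE call uses the version 3 style options (-in/-out) because the text does not state the version, and the length filter reads step iv as "drop sequences shorter than the length found at the 25% position of the sorted lengths". The manual curation steps (iii and vi) are not automated here.

```python
def length_cutoff(lengths, fraction=0.25):
    """Length found at the given fraction of the sorted lengths."""
    ordered = sorted(lengths)
    if not ordered:
        raise ValueError("no sequences given")
    index = min(int(len(ordered) * fraction), len(ordered) - 1)
    return ordered[index]


def drop_short_sequences(records, fraction=0.25):
    """Step iv (assumed reading): keep sequences at or above the cutoff.

    `records` maps sequence name -> sequence; gap characters ('-') are
    ignored when measuring length, so aligned input also works.
    """
    lengths = {name: len(seq.replace("-", "")) for name, seq in records.items()}
    cutoff = length_cutoff(lengths.values(), fraction)
    return {name: seq for name, seq in records.items() if lengths[name] >= cutoff}


def build_commands(raw_fasta, targets_fasta, workdir="work"):
    """Return command lines for steps i, ii, vii and viii (paths hypothetical)."""
    nr70 = f"{workdir}/nr70.fasta"
    aligned = f"{workdir}/aligned.fasta"
    profile = f"{workdir}/CDC25_NTD.hmm"
    table = f"{workdir}/validation.domtblout"
    return [
        # i) cluster at 70% identity, other CD-HIT parameters left at default
        ["cd-hit", "-i", raw_fasta, "-o", nr70, "-c", "0.7"],
        # ii) / v) multiple alignment (MUSCLE v3 option style, an assumption)
        ["muscle", "-in", nr70, "-out", aligned],
        # vii) build the profile from the curated alignment
        ["hmmbuild", profile, aligned],
        # viii) search test proteins; the per-domain table reports c-Evalue
        ["hmmsearch", "--domtblout", table, profile, targets_fasta],
    ]


if __name__ == "__main__":
    for command in build_commands("psiblast_hits.fasta", "cdc25_test_set.fasta"):
        print(" ".join(command))
```

Each command list can be passed to subprocess.run once the tools are installed; returning the commands as data rather than executing them keeps the sketch inspectable without the external programs.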
We also ran PSI-BLAST using the CDC25_NTD/M-inducer_phosp domain of Nematostella vectensis CDC25 as the query, and found a putative CDC25_NTD/M-inducer_phosp domain in the order Trichocephalida. We confirmed the domain by comparing it with our in-house CDC25_NTD HMM profile. We then ran PSI-BLAST with the CDC25_NTD domain of Trichinella spiralis CDC25, but did not find a hit in any Caenorhabditis species.