====== CYP genes, P450 enzymes ======

What's in a name?

{{::omuratitle.jpg?200|}}

CYP genes encoding P450 enzymes are consistently found among the largest gene families in plants, animals and fungi. These enzymes are at the interface of environmental responses, metabolism and endocrine regulation: they catalyze the transformation of a myriad of exogenous and endogenous substrates by hydroxylation, epoxidation, dealkylation and a great variety of other reactions. This dokuwiki has a focus on arthropod P450s.

The name [[wp>Cytochrome_P450 | cytochrome P450]] dates back to the original paper of [[https://pubmed.ncbi.nlm.nih.gov/14482007/|Omura and Sato (1962)]], which characterized the rabbit liver pigment, with the prominent peak of its Fe(II)-CO complex at 450 nm, as a hemoprotein (Omura and Sato, 1964a,b). This pigment had first been observed in rat and pig liver microsomes by Klingenberg (1958) and Garfinkel (1958). For a historical context see [[https://doi.org/10.2183/pjab.87.617| Omura, 2011]]. Elements of a chronology of arthropod P450 research have been collected [[P450 history|here]].

It turned out that cytochromes P450 are not [[wp>Cytochrome |cytochromes]] in the strict biochemical sense, but heme-thiolate proteins ([[https://www.sciencedirect.com/science/article/pii/S0742841398100269?via%3Dihub|Mansuy, 1998]]). The term cytochrome has remained, although it is now easier to just refer to "P450 enzymes". Many papers refer to cytochrome P450 [[monooxygenase|monooxygenases]], even though the reaction P450 enzymes catalyze is not always a monooxygenation. Also, P450s should not be confused with flavin monooxygenases [[fmo|(FMOs)]].

With gene cloning and later genome sequencing, the number of sequences increased rapidly and the CYP nomenclature was adopted in the late 1980s for genes encoding P450 enzymes. Soon, CYP and P450 became almost interchangeable, and many papers now refer to "CYPs" rather than "P450s".
It doesn't really matter, as long as the designation is precise. Here, we prefer to use P450 in general, and to use CYP when referring to a specific gene/protein.

====== P450 nomenclature ======

{{ :fignomenclature.png?400|}}

A **CYP** prefix followed by an Arabic numeral designates the **family** (all members nominally share >40% sequence identity), a capital letter designates the **subfamily** (all members nominally >55% identical) and an Arabic numeral designates the individual **gene** (all italics) or transcript and protein (no italics). (A termite P450 claimed the welcoming designation of CYP4U2, [[https://www.ncbi.nlm.nih.gov/nuccore/AF046011.1/|AF046011]].)

CYP names are given by the "P450 nomenclature committee", which currently means Dr. David R. Nelson (University of Tennessee) ([[drnelson@gmail.com| email]]), to whom requests for official CYP names should be directed. As of April 17, 2023, David Nelson has named 22,877 insect P450s in 1043 CYP families.

**//Publishing about a P450 without an official CYP name, or worse, with an approximate, wrong or invented name, only causes confusion in the literature.//** It leads to errors in the interpretation of results; these errors are then compounded in later publications and waste research time.

WARNING: NCBI is an excellent resource, but **a P450 (CYP) result in a [[https://blast.ncbi.nlm.nih.gov/Blast.cgi|BLAST]] search at NCBI should only be considered a starting point** (for sequence) and an indication of relatedness to known P450s:

(1) **CYP name:** A CYP name as found at NCBI is in most cases NOT an official CYP name. In fact, when NCBI names P450s as "cytochrome P450 nNm-**//like//**" or "**//probable//** cytochrome P450 nNm", this shows that the CYPnNm name should not be used in publications, as this leads to confusion. The correct name should be found by BLAST on this site or by asking Dr. Nelson for a correct name ([[drnelson@gmail.com| email]]).
“The beginning of wisdom is to call things by their right name” (名正才能言順, Confucius, The Analects, [[https://ctext.org/analects/zi-lu|Zi Lu 3]]).

(2) **Sequence:** Sequences as found at NCBI are often correct, but in too many cases they are not. Problems with sequences at NCBI are discussed [[Problems at NCBI|here]].

===== P450 sequences =====

Collections of manually annotated [[arthropod_p450_sequences|arthropod P450 sequences]] with their official CYP names are provided [[arthropod P450 sequences|here]].

There are currently very few confirmed examples of [[AltSplicing|alternatively spliced]] arthropod P450 transcripts, although cases are well described for [[http://dx.doi.org/10.1124/dmd.116.073254|human P450s]]. Similarly, [[gene_conversion|gene conversion]] events have not been studied systematically in arthropods, despite the many gene clusters that would be the "breeding grounds" for such events. Notable examples are listed [[gene_conversion|here]].

[[CNV|Copy number variation (CNV)]] is increasingly appreciated as an important phenomenon at the intraspecific level (e.g. Lucas et al., 2019), which is mirrored at the level of interspecific comparisons by gene duplications. Currently there is no standard way of naming copy number variants of P450 genes.

[[CYPfam_by_clan|CYPome size]], or the number of CYP genes in a genome, is commonly cited in the literature, but it is not an absolute or fixed number. It is affected by the number of pseudogenes, the number of alternatively spliced genes, and by intraspecific [[CNV|copy number variation]]. Moreover, it also depends on the quality of the genome assembly (with the not infrequent inclusion of two alleles of the same gene) and on the quality of the annotation.

[[background_nomenclature|More on P450 nomenclature]]

===== Higher order nomenclature: CYP clans =====

[[CYP clans|CYP clans]] constitute a higher order of nomenclature, grouping CYP families.
Until recently, insect CYPomes were thought to be composed of sequences from just four clans, the CYP2, CYP3, CYP4 and mitochondrial clans (Feyereisen 2006, 2012). Recent work established the presence of P450s from additional clans, the CYP16 and CYP20 clans, and pointed to the intriguing presence of possible CYP19 clan sequences, as well as a CYP53 clan sequence in the fungus gnat //Bradysia// (acquired from fungi by horizontal gene transfer).

More about [[CYP clans|CYP clans]]

===== Pseudogenes and gene fragments =====

Pseudogenes are denoted by the suffix P. This suffix is added to the name of the closest paralog that is an active gene, e.g. //CYP9E2// and //CYP9E2**P1**// in //Blattella germanica// (Wen et al., 2001). However, this is not always done, as the closest paralog is sometimes not easily recognized. In that case, the pseudogene has its own gene root number.

Pseudogenes can differ from an active gene by a single nucleotide (substitution or indel) leading to a premature stop or a frameshift ("young" pseudogenes). They can also be so degraded as to be hardly recognizable as P450 genes ("old" pseudogenes). Sequencing of different populations or strains of the same species can reveal that a pseudogene in one population is an active gene in another, and vice versa (e.g. CNV study in //Tetranychus urticae// in prep.).

A nomenclature for loose exons (solo exons, detritus exons), or internal duplicated or partial exons has been proposed (Nelson et al., 2004), but these rules are too cumbersome and are not in common use.

===== Alleles =====

Alleles of a gene are named as subscripts v1, v2 (e.g. //CYP6B1v2//, Cohen et al., 1992). In a practical shortcut, //CYP6CM1vQ// and //vB// designate the alleles of this gene found in the Q and B biotypes (now MED and MEAM1), respectively, of //Bemisia tabaci// (Karunker et al., 2008). In this case, the alleles differ in their capacity to metabolize neonicotinoids.
The v1, v2 subscripts are not very common, and while they are found in the literature, they should be treated with caution. They may be associated with genes cloned before genome projects, so they may represent different genes that are very close in sequence, rather than (as intended) alleles of a single gene. Copy number variation ([[CNV|CNV]]) also results in (often duplicated) genes that are close in sequence.

Allelic variants of human P450s can be responsible for interindividual variation in drug metabolism, and there is a dedicated [[https://www.pharmvar.org/genes|website]] to document them and support research in pharmacogenetics.

===== P450 common names =====

Following a tradition that predates the CYP nomenclature, P450 enzymes can be named with a small suffix, such as P450cam, the camphor hydroxylase of //Pseudomonas putida//, later named CYP101; P450BM3, the fatty acid hydroxylase of //Bacillus megaterium// (CYP102); or P450scc, the cholesterol side-chain cleavage enzyme of vertebrates (CYP11A1).

In insects, few P450 enzymes have been named in this way. P450hyd ([[https://www.pnas.org/content/91/21/10000|Reed et al., 1994]]) is the P450 forming hydrocarbons, later identified as CYP4G2 in the house fly (Qiu et al., 2012). P450Lpr is the predominant P450 in the pyrethroid-resistant strain Learn-Pyr of the house fly, later identified as CYP6D1 (Tomita and Scott, 1995).

In the Drosophila gene nomenclature only the initial letter is capitalized, hence //CYP6A1// in the house fly and //Cyp6a2// in //Drosophila melanogaster//. Several of the so-called "[[ecdysteroid metabolism|Halloween genes]]", originally from the Drosophila literature, are particular cases where the use of common (actually rather uncommon) names obscures the identity of the genes as P450s in the molting hormone biosynthetic pathway.
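
For readers who handle CYP names in scripts, the written conventions described on this page (CYP prefix, family numeral, subfamily letters, gene numeral, optional P suffix for pseudogenes, optional v suffix for alleles, and the Drosophila capitalization) can be sketched as a small parser. This is only an illustration under stated assumptions: the pattern, the ''CypName'' record and the ''parse_cyp_name'' function are hypothetical names invented here, the subfamily is allowed to have more than one letter because names such as CYP6CM1 appear above, and the check is purely about the form of a name. It cannot tell whether a name is an official one.

```python
import re
from typing import NamedTuple, Optional


class CypName(NamedTuple):
    """Parts of a CYP name (hypothetical record, for illustration only)."""
    family: int
    subfamily: str
    gene: int
    pseudogene: bool
    allele: Optional[str]
    drosophila_style: bool


# Assumed pattern: it only describes the written form of a name and says
# nothing about whether the name was assigned by the nomenclature committee.
_CYP_PATTERN = re.compile(
    r"(CYP|Cyp)"              # prefix; "Cyp" is the Drosophila convention
    r"(\d+)"                  # family, an Arabic numeral
    r"([A-Za-z]+)"            # subfamily, one or more letters (e.g. CM)
    r"(\d+)"                  # individual gene, an Arabic numeral
    r"(P\d*)?"                # optional pseudogene suffix, e.g. P1
    r"(?:v([A-Za-z0-9]+))?"   # optional allele suffix, e.g. v2 or vQ
)


def parse_cyp_name(name: str) -> Optional[CypName]:
    """Split a CYP name into its parts, or return None if it does not fit."""
    match = _CYP_PATTERN.fullmatch(name.strip())
    if match is None:
        return None
    prefix, family, subfamily, gene, pseudo, allele = match.groups()
    drosophila_style = prefix == "Cyp"
    # Subfamily letters are lowercase in the Drosophila style, capital otherwise.
    expected = subfamily.lower() if drosophila_style else subfamily.upper()
    if subfamily != expected:
        return None
    return CypName(
        family=int(family),
        subfamily=subfamily,
        gene=int(gene),
        pseudogene=pseudo is not None,
        allele=allele,
        drosophila_style=drosophila_style,
    )
```

For example, ''parse_cyp_name("CYP9E2P1")'' reports family 9, subfamily E, gene 2 and the pseudogene flag, while an NCBI-style description such as "cytochrome P450 6a2-like" does not fit the pattern and gives ''None''. A name that parses is only well formed; whether it is the official CYP name still has to be checked as described in the nomenclature section.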