More on P450 nomenclature
Origins
A nomenclature for P450 genes and proteins was introduced when only 65 sequences were known (Nebert et al., 1987). Gene families were initially designated by Roman numerals (e.g. CYPVIA1), but the proliferation of diverse sequences rapidly became discouraging, even to those versed in the classics. The rules of nomenclature were then revised to their current form, using Arabic numerals (e.g. CYP6A1) (Nebert et al., 1991; Nelson et al., 1993; Nelson et al., 1996).
Bending the rules
The percent amino acid identity rules for family (40%) and subfamily (55%) designations are not strictly adhered to, but names once adopted are rarely changed.
Naming genes by the lumper mode or splitter mode: Initially, many insect P450s were arbitrarily lumped into the CYP6 and the CYP4 families even though they had less than 40% amino acid identity with CYP6A1 or with vertebrate CYP4 proteins. This was before anyone could have guessed how large the P450 superfamily would turn out to be.
Naming genes in the lumper mode made the CYP6 and CYP4 families the largest ones in insects by a cascade effect. Papilio polyxenes CYP6B1 is only 32% identical to Musca domestica CYP6A1 (Cohen et al., 1992), so placing it in the CYP6 family “forced” many subsequent sequences into that family even if they did not meet the 40% criterion.
The splitter mode prevailed at the completion of genome projects, which led to a new proliferation of CYP families in insects.
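The nominal thresholds mentioned above can be written down as a small sketch. The function name and the choice to treat the cutoffs as inclusive are assumptions made for illustration only; as noted, real CYP names are assigned with more judgment than this and frequently bend the rule.

```python
# Nominal identity thresholds quoted in the text (percent amino acid identity).
FAMILY_THRESHOLD = 40.0
SUBFAMILY_THRESHOLD = 55.0


def nominal_rank(identity_pct: float) -> str:
    """Return the rank two P450s would nominally share at a given identity.

    Hypothetical helper: treating the cutoffs as inclusive (>=) is an
    assumption, and actual names often deviate from this rule.
    """
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct >= SUBFAMILY_THRESHOLD:
        return "same subfamily"
    if identity_pct >= FAMILY_THRESHOLD:
        return "same family"
    return "different family"


# CYP6B1 vs CYP6A1 are 32% identical (Cohen et al., 1992), so a strict
# reading of the rule would not have placed them in one family.
print(nominal_rank(32.0))
```

Under a strict reading, the 32% identity between CYP6B1 and CYP6A1 falls below the family cutoff, which is exactly the kind of case the lumper mode overrode.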
The confusing designations “CYP450”, “P4506G1” or other such variations should be avoided.
Different P450 enzymes are generally products of different genes; they are not isozymes.
A few documented cases of alternative splicing give rise to different P450 isoforms.
The CYP nomenclature is a TOOL for easy communication, so it does not have to be perfect, just useful. To quote David Nelson: “The genome is a map. The more street signs the better.”
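As a rough illustration of the naming form used in the examples above (CYP, then a family number, a subfamily letter and a gene number, as in CYP6A1), the sketch below splits a name into its parts and rejects the discouraged variants. The pattern is a simplifying assumption, not the official grammar: it ignores allele, variant and pseudogene suffixes, and the names `CYP_NAME`, `CypName` and `parse_cyp_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern (an assumption): "CYP" + family number + subfamily
# letter(s) + gene number. Suffixes used in real names are not handled.
CYP_NAME = re.compile(r"CYP(\d+)([A-Z]+)(\d+)")


class CypName(NamedTuple):
    family: int
    subfamily: str
    member: int


def parse_cyp_name(name: str) -> Optional[CypName]:
    """Split a name such as 'CYP6A1' into its parts, or return None."""
    match = CYP_NAME.fullmatch(name)
    if match is None:
        return None
    return CypName(int(match.group(1)), match.group(2), int(match.group(3)))


print(parse_cyp_name("CYP6A1"))   # family 6, subfamily A, member 1
print(parse_cyp_name("CYP450"))   # not a gene name under this pattern
print(parse_cyp_name("P4506G1"))  # not a gene name under this pattern
```

A check like this only tests the form of a name; it says nothing about whether the name has actually been assigned.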
Alternate classifications
Pfam has a family called p450 (PF00067). As of this writing (Pfam 35.0, Nov 2021), it covers 213,489 sequences from 5,038 species. A sequence tagged with PF00067 is almost certainly a P450 sequence.
In contrast, and for some obscure reason, some classifications of gene families such as InterPro use and spread numbers and tags that are not very clear and mostly do not reflect evolutionary relationships, thus clouding understanding. Here are some InterPro entries that can (randomly, it seems) be found:
H (homologous superfamily type) IPR036396 Short name: Cyt_P450_sf, with overlapping entries:
F (family type) IPR001128 Short name: Cyt_P450. This probably overlaps and/or includes:
F (family type) Cytochrome P450, B-class (IPR002397): these are mostly bacterial, mitochondrial and some fungal P450s
F (family type) Cytochrome P450, mitochondrial (IPR002399): overlaps somewhat with the one above
F (family type) Cytochrome P450, E-class, group I (IPR002401): covers vertebrate Clan 2 P450s as well as plant CYP71s…
F (family type) Cytochrome P450, E-class, group II (IPR002402): now this one is a real winner, lumping together bacterial, fungal, vertebrate as well as insect CYP4 and CYP6 sequences…
and 23 more family types…
GO terms are also tagged to P450s, ranging from obvious to wrong. Caveat emptor. As P450s fall in different GO term categories, “GO term enrichment” can be a lottery.
The conserved protein domain entries cd00302 and cl41757 at NCBI cover P450s and have many subfamilies that seem to closely match P450 phylogeny.