The CYP19 clan is named after CYP19A1, the vertebrate steroid aromatase, a highly specialized enzyme. Although CYP19 was long thought to be vertebrate-specific, CYP19-like sequences have now been found in arthropods as well. The genomic CYP19-like sequences from Bradysia coprophila and Bradysia odoriphaga are intronless and appear to result from a horizontal gene transfer (HGT) from Collembola (Feyereisen et al., 2023). CYP19 sequences are also found in some Cnidaria and Lophotrochozoa.
Key residues of the vertebrate CYP19 enzyme are conserved in the arthropod CYP19-like sequences as shown below:
For reference, human CYP19A1:
>CYP19A1-human-NP_00009400094 MVLEMLNPIHYNITSIVPEAMPAATMPVLLLTGLFLLVWNYEGTSSIPGPGYCMGIGPLISHGRFLWMGIGSACNYYNRVYGEFMRVWISGEETLIISKSSSMFHIMKHNHYSSRFGSKLGLQCIGMHEKGIIFNNNPELWKTTRPFFMKALSGPGLVRMVTVCAESLKTHLDRLEEVTNESGYVDVLTLLRRVMLDTSNTLFLRIPLDESAIVVKIQGYFDAWQALLIKPDIFFKISWLYKKYEKSVKDLKDAIEVLIAEKRRRISTEEKLEECMDFATELILAEKRGDLTRENVNQCILEMLIAAPDTMSVSLFFMLFLIAKHPNVEEAIIKEIQTVIGERDIKIDDIQKLKVMENFIYESMRYQPVVDLVMRKALEDDVIDGYPVKKGTNIILNIGRMHRLEFFPKPNEFTLENFAKNVPYRYFQPFGFGPRGCAGKYIAMVMMKAILVTLLRRFHVKTLQGQCVESIQKIHDLSLHPDETKNMLEMIFTPRNSDRCLEH
The following sequences are from arthropods:
Insecta-Diptera-Sciaroidea
>Bradysia_coprophila-CYP4724A1 MLRHWVNMLVHWFVYYVRPHILGFISSREVSLADTFRCNNNIKNRTWRIPELCKYLYEKYGKLVHLHLLGQRVLIVTDFKIASSLMTNHGHCLRQRFGNQFGLQNLNMHLTGLIWNNNVQQWKQNRNVFESTLKCGTKHLEALALCHLETLEKTTLKIAEPVTMLELLRNFTLSMTMEGLFEIPMPIDAQDKYDWREEAKTTVANYFKAWEFFLLNPNSQSLAENNLHQSACRKMLLFAKEIFAESEEIGSSNFISHLSHCNDNNGIFQCIAEMLLAGTDTSSITMFYTLLFLADFPEWSNKASRIIHSESQEAIDKEINHLYYESMRLIPVGPVILRQCEHDIEDGNGVSLRRGDGIIFNISGMNRANYDQPEKFNPNRYETECFPLSFGVGKKSCVGKTFAEREMKLFFTWFLKRYIVLGHQAEMISLLETRWDVANAPVQDIKLTIFPRKLIYFIGDHGTGKSSVMDAFEAAYPRIKIIRKEQVLTRQRQEKCSIQQDHPENGDLQSAVLDAHTQILNTFKSELVLIESSILDCLIKAKGKGKTCHYNKIELLRPMKKCLAVYFPMFEENQHGKQGTNKGYFDIMERAGCKSHTLKARSVSGRYEEVMNFIRKCSS
>Bradysia_odoriphaga-CYP4724A1 MLRHWVNMLVHWFIYYICPHILGFISSGGVSLADKFRCNNNIKSRTWRIPELCKYLYEKYGKLVHLHLLGQRVLIVTDFKIASSLMTNHGHCLRERFGSQFGLQNLNMHLTGLIWNNNVQQWKQNRNVFESTLKTGTKNLEALALCHLETLEKTTLKIAEPVTMLELLRNFTLSMTMEGLFEIPMPIDAQHTYDWREEAKTTVANYFKAWEFFLLNPNSQSLAENNLHQSACRKMLLFAKKIFAESQEMGSSNFISNLSHCNDNNGIFQCIAEMLLAGTDTSSITMFYTLLFLADFPEWSNKASRIINSESQEAIDKEITHLYYESMRLIPVGPVILRQCEHDIEDGDGVSLRRGDGIIFNISGMNRANYDQPEKFNPNRYETECFPLSFGVGKKSCVGKTFAEREMKLFFTWFLKRYIVLGHQAEMIGLLETRWDVANAPVQDIKLTIFPRKLIYFIGDHGTGKSSVMDAFKAAYPKIKIIRKEQVLTNHRQEKCSIQQDHLENGDLQSEFLNAHTQILNTFKSELVLIESSILDCLINAKGKVRTCDETELLRPMKKCLTVYFPTLEETQHGKQGTNKDYFDIMERAGCLSHSLKARSVSGRYEEVIDFIKKCSS
Insecta-Zygentoma-Lepismatidae: Assembled from several overlapping transcripts: GHEH01002688.1-GASN02026094.1-GHEH01003951.1-GASN02024379.1
>Thermobia_domestica GPPYLGGIGSLVVFARYLWLGIPEATAYYLNKYGDTVKVWIAGEETLITSKSTIVHHVLKVNGFRYTARFGNNTGLQHLGMYHNGIIWNNDVKLWKILRAYFQKALNATTLSEAVSVSVDASKRLLQKISELRSRTDDSSIEALNFIRRITLAVTNSLMLRVNISDDEDLVQKIVDYFKAWEYFLIRPPLLYSCSRAFTKHKESVQALQEAVRQIVSNKKKILFNQNKNMDDPNLDFAEHLLFSAERGEISEEQVHQCILEMLLAGTDTSSVTLYYLLVALSENPDIEKAILKELWETLGSKDISKNDLGKLVTMEKAIKESMRIKPVGPVIMRKALENDQISGLEIRKGMNIILHVAKMHTREDIFPNPFNFLPEEHFAKNLEQEYYPFGAGPKGCVGQFLAMVEMKAILATLLRKIAFKSKNQMLKTMKTRWDIAQQPTEPTYMFFEERNEMKI
Insecta-Zygentoma-Lepidotrichidae: (TSA)
>Tricholepidion_gertschi-GASO02049581 RTHAGAMEMWISTSCIALLAIIPLFISWLLLRRSKKGELVDIPGPSYLWGLGPYIVFARYLWMGIPEATAFYLLKYGDLVRVWIGGEQTIITSRPAVVHHVLRINGPAYTARFGNDRGLDHLGMFENGIIWNNDIPLWRILRGFFQKDEDLVERIVEYFKAWEYFLIRPPLLYKMSSSYKKHITAIQRLKDAVTDLVARKKLKMKRTGRSPTDPDCDFAEQLIFSAQQGAIKDEHICQCILEMLLAGTDTSSVTIYYLLVALSENKLVEMAVLKELSEMLEENDITKNDVPKLVTLERAIKESMRIKPVGPVIMRKALESDKVDTISIPQDTNVILHVAQMHRREEIFPQADRFLPEDHFDGKIFTKEYYPFGAGPKGCVGQFLAMVEMKAILATLLREIHFRSAGQTLQEMRTRWDIAQQPTEPTFMFFDTRKEKYSLTTS
Hexapoda-Collembola
>Folsomia_candida-4724B1-CYP19clan MVLFPALIILVVSLLFGYVKYFRKSRGSKYFASFLQFLFHYSPQVNPFKTSSTKRINFVDNFICNYNVRHRFWMIPDLCQHISKKYGKLAELDLFGQRIVVVTDHKISNAFMTKHTKRLRQRFGNEGGLKKLNMLHTGLIWNNDVAAWKNNRAIFESTLANGARMLGELSDSYFTSTLSPSIKVGQPNSILTLLRNYTVTMTMEGLFGLDSCKSPSWRDNVKATVGNYFKAWEFFLLTPNTEQLEEEERHKKFCHAITSLSRQILASAKTTNQSKFIEDLALVKANDENQIVQCITEMLLAGTDTSSLTMFYTSLLLTDNPQVGNQLSKLINDSSTPESNTIEQMISNLYYESMRLVSVGPVILRQAEDNISLEGPDFNLKLNKGDGIVFNIAGQNLNEDPFTCAKKFDPNRYDTGGESYPLTFGTGSKSCVGKPFAEREMKLFFSWFLKKWTLLGTTEDVMGLLKTRWDIANAPLTDLKLYIFPRIGVHFIGNCETVLDTVKEFRMQFPRLQCIKLNGFDDVELAEVWEGSKNKFVIIYGGSVQEYGKVKVPDNDLIIFFENCEGEMDRECGKDANFLKYRVEKVKYNNEGGNNVRNVVDIITTRVAV
Crustacea-Copepoda-Gymnoplea-Calanoida: (TSA)
>Rhincalanus_gigas-GIVD01040536.1 MRVLQTIMTLMMMIPRGLLMIPWVSEIMFQILGIPVASRLYLNWCGDLVKVIVGGRPTIITSRFNVARHVFRGNGFNYTSRMALDSGLDMIGMLDQGIIWNNKTKDWKAMRHNFQTALGNKTLEIVEDETSAAVKMAKDTFRGGAIRNGGRLDMLAFLRLVTLEITNRLMLGVEMDNRTEVVEAVVDYFKSWEFFLIRPRILYLTSPLAFFRNRRAVQRLNDLTGDIVRKKEQEMDGADNINDVTNNFLTNLILEKRQGKISHENVVQSVLEMLIAGTDTSSVTLFYTFVALAGNKDWEENVHSEVSEVDLMAPLPSLPVTDAVLKESMRIKPVGPVVLRRSLQGDIIDELKVEAGDNIIISLEEMHKREDLFNSPEMFNPQRFLEGEKTDQGFLPFGTGPKGCVGQFLAMREMRTVMVILIKEFKLRLVAGEELGSLKVRWDIAQQPVDQIIMELIRRDVTNQDV
Crustacea-Copepoda-Gymnoplea-Calanoida: Assembled from two overlapping transcripts: GFUD01100495.1-GFUD01078968.1
>Neocalanus_flemingeri LSRVSEVLFQLLGIPVAAKFYLSQFGDTVVVRVAGRRTLVTSSFRVSWRVFKENGFNYTSRMAEDCGLERIGMLNQGIIWNNRSKDWKKLRQYFQAAVNSRSLDMVMKHTYDSVDLVMSIHPIFQQGGGELDLLNFLRTVTLEVTNKLMFSVSMENRADLIEAIIGYF
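
The conservation noted at the top of this page can be spot-checked with a short script. The sketch below is illustrative only: it scans short excerpts copied from the sequences listed above for the generic P450 heme-binding signature F-x-x-G-x-x-x-C-x-G. That pattern is an assumption made here as a convenient example of a conserved feature; it is not necessarily the set of key residues the text refers to. The names HEME_MOTIF, EXCERPTS and find_heme_motif are hypothetical helpers.

```python
import re

# Generic P450 heme-binding signature F-x-x-G-x-x-x-C-x-G.
# Assumed pattern for illustration; the Cys is the heme iron ligand.
HEME_MOTIF = re.compile(r"F..G...C.G")

# Short excerpts copied from the full-length sequences listed above.
EXCERPTS = {
    "CYP19A1-human": "FQPFGFGPRGCAGKYIAMVMM",
    "Bradysia_coprophila": "ECFPLSFGVGKKSCVGKTFAE",
    "Thermobia_domestica": "EYYPFGAGPKGCVGQFLAM",
    "Tricholepidion_gertschi": "EYYPFGAGPKGCVGQFLAM",
    "Folsomia_candida": "SYPLTFGTGSKSCVGKPFAE",
    "Rhincalanus_gigas": "QGFLPFGTGPKGCVGQFLAM",
}


def find_heme_motif(seq):
    """Return the first match of the heme-binding signature in seq, or None."""
    match = HEME_MOTIF.search(seq)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Print one line per sequence: name, then the matched 10-residue window.
    for name, seq in EXCERPTS.items():
        print(f"{name:26s} {find_heme_motif(seq)}")
```

The same function can be applied to the full-length sequences; a regular expression only confirms that a signature is present and is no substitute for a proper multiple sequence alignment. The Neocalanus_flemingeri fragment listed above ends before this region, so no match is expected there.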