Transmembrane Domains

Transmembrane helix prediction began with hydropathy analysis. A sliding window, often nineteen residues wide, averaged per-residue hydrophobicity along the sequence using a scale such as Kyte-Doolittle or GES, and segments whose mean exceeded a fixed threshold were called candidate membrane-spanning regions. The method was cheap and transparent, but it confused the buried hydrophobic cores of soluble proteins with genuine bilayer-spanning helices, mistook the hydrophobic stretches of signal peptides for transmembrane segments, and offered no information about orientation. Topology had to be inferred separately, usually through the positive-inside rule, which exploits the enrichment of arginine and lysine in cytoplasmic loops.
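The sliding-window calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the threshold of 1.6 is an assumed cutoff of the kind commonly paired with a nineteen-residue Kyte-Doolittle window, and scoring unknown residues as 0.0 is a simplification made here.

```python
# Kyte-Doolittle hydropathy values for the twenty standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq."""
    # Assumption: residues outside the standard twenty score 0.0.
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_segments(seq, window=19, threshold=1.6):
    """Merge overlapping above-threshold windows into residue spans.

    Spans are (start, end) pairs, 0-based and end-exclusive.
    The default threshold is an assumed value, not a universal constant.
    """
    segments = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean <= threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous span: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Toy input: a 19-residue leucine stretch flanked by charged residues.
toy = "D" * 10 + "L" * 19 + "K" * 10
print(candidate_segments(toy))
```

On the toy sequence the reported span is wider than the leucine stretch itself, because every window that overlaps the stretch enough to clear the threshold contributes its full width. The sketch also returns spans only: nothing in it says which end of a segment faces the cytoplasm.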

Hidden Markov models reduced these errors by encoding membrane architecture as states (helix core, cap, inside loop, outside loop) and decoding the most probable path with the Viterbi algorithm or a close variant. TMHMM and HMMTOP both took this approach. The cost was training data: membrane proteins resisted crystallization, so few structures had been solved, and the models learned from small and partly redundant sets.
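Viterbi decoding over a topology model can be illustrated with a toy four-state HMM. Everything numeric below is invented for illustration and is not taken from TMHMM or HMMTOP: the states, the three-letter residue alphabet, and all probabilities are assumptions. The real models use many more states, including helix caps and explicit helix-length modeling, with parameters trained on known structures.

```python
import math

# Toy topology: inside loop -> helix -> outside loop -> helix -> inside ...
# Two helix states force loops to alternate sides of the membrane.
STATES = ["i", "M_io", "o", "M_oi"]

# Hypothetical parameters, chosen by hand for illustration only.
START = {"i": 0.5, "o": 0.5}
TRANS = {
    "i": {"i": 0.9, "M_io": 0.1},
    "M_io": {"M_io": 0.9, "o": 0.1},
    "o": {"o": 0.9, "M_oi": 0.1},
    "M_oi": {"M_oi": 0.9, "i": 0.1},
}
HELIX = {"hydrophobic": 0.8, "positive": 0.02, "other": 0.18}
EMIT = {
    # Inside loops emit more K/R than outside loops (positive-inside rule).
    "i": {"hydrophobic": 0.2, "positive": 0.3, "other": 0.5},
    "o": {"hydrophobic": 0.2, "positive": 0.1, "other": 0.7},
    "M_io": HELIX,
    "M_oi": HELIX,
}


def residue_class(aa):
    """Collapse the amino-acid alphabet into three coarse classes."""
    if aa in "AILMFVWC":
        return "hydrophobic"
    if aa in "KR":
        return "positive"
    return "other"


def log(p):
    """Log probability, with log(0) mapped to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return the most probable state path for seq under the toy model."""
    obs = [residue_class(aa) for aa in seq.upper()]
    if not obs:
        return []
    # Initialise with start probability times first emission.
    score = {s: log(START.get(s, 0)) + log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor of state s at this position.
            prev = max(STATES, key=lambda p: score[p] + log(TRANS[p].get(s, 0)))
            new_score[s] = (score[prev] + log(TRANS[prev].get(s, 0))
                            + log(EMIT[s][symbol]))
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


toy = "KRKRKR" + "L" * 20 + "SDSDSD"
print("".join({"i": "i", "o": "o"}.get(s, "M") for s in viterbi(toy)))
```

Because loop orientation is part of the state, the decoded path yields helix boundaries and topology in a single pass, which the window-and-threshold approach could not do. Working in log space avoids numerical underflow on sequences of realistic length.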

[Figure: schematic of a transmembrane protein spanning a lipid bilayer.]