Starfish: Fast Capsid Structure Generation

In this post we describe Starfish™, our method for generating viral capsid surfaces quickly and accurately. It takes a more detailed look at the structural methods behind our ESGCT 2025 poster. Starfish™ allows us to screen structures in silico at a speed and scale not possible with existing tools such as Boltz2 and AlphaFold3.
Lir is building nAAVigator®, a holistic AI pipeline to engineer viral vectors for gene therapy. It’s the in silico engine driving our lab-in-the-loop platform. We believe nAAVigator® is the path towards creating smarter, safer, more effective medicines. Our goal is to take viruses that exist in nature (such as AAV) and completely resurface them, enabling therapies that would otherwise be impossible. Structural prediction of the capsid surface is an important aspect of this work.

Starfish

Starfish™ is an AI and physics-based pipeline that generates capsid structures quickly, accurately and with physical plausibility, at a throughput suited to large library screening tasks. Capsids are large, complex multimers that are poorly supported by the current generation of protein folding algorithms. Starfish™ rapidly identifies relevant binding surfaces for capsid multimers using cost-efficient hardware, enabling Lir to incorporate essential capsid structural analysis into large-scale viral variant library screenings.

The first stage of our pipeline applies a protein folding model to generate the monomers that make up the capsid surface. This is followed by fast physics-based assembly and refinement algorithms. Our method scales far better than multimer cofolding in both runtime and compute requirements, with Starfish™ able to assemble the full capsid at runtimes acceptable for screening tens of thousands of variants. Our approach resolves the unrealistic clashes often found in AI-generated structures, while anchoring the resulting complex in physics-based models suitable for downstream tasks such as molecular docking.

All approaches predict monomers and trimers accurately (RMSD <1Å) for wild-type proteins. AlphaFold2/OpenFold scales poorly as the complexity of the surface portion increases, requiring over an hour to generate the 5-fold axis. Boltz2 is able to generate structures of the 3-fold axis in reasonable time but fails to generate a 5-fold axis, running out of memory (OOM) on an H100 80GB GPU. Starfish™ remains accurate up to the full capsid while running faster and using less compute.
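
For readers less familiar with the metric, RMSD is the root-mean-square distance between paired atoms of two structures. The sketch below is illustrative only and is not part of Starfish™. It assumes the predicted and reference structures are already superposed and paired atom by atom (for example, matching C-alpha atoms); the function name `rmsd` and the toy coordinates are made up for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists.

    Assumes both structures are already superposed and that atom i in
    coords_a corresponds to atom i in coords_b. Units follow the input (Å).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates in Å, purely for illustration.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.4), (7.6, 0.0, 0.0)]

value = rmsd(reference, predicted)
within_threshold = value < 1.0  # the <1Å criterion quoted above
```

In practice the superposition step matters as much as the distance calculation, so a real comparison would first align the two structures before calling a function like this.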

Why?

We built Starfish™ because existing models like Boltz2 and AlphaFold aren’t capable of modelling whole viruses at the throughput we need.

Viral capsids are made up of multiple overlapping proteins that form the surface of the virus. Much of Lir's work involves mutating the surface of capsids to alter the properties of viruses for gene therapy. Existing protein folding models can compute the monomers of most proteins involved in viral capsids with high accuracy, but struggle to generate complete surfaces both quickly and accurately.

Compared to some other viruses, AAV has a relatively simple capsid made up of 60 subunits. Computing large portions of the capsid using models such as AlphaFold or Boltz2 is impractical when scaling to a large number of targets, due to both their time and memory requirements. Owing to its structural homogeneity, modelling the complete 60-mer AAV capsid is usually unnecessary, but portions of the surface such as those at the 2-, 3- and 5-fold axes are the minimum units needed for useful docking or structural assessment of capsid stability in silico.
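
An n-fold axis is one about which the surface maps onto itself under rotation by 360/n degrees, so a patch at the 5-fold axis consists of five symmetry-related copies of the subunit. As a toy illustration of that geometry only (not the Starfish™ assembly algorithm, which is not described here), the sketch below builds symmetric copies from one set of coordinates. It assumes the symmetry axis is the z axis through the origin; the function names and coordinates are hypothetical.

```python
import math


def rotate_about_z(point, angle_rad):
    """Rotate an (x, y, z) point about the z axis by angle_rad radians."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y, z)


def symmetric_copies(monomer_coords, n_fold):
    """Return n_fold rotated copies of a monomer around the z axis.

    monomer_coords: list of (x, y, z) tuples, assumed to be already
    positioned relative to a symmetry axis lying along z through the origin.
    Copy k is the input rotated by k * 360 / n_fold degrees.
    """
    copies = []
    for k in range(n_fold):
        angle = 2.0 * math.pi * k / n_fold
        copies.append([rotate_about_z(p, angle) for p in monomer_coords])
    return copies


# Toy "monomer" of two pseudo-atoms (made-up coordinates, in Å).
toy_monomer = [(10.0, 0.0, 0.0), (10.0, 1.5, 2.0)]
pentamer = symmetric_copies(toy_monomer, n_fold=5)
trimer = symmetric_copies(toy_monomer, n_fold=3)
```

Each copy keeps its distance from the axis and its height along it, which is what makes the arrangement symmetric. Placing a real subunit correctly relative to the axis is the hard part and is not addressed by this sketch.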

All approaches generate the 3-fold axis of AAV9 wild-type with an RMSD <1Å, but Starfish™ provides a clear advantage in runtime and scalability. At the 5-fold axis, Boltz2 exceeds the memory capacity of an 80GB H100 GPU, making it cost-prohibitive for larger screening experiments. AlphaFold2 Multimer eventually generates a pentameric structure, but it is not the correct 5-fold axis. AlphaFold3 produced the same incorrect structure in our tests. Our approach generates the correct structure efficiently, at an acceptable runtime for large screens and with no clashes that would interfere with docking or analysis.
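
To make "no clashes" concrete: a steric clash is usually flagged when atoms from different chains sit closer together than is physically reasonable. The check below is a minimal sketch of that idea, not the refinement step used in Starfish™. The function name `find_clashes`, the 2.0 Å cutoff and the coordinates are all assumptions chosen for illustration.

```python
import math


def find_clashes(chain_a, chain_b, min_distance=2.0):
    """Return (i, j) index pairs where atom i of chain_a and atom j of
    chain_b are closer than min_distance (in Å).

    min_distance=2.0 is an illustrative cutoff, not a value from this post.
    This is a brute-force O(N*M) scan; larger assemblies would normally
    use a spatial index such as a grid or k-d tree instead.
    """
    clashes = []
    for i, a in enumerate(chain_a):
        for j, b in enumerate(chain_b):
            if math.dist(a, b) < min_distance:
                clashes.append((i, j))
    return clashes


# Made-up coordinates in Å: the first atoms of each chain overlap.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
chain_b = [(0.5, 0.5, 0.0), (10.0, 0.0, 0.0)]

clashes = find_clashes(chain_a, chain_b)
is_clash_free = not clashes
```

A structure passed to docking would ideally return an empty list from a check like this for every pair of neighbouring chains.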

Applications

We first showcased an application of this technology at ESGCT 2025 in Seville, demonstrating that screening of the LY6A receptor was possible by performing docking to variant capsid surfaces constructed using Starfish™. We found that the structural approach identified an alternative subset of hits compared to a language model-based approach, and that the two methods were additive. We've been using this same approach for other receptors to investigate experimental targets predicted to exhibit differential binding. Multimer surfaces are essential for these larger screens, as no detectable signal is present with the monomer alone. For these applications Starfish™ is a crucial part of our pipeline, enabling rapid surface generation with minimal compute overheads.

We’ve found that larger docking surfaces are required to obtain relevant docking scores for screening purposes. Starfish™ is an important part of our work, both for generating variants and for adding explainability to hits.

Starfish™ is also being extended to other viruses, in line with Lir's ambitions to build a state-of-the-art viral engineering toolkit for gene therapy. Combined with our DNA and protein language models for viruses, Starfish™ allows us to consider viral engineering at multiple levels of analysis.

Starfish™ is one part of Lir's nAAVigator® pipeline, which is designed to accelerate viral vector design. Combined with our lab-in-the-loop, we’re building a platform to generate vectors specific to clinical partners’ targets at speed. We'll continue to post development updates here in the coming months. If this sounds relevant to your work, please reach out!