Genome-wide prediction of disease variant effects with a deep protein language model: 'A model that predicts damaging genetic variants'

Here we implemented a workflow generalizing ESM1b to protein sequences of any length and used it to predict all ~450 million possible missense variant effects across all 42,336 protein isoforms in the human genome.
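ESM1b accepts at most 1,022 residues per input, so longer isoforms have to be split before they can be scored. Below is a minimal sketch of one way to do that. It assumes overlapping windows whose scores are simply averaged wherever they overlap; the published workflow may weight or combine windows differently. Each variant is scored as the log-likelihood ratio log p(alt) - log p(wt) at its position. Because every residue has 19 possible missense substitutions, a protein of length L yields 19 x L scores. `log_probs` is a hypothetical stand-in for a model call, and the window and step sizes are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical model interface (assumption): given a window of residues, return
# one dict per position mapping each amino acid to its log-probability.
LogProbFn = Callable[[str], List[Dict[str, float]]]


def window_starts(length: int, window: int, step: int) -> List[int]:
    """Start offsets of overlapping windows that together cover every residue."""
    if not 0 < step <= window:
        raise ValueError("step must be in (0, window] so no residue is skipped")
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, step))
    if starts[-1] + window < length:
        starts.append(length - window)  # last window ends exactly at the C-terminus
    return starts


def missense_llr(
    seq: str,
    log_probs: LogProbFn,
    window: int = 1022,
    step: int = 511,
) -> Dict[Tuple[int, str, str], float]:
    """Score every single-residue missense variant as log p(alt) - log p(wt).

    Positions seen by several windows get the plain mean of their per-window
    scores (an assumption of this sketch, not necessarily the published scheme).
    Keys are (0-based position, wild-type residue, alternative residue).
    """
    sums: Dict[Tuple[int, str, str], float] = {}
    counts: Dict[Tuple[int, str, str], int] = {}
    for start in window_starts(len(seq), window, step):
        chunk = seq[start:start + window]
        for offset, lp in enumerate(log_probs(chunk)):
            pos = start + offset
            wt = seq[pos]
            for alt in AMINO_ACIDS:
                if alt == wt:
                    continue  # keep only the 19 missense alternatives
                key = (pos, wt, alt)
                sums[key] = sums.get(key, 0.0) + lp[alt] - lp[wt]
                counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    # Toy stand-in for the model: alanine is always the most likely residue.
    def toy_log_probs(chunk: str) -> List[Dict[str, float]]:
        return [{aa: (0.0 if aa == "A" else -1.0) for aa in AMINO_ACIDS} for _ in chunk]

    protein = "M" + "A" * 2499  # longer than a single 1,022-residue window
    scores = missense_llr(protein, toy_log_probs)
    print(len(scores))  # 19 alternatives x 2,500 positions = 47500
```

The half-window step is an illustrative choice: most residues are scored by two or more windows, at the cost of roughly twice as many model calls as non-overlapping tiles.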

The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics. A high-quality set of JAX-based transformer models for downstream genomics tasks.

The models use 6-mer tokenization and produce embeddings for downstream use. Released under a non-commercial license.
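The sketch below shows 6-mer tokenization. It assumes non-overlapping 6-mers read left to right, with a single-nucleotide fallback for any trailing remainder shorter than six bases and for stretches containing `N`. This follows the general idea of k-mer tokenization. The fallback rules and the token ids from the hypothetical `kmer_vocab` helper are assumptions, not the released Nucleotide Transformer vocabulary. With four bases there are 4^6 = 4,096 possible 6-mers, so the sketch's vocabulary holds those plus five single-nucleotide tokens.

```python
import itertools
from typing import Dict, List


def kmer_tokenize(seq: str, k: int = 6) -> List[str]:
    """Split DNA into non-overlapping k-mers, left to right.

    Assumed fallback: when a full k-mer is unavailable (sequence end) or the
    window contains 'N', emit one single-nucleotide token and advance by one.
    """
    seq = seq.upper()
    tokens: List[str] = []
    i = 0
    while i < len(seq):
        chunk = seq[i:i + k]
        if len(chunk) == k and "N" not in chunk:
            tokens.append(chunk)
            i += k
        else:
            tokens.append(seq[i])
            i += 1
    return tokens


def kmer_vocab(k: int = 6) -> Dict[str, int]:
    """Hypothetical vocabulary: all 4**k k-mers, then single-nucleotide tokens.

    Assumes k > 1 so k-mers and single-nucleotide tokens never collide.
    """
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    singles = list("ACGTN")
    return {tok: idx for idx, tok in enumerate(kmers + singles)}


def encode(seq: str, k: int = 6) -> List[int]:
    """Map a sequence to integer ids; characters outside ACGTN raise KeyError."""
    vocab = kmer_vocab(k)
    return [vocab[tok] for tok in kmer_tokenize(seq, k)]


if __name__ == "__main__":
    print(kmer_tokenize("ACGTACGTACGTAC"))  # ['ACGTAC', 'GTACGT', 'A', 'C']
    print(len(kmer_vocab()))                 # 4096 + 5 = 4101
```

Non-overlapping k-mers shorten the token sequence roughly six-fold compared with single-nucleotide tokens, which lets a fixed context window span more DNA.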



!!! abstract "Doctor GPT"
    Doctor GPT implements advanced LLM prompting for organizing, indexing, and discussing PDFs, and does so without relying on any opinionated prompt-processing framework.