Embedding Analysis

CLAMP: Predicting Specific Protein-Mediated Chromatin Loops in Diverse Species with a Language Model of Chromatin Accessibility
About This Tool

This tool generates embeddings for DNA sequences using our pretrained CLAMP model. The embeddings can be used for downstream tasks such as classification, clustering, and sequence analysis.
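As one example of a downstream task, embeddings are often compared pairwise with cosine similarity, which is a common starting point for clustering. The sketch below is a minimal illustration, assuming each sequence's embedding has already been loaded as a fixed-length list of floats keyed by its FASTA ID. This page does not specify the format of the downloaded results, and the function names `cosine_similarity` and `similarity_matrix` are hypothetical. The vectors in the usage block are toy values, not CLAMP output.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compute similarity for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_matrix(
    embeddings: Dict[str, Sequence[float]],
) -> Dict[Tuple[str, str], float]:
    """All pairwise cosine similarities, keyed by (id_a, id_b).

    Assumption: `embeddings` maps FASTA sequence IDs to vectors that the
    user has already parsed from the downloaded results file.
    """
    ids = list(embeddings)
    return {
        (i, j): cosine_similarity(embeddings[i], embeddings[j])
        for i in ids
        for j in ids
    }


if __name__ == "__main__":
    # Toy vectors for illustration only; real CLAMP embeddings differ.
    toy = {"seq1": [0.1, 0.3, 0.5], "seq2": [0.2, 0.6, 1.0]}
    for pair, score in similarity_matrix(toy).items():
        print(pair, round(score, 4))
```

The resulting matrix can be passed to any standard clustering method; cosine similarity is used here only because it ignores vector magnitude, not because CLAMP prescribes it.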

Due to limited online server resources, CLAMP+ provides only the 500bp-4mer pretrained CLAMP model and accepts at most 3 DNA sequences per submission. If you need a model with more parameters, please deploy the 1500bp-6mer CLAMP model on your own server or cluster.
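The "500bp-4mer" and "1500bp-6mer" labels suggest an input window length and a k-mer size. This page does not describe CLAMP's tokenization, so the sketch below assumes overlapping k-mers with a stride of 1, as used by DNABERT-style DNA language models. Treat it as an illustration of what a k-mer split looks like, not as CLAMP's actual preprocessing; the function name `kmer_tokenize` is hypothetical.

```python
from typing import List


def kmer_tokenize(seq: str, k: int = 4, stride: int = 1) -> List[str]:
    """Split a DNA sequence into k-mers.

    Assumption: overlapping k-mers with stride 1 (DNABERT-style); the
    tokenization actually used by CLAMP is not stated on this page.
    """
    if k < 1 or stride < 1:
        raise ValueError("k and stride must be positive")
    seq = seq.upper()  # normalize case so 'acgt' and 'ACGT' give the same tokens
    if len(seq) < k:
        return []
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


if __name__ == "__main__":
    print(kmer_tokenize("ACGTAC"))  # ['ACGT', 'CGTA', 'GTAC']
    # Under this assumption, a full 500 bp input yields 500 - 4 + 1 = 497 tokens.
    print(len(kmer_tokenize("A" * 500)))  # 497
```

Under the same assumption, a 1500 bp input with k = 6 would yield 1500 - 6 + 1 = 1495 tokens, which is one reason the larger model is better suited to local deployment.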

In our tests, analysis takes about 15 seconds per sequence.

Tutorial

Step-by-Step Usage Guide:

  1. Input Preparation:
    • Direct input: Paste DNA sequences in FASTA format (a header line starting with >, e.g. >seq1, followed by the sequence)
    • File upload: A FASTA file containing 1-3 sequences (max 500 bp each)
  2. Sequence Requirements:
    • DNA sequences only (A/T/C/G characters)
    • Maximum length: 500 base pairs per sequence
  3. Submission: Click the blue Submit button
  4. Results: Download the embedding vectors from the result page
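The input rules in steps 1 and 2 can be checked locally before submitting. The sketch below is a client-side approximation under stated assumptions: it treats the first word after > as the sequence ID, joins multi-line sequences, and rejects lowercase or ambiguous bases such as N because the page lists only A/T/C/G. The server's own validation may differ, and the names `parse_fasta` and `validate_submission` are hypothetical.

```python
import re
from typing import Dict, List, Optional

MAX_SEQUENCES = 3       # per-submission limit stated on this page
MAX_LENGTH = 500        # bp limit per sequence stated on this page
VALID_DNA = re.compile(r"[ACGT]+")  # assumption: uppercase A/C/G/T only


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}, joining wrapped lines."""
    records: Dict[str, List[str]] = {}
    current_id: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line has no sequence ID")
            current_id = fields[0]  # assumption: ID is the first word
            if current_id in records:
                raise ValueError(f"duplicate sequence ID: {current_id}")
            records[current_id] = []
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current_id].append(line)
    return {sid: "".join(parts) for sid, parts in records.items()}


def validate_submission(text: str) -> Dict[str, str]:
    """Raise ValueError if the input breaks the stated limits; else return records."""
    records = parse_fasta(text)
    if not 1 <= len(records) <= MAX_SEQUENCES:
        raise ValueError(f"expected 1-{MAX_SEQUENCES} sequences, got {len(records)}")
    for sid, seq in records.items():
        if not seq:
            raise ValueError(f"{sid}: empty sequence")
        if not VALID_DNA.fullmatch(seq):
            raise ValueError(f"{sid}: only A/T/C/G characters are allowed")
        if len(seq) > MAX_LENGTH:
            raise ValueError(f"{sid}: {len(seq)} bp exceeds the {MAX_LENGTH} bp limit")
    return records


if __name__ == "__main__":
    print(validate_submission(">seq1\nACGT\n>seq2\nGGCC\n"))
```

Checking locally avoids spending one of the limited server slots on a submission that would be rejected.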

Example FASTA Format:

>seq1
AGGTGATCAAAGAAAACTCTTAGACTTCATCTTTCGCCAAGCACAAGGTCTCTTTTGGGAAAAGTGAGCTCTTTTGCCACCTTGTGACACTGGATGAGAACAGCAAGCCCTCAGATCAATTCCTACTCCTGCTCCAAGGTGGCAGCATTGTACCGTGGCATCTAGGAACTAATGTACAGAGAGTTTCAAATAGAGCAACAGGAGGGAAGGTGATAATACCTCAGGGGAGCAGAGAATCATGT
>seq2
CAATCAATCAGATTCAAAAACCAAACAGAACTGTTGTATAAACTAGTTCATACAACAATTCTGTCTGATATTTGAATCTGATTCCCTGATAGCCTTGCGATATTTTGCCAGTAACATCACACAAAGGCTTCTTCCCCCTTCACTGTGTATTCCTCCGCCCACATACAGCCAGCAACAAGCAGAGACGCTACAGTTTGAC

Supported file format: FASTA (.fa, .fasta)

We will notify you once your job is completed.

Need Help?

If you encounter any issues or have questions about using this tool, please refer to our documentation or contact our support team.