Submit New Model Evaluation Results

Please upload a JSON file containing your model's evaluation results and fill in the information below. If you have any questions, please contact us at jiayi_sheng@berkeley.edu or lupantech@gmail.com.

Split *

Required: Select which dataset split to evaluate on

ℹ️
OpenAI API Key Required for Evaluation
• A valid OpenAI API key (Tier 2+ with $30+ budget) is required for LLM judge evaluation. We do not save or store your API key; it is used only during evaluation.
• You can revoke or deactivate your key 30 minutes after the evaluation completes. The evaluation typically costs around $25 and takes about 25 minutes, depending on your submission size.
Model Type *

Select the type of your model

Model Source *

Select whether the model is proprietary or open-source

Live Evaluation Status

Overall Acc
--
Answer Acc
--
Step Acc (No Toy Case)
--
Step Acc (No Logical Gap)
--
Step Acc (No Approximation Error)
--
Step Acc (No Computation Error)
--

Required JSON Structure:

Your JSON file must include at least these 5 fields for each problem:

[
    {
        "data_id": [integer or string] The ID of the test data,
        "problem": [string] The question text,
        "type": [string] The type of question: 'relation' or 'bound',
        "prompt": [string] The prompt used for the problem,
        "response": [string] The response of the model
    },
    ...
]

You can click the download button below to get an example file. The system will process your submission and calculate accuracy metrics automatically.
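
Before uploading, it may help to check a file locally against the five required fields listed above. The following is a minimal sketch of such a pre-check, not the validation the system itself runs: the function name `validate_submission`, the error messages, and the sample record are all hypothetical, and the only rules it encodes are the ones stated in the Required JSON Structure (field names, their types, and the two allowed values for "type").

```python
import json

# Field names and allowed "type" values are taken from the structure above.
REQUIRED_FIELDS = ("data_id", "problem", "type", "prompt", "response")
STRING_FIELDS = ("problem", "type", "prompt", "response")
ALLOWED_TYPES = {"relation", "bound"}


def validate_submission(records):
    """Return a list of problems found in `records` (empty if none)."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of problem objects"]

    errors = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"entry {i}: expected an object")
            continue

        # Every required field must be present.
        for field in REQUIRED_FIELDS:
            if field not in rec:
                errors.append(f"entry {i}: missing field '{field}'")

        # data_id may be an integer or a string (bool is rejected explicitly,
        # since bool is a subclass of int in Python).
        if "data_id" in rec:
            data_id = rec["data_id"]
            if isinstance(data_id, bool) or not isinstance(data_id, (int, str)):
                errors.append(f"entry {i}: 'data_id' must be an integer or string")

        # The remaining fields must be strings.
        for field in STRING_FIELDS:
            if field in rec and not isinstance(rec[field], str):
                errors.append(f"entry {i}: '{field}' must be a string")

        # "type" must be one of the two documented values.
        if isinstance(rec.get("type"), str) and rec["type"] not in ALLOWED_TYPES:
            errors.append(f"entry {i}: 'type' must be 'relation' or 'bound'")

    return errors


# Hypothetical placeholder record, for illustration only.
example = [
    {
        "data_id": 1,
        "problem": "placeholder question text",
        "type": "relation",
        "prompt": "placeholder prompt",
        "response": "placeholder model response",
    }
]

# Round-trip through JSON to mimic reading an uploaded file.
print(validate_submission(json.loads(json.dumps(example))))  # prints []
```

To check a real file, load it with `json.load` and pass the result to `validate_submission`; an empty list means the file matches the documented structure, though the system may apply further checks not covered here.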