Welcome to the IneqMath Evaluation Platform!
Project | arXiv | HF Paper | Code | Dataset | Leaderboard | Visualization
Submit New Model Evaluation Results
Please upload the JSON file with model evaluation results and fill in the following information. If you have any questions, please contact us at jiayi_sheng@berkeley.edu or lupantech@gmail.com.
Required: Select which dataset split to evaluate on
• You can revoke or deactivate your key 30 minutes after the evaluation completes. The evaluation typically costs around $25 and takes about 25 minutes, depending on your submission size.
Select the type of your model
Select whether the model is proprietary or open-source
Live Evaluation Status
Required JSON Structure:
Your JSON file must include at least these 5 fields for each problem:
[
  {
    "data_id": [integer or string] The ID of the test data,
    "problem": [string] The question text,
    "type": [string] The type of question: 'relation' or 'bound',
    "prompt": [string] The prompt used for the problem,
    "response": [string] The response of the model
  },
  ...
]
You can click the download button below to get an example file. The system will process your submission and calculate accuracy metrics automatically.
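Before uploading, it may help to check a results file locally against the five required fields listed above. The following is a minimal sketch in Python and not the platform's own validation logic: the function names `validate_submission` and `validate_file` are hypothetical, and the checks assume only what the structure above states (a top-level list, five fields per entry, and a `type` of 'relation' or 'bound').

```python
import json

# The five fields the page lists as required for each problem entry.
REQUIRED_FIELDS = ("data_id", "problem", "type", "prompt", "response")
STRING_FIELDS = ("problem", "type", "prompt", "response")
ALLOWED_TYPES = {"relation", "bound"}


def validate_submission(records):
    """Return a list of human-readable issues; an empty list means none found.

    Hypothetical helper: the real platform may apply different or extra checks.
    """
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of problem entries"]

    errors = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"entry {i}: expected a JSON object")
            continue

        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            errors.append(f"entry {i}: missing fields {missing}")

        # data_id may be an integer or a string (bool is excluded explicitly,
        # because bool is a subclass of int in Python).
        if "data_id" in rec:
            value = rec["data_id"]
            if isinstance(value, bool) or not isinstance(value, (int, str)):
                errors.append(f"entry {i}: data_id must be an integer or string")

        for field in STRING_FIELDS:
            if field in rec and not isinstance(rec[field], str):
                errors.append(f"entry {i}: {field} must be a string")

        if isinstance(rec.get("type"), str) and rec["type"] not in ALLOWED_TYPES:
            errors.append(f"entry {i}: type must be 'relation' or 'bound'")

    return errors


def validate_file(path):
    """Load a JSON file from `path` and validate its contents."""
    with open(path, encoding="utf-8") as fh:
        return validate_submission(json.load(fh))
```

Calling `validate_file("results.json")` (the file name is a placeholder) returns a list of issues to fix before submitting. Extra fields beyond the five are left alone, since the page only asks for "at least" these fields.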
Submission History
Please enter your email to check the results of your submission within 24 hours.
| ID | Status | Model | Size | Type | Source | Date | Submission time | Overall Acc | Answer Acc | Step Acc (NTC) | Step Acc (NLG) | Step Acc (NAE) | Step Acc (NCE) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Status Explanation:
- Processing: Your submission is currently being evaluated. This may take several minutes to complete.
- Completed: Evaluation is finished and results are ready.
Step Accuracy Abbreviations:
- NTC: No Toy Case - Step accuracy excluding the use of toy cases to draw general conclusions
- NLG: No Logical Gap - Step accuracy without logical reasoning gaps
- NAE: No Approximation Error - Step accuracy excluding approximation errors
- NCE: No Calculation Error - Step accuracy excluding all calculation errors