Update benchmarking scaffold #256
lgtm
PR Type
Enhancement, Tests
Description
- Refactored `ComposioToolSet` to improve workspace handling and logging.
- Added a `run_and_get_scores` function for running agents and retrieving scores.
- Modified the `run` function in `run_evaluation.py` to accept an `agent_func` parameter.
- Updated the logging setup in `run_evaluation.py`.
- Removed the obsolete `benchmark.template` file.
- Added a new `run_benchmark.template` for running benchmark evaluations.
Changes walkthrough
toolset.py
Refactor workspace handling and logging in `ComposioToolSet`
python/composio/tools/toolset.py
- Removed the `workspace_id` and `workspace_env` attributes from the constructor.
- Added a `set_workspace_id` method to set and retrieve the workspace.
run_evaluation.py
Refactor benchmark evaluation and logging setup
python/swe/benchmark/run_evaluation.py
- Added a `run_and_get_scores` function to run the agent and get scores.
- Modified the `run` function to accept `agent_func` as a parameter.
- Updated the logging setup (`get_logger`).
benchmark.template
Remove obsolete benchmark template
python/swe/composio_swe/scaffold/templates/crewai/benchmark.template
- Removed the obsolete `benchmark.template` file.
run_benchmark.template
Add new benchmark template for running evaluations
python/swe/composio_swe/scaffold/templates/crewai/run_benchmark.template
- Added the new `run_benchmark.template` file.
- Added an `agent_func` for workspace setup and issue kickoff.
- Added a `main` function to run the benchmark.
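For reviewers who want a picture of the `agent_func` pattern described in this walkthrough, here is a minimal, self-contained sketch. It is not the code from this PR: the `Issue` class, the `AgentFunc` signature, the placeholder workspace IDs, the toy scoring rule, and `echo_agent` are all assumptions made for illustration. Only the names `run`, `run_and_get_scores`, `agent_func`, and `main` come from the description above, and their real signatures may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Issue:
    """Hypothetical stand-in for one benchmark task."""
    issue_id: str
    description: str


# Assumed signature: takes a workspace ID and an issue, returns a patch string.
AgentFunc = Callable[[str, Issue], str]


def run(agent_func: AgentFunc, issues: List[Issue]) -> Dict[str, str]:
    """Call the injected agent once per issue and collect its output."""
    patches: Dict[str, str] = {}
    for index, issue in enumerate(issues):
        # Placeholder for real workspace creation (assumption).
        workspace_id = f"workspace-{index}"
        patches[issue.issue_id] = agent_func(workspace_id, issue)
    return patches


def run_and_get_scores(agent_func: AgentFunc, issues: List[Issue]) -> Dict[str, float]:
    """Run the agent, then score each result.

    Toy scoring rule (assumption): a non-empty patch scores 1.0, otherwise 0.0.
    """
    patches = run(agent_func, issues)
    return {issue_id: 1.0 if patch else 0.0 for issue_id, patch in patches.items()}


def echo_agent(workspace_id: str, issue: Issue) -> str:
    """Trivial agent used only to exercise the harness."""
    return f"[{workspace_id}] fix for {issue.description}"


def main() -> Dict[str, float]:
    issues = [
        Issue("demo-1", "typo in README"),
        Issue("demo-2", "off-by-one in loop"),
    ]
    return run_and_get_scores(echo_agent, issues)


if __name__ == "__main__":
    print(main())
```

Passing the agent in as a callable keeps the evaluation loop independent of any particular agent framework, which is presumably why a scaffold template would supply its own `agent_func` and `main`.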