Building a Coding Agent to Solve SWE-Bench
Learn how we improved our approach to solving SWE-bench problems by flipping the process—making code changes first and then generating patches.
In our first attempt to solve SWE-bench problems, we ran into a lot of issues because the patches were being generated before the LLM had actually applied the fixes to the code. This caused problems like inconsistent formatting and errors slipping through. So we decided to flip the process: make the changes first, then generate the patches.

Workflow Overview
The workflow improves upon earlier methods by introducing a structured, tool-integrated pipeline. Each agent handles a specific task, leveraging GPT-4o (via LangChain) for repository navigation and diagnosis, then employing Qwen/QwQ-32B-Preview for context-aware edits. By breaking the process into clear steps (diagnosis, code editing, edit application, patch generation, and evaluation), the system minimizes conflicts, reduces formatting errors, and maintains consistency across the repository.
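To make the flow concrete, here is a skeletal orchestration sketch of that pipeline. The function names are hypothetical placeholders for the agents described in the sections below, not our actual implementation, and the stubs are intentionally left unimplemented.

```python
# Skeleton of the agent pipeline. Function names are placeholders; each stub
# corresponds to an agent described below, and some are sketched in later sections.
from typing import Any


def diagnose(repo_path: str, issue: str) -> list[dict[str, Any]]:
    """Diagnoser Agent: GPT-4o (via LangChain) returns structured diagnostics."""
    raise NotImplementedError


def suggest_edit(repo_path: str, diag: dict[str, Any]) -> dict[str, Any]:
    """Code Editor: Qwen/QwQ-32B-Preview proposes a fix for one diagnostic."""
    raise NotImplementedError


def apply_edit(repo_path: str, edit: dict[str, Any]) -> bool:
    """Edit Applier Agent: write the edit line-by-line, then lint with Black."""
    raise NotImplementedError


def generate_patch(repo_path: str) -> str:
    """Patch Generation Agent: capture the accumulated changes with Git."""
    raise NotImplementedError


def run_pipeline(repo_path: str, issue: str) -> str:
    """Diagnose, edit, apply, and finally emit a patch for evaluation."""
    for diag in diagnose(repo_path, issue):
        for _ in range(3):  # bounded retry loop (the cap is an assumption)
            edit = suggest_edit(repo_path, diag)
            if apply_edit(repo_path, edit):
                break  # lint passed; move on to the next diagnostic
            diag = {**diag, "previous_attempt": edit}  # loop back with context
    return generate_patch(repo_path)
```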

Repository Cloning
Diagnoser Agent
For each issue, this agent produces structured diagnostics containing:
- Problem Descriptions
- Affected File Names
- Line Numbers
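As the "Accurate Diagnostics" section below notes, the Diagnoser Agent returns these fields as structured JSON. A sketch of one record, with purely illustrative field names and values:

```python
# Illustrative diagnostic record (field names and values are hypothetical).
diagnostic = {
    "description": "Function returns None instead of raising ValueError on empty input",
    "file": "src/example/parser.py",
    "start_line": 120,
    "end_line": 134,
}
```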


Code Editor
1. Read the first chunk of diagnostic information.
2. Open the corresponding file.
3. Feed both the file's content and the diagnosis details (line range, description, and issue) into a grounded LLM (no tools provided).
4. Receive a suggested edit that resolves the problem (a sketch of this step follows the list).
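A minimal sketch of steps 2-4, assuming Qwen/QwQ-32B-Preview is served behind an OpenAI-compatible endpoint; the base URL, prompt wording, and helper name are assumptions, not our exact implementation.

```python
# Sketch: ask a tool-less (grounded) model to propose an edit for one diagnosed region.
from openai import OpenAI

# Assumes an OpenAI-compatible server hosting Qwen/QwQ-32B-Preview.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")


def propose_edit(file_path: str, diag: dict) -> str:
    with open(file_path, "r", encoding="utf-8") as f:
        source = f.read()
    prompt = (
        f"File: {file_path}\n"
        f"Issue (lines {diag['start_line']}-{diag['end_line']}): {diag['description']}\n\n"
        f"--- FILE CONTENT ---\n{source}\n--- END FILE CONTENT ---\n\n"
        "Return only the corrected lines for the given range."
    )
    response = client.chat.completions.create(
        model="Qwen/QwQ-32B-Preview",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content
```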

Evaluating the Effectiveness of Edits
Edit Applier Agent
- Applies the suggested edits line-by-line to preserve the file's structure.
- Updates each line individually, respecting function boundaries and comment blocks.
- Runs a linter afterward (e.g., Black) to catch syntax or formatting errors.
- Loops back to the Suggestion Agent (the Code Editor) for a revised fix if any issues are found; a sketch of this apply-and-lint loop follows below.
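A minimal sketch of the apply-and-lint loop's core step, assuming the suggested edit arrives as replacement lines for an inclusive, 1-indexed range; the helper name is illustrative, and Black must be installed for the lint step.

```python
# Sketch: replace a line range in place, then lint the file with Black.
import subprocess


def apply_and_lint(file_path: str, start_line: int, end_line: int, new_lines: list[str]) -> bool:
    with open(file_path, "r", encoding="utf-8") as f:
        lines = f.readlines()

    # Swap only the diagnosed 1-indexed range; everything outside it stays untouched.
    replacement = [line if line.endswith("\n") else line + "\n" for line in new_lines]
    lines[start_line - 1:end_line] = replacement

    with open(file_path, "w", encoding="utf-8") as f:
        f.writelines(lines)

    # Black exits with a non-zero code when it cannot parse the file,
    # which is the signal used to loop back for a revised suggestion.
    result = subprocess.run(["black", file_path], capture_output=True, text=True)
    return result.returncode == 0
```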

Patch Generation Agent
Submission and Evaluation
Key Improvements in Our New Strategy
Enhanced Precision
- Line-by-line replacements: By updating code on a line-by-line basis, our system minimizes disruptions to the surrounding logic and structure. This reduces the risk of errors where indentation, comments, and boundary conditions could otherwise be inadvertently modified, and keeps each change scoped to only the faulty lines.
- Intelligent formatting: Post-edit linting and formatting checks ensure that each edited segment maintains structural integrity, preventing new errors from being introduced while an existing one is being fixed.
Optimized Workflow
- Iterative loops: Each proposed fix is rigorously validated. If an issue arises, the system loops back to refine the recommendation, so that only validated solutions move forward.
- Direct file edits and Git-based patches: By working directly with the repository and generating patches through Git commands, the process stays streamlined and produces a precise record of changes, so our patches do not fail during SWE-bench evaluation (see the sketch below).
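A minimal sketch of that patch step, assuming the fixes have already been written into the cloned repository; the exact Git flags our pipeline passes may differ.

```python
# Sketch: capture the in-place edits as a unified diff for submission.
import subprocess


def generate_patch(repo_path: str) -> str:
    # `git diff` against the working tree records exactly what was changed
    # in the cloned repository, in the standard unified-diff patch format.
    result = subprocess.run(
        ["git", "diff"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```
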
Accurate Diagnostics
- Structured JSON outputs: The Diagnoser Agent's clearly formatted outputs pin down the exact location of each issue, enabling precise fixes. These structured diagnostics reduce guesswork: because we know where the issues are, we can pinpoint the affected files and apply a systematic, grounded approach to fixing them, rather than blindly traversing the repository's files in a loop.
How Our Approach Stands Out in Autonomous Software Engineering
Granular Diagnostics
- Comprehensive file issue coverage: While many competitors rely purely on LLM-generated patches, our system goes deeper — it identifies and addresses issues in a comprehensive, file-specific manner.
- Manual verification: We incorporate opportunities for human oversight to ensure that even the most subtle bugs are caught before finalizing any fix.
Repository-Centric Approach
- Tailored for repositories: Our solution is designed around repository-level diagnostics, focusing on how files interrelate and interact within a larger codebase.
- Optimized tools and workflows: With specialized modules for reading, writing, and linting code, the entire pipeline is highly precise in addressing repository requirements.
Current Operations
Issue Recreator Agent
- Error Recreation: Using the SWE-bench dataset, this agent identifies the failing test files and re-runs them to check whether the errors have been resolved after the fix is applied (a sketch of this check follows the list).
- Sandboxed Validation: Since dependencies can vary across repositories, we are developing a virtual sandbox module to isolate each test environment. Within this sandbox, the repository will be installed, and the test suite will be re-run to determine whether the error persists or has been resolved.
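A minimal sketch of that re-run check, assuming the previously failing tests for an instance are known (for example, from the FAIL_TO_PASS field of a SWE-bench record) and that the repository's suite runs under pytest.

```python
# Sketch: re-run the previously failing tests after the fix has been applied.
import subprocess


def tests_now_pass(repo_path: str, failing_tests: list[str]) -> bool:
    # Assumes pytest is installed in the repository's (eventually sandboxed) environment.
    result = subprocess.run(
        ["python", "-m", "pytest", "-q", *failing_tests],
        cwd=repo_path,
        capture_output=True,
        text=True,
    )
    # Exit code 0 means every previously failing test now passes.
    return result.returncode == 0
```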
