Commit 8310bafb authored by Demian Frister

Initial commit
README.md 0 → 100644
# AI-Powered Test Generation Evaluation Pipeline
## Overview
This pipeline is part of the PhD research project "Automatisiertes Testen mit KI" (Automated Testing with AI) and implements a systematic evaluation framework for LLM-based test generation. It integrates with Defects4J to assess test quality, coverage, and effectiveness across different model configurations and testing strategies.
## Key Features
### Evaluation Components
- **Model Integration**
- Support for local and cloud-based LLMs
- Configurable model parameters
- Systematic prompt variations
- Multi-instance evaluation support
### Test Analysis
- Automated test compilation and execution
- Coverage measurement with JaCoCo
- Build error tracking and analysis
- Quality metrics collection
- Systematic error categorization
### Pipeline Architecture
- Modular component design
- Parallel evaluation execution
- Resource-aware scheduling
- Comprehensive logging system
- Reproducible evaluation workflows
## Setup
### Prerequisites
```bash
# Install Java 11 (required for Defects4J)
# Note: msopenjdk-11 comes from Microsoft's apt repository; a stock openjdk-11-jdk package also satisfies the Java 11 requirement
apt-get update && apt-get install -y msopenjdk-11
# Install Defects4J
git clone https://github.com/rjust/defects4j
cd defects4j
cpanm --installdeps .
./init.sh
# Add to PATH
export PATH=$PATH:/path/to/defects4j/framework/bin
# Additional dependencies (cpanminus provides the cpanm command used above)
apt-get install -y subversion cpanminus
```
### Configuration
Edit `config.py` to set the following (a minimal example is shown after the list):
- Model parameters
- Evaluation scope
- Resource limits
- Output paths
- Logging preferences
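
The sketch below illustrates a minimal subset of these settings; the paths are placeholders and only a few keys of the full `DEFAULT_MODEL_CONFIG` are shown:

```python
# Minimal, illustrative excerpt of config.py -- adjust paths and model names to your environment.
WORKSPACE_DIR = '/home/coder/workspace/'        # base path for all artifacts
INSTANCE = 1                                    # unique ID when running several pipeline instances
DEFECTS4J_HOME = f'{WORKSPACE_DIR}defects4j{INSTANCE}/'
OUTPUT_DIR = f'{WORKSPACE_DIR}synthetic-data/evaluation_pipeline/result'
PROJECTS = ['Chart']                            # Defects4J projects in scope
TEST_JUST_X_CLASSES = -1                        # -1 = evaluate all classes
REPAIR_ATTEMPTS = 1                             # 1 = compile once, no LLM-based repair

DEFAULT_MODEL_CONFIG = {
    'model_id': 'dev-llama-3-small',            # local / LocalAI model name
    'use_openai_api': False,                    # set True to call the OpenAI API instead
    'use_localai_api': True,                    # OpenAI-compatible local endpoint
    'openai_model': 'gpt-4o-mini',              # model used when the OpenAI API is enabled
}
```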
## Usage
### Basic Evaluation
```bash
# Run single evaluation
python main.py
# Run with specific config
python main.py --config custom_config.py
```
### Parallel Evaluation
```bash
# Generate configurations
python config_generator.py
# Run multiple instances
python multi_instance_runner.py
```
### Result Analysis
```bash
# Analyze results
python result_analyzer.py --results-dir results/
# Generate coverage report
python coverage_calculator.py --output report.json
```
## Output Structure
```
result_output/
├── detailed/
│ ├── model_outputs_evaluation_[model]_[timestamp].json
│ └── ...
├── summary/
│ ├── model_outputs_evaluation_[model]_summary_[timestamp].json
│ └── ...
└── logs/
├── evaluation_[timestamp].log
└── ...
```
## Integration
### With Main Project
- Uses shared utils from test framework
- Integrates with frontend for result display
- Supports both Java and Kotlin testing
- Compatible with containerized environment
### With Defects4J
- Automated project checkout
- Test execution environment
- Coverage analysis tools
- Result validation
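
For orientation, the simplified sketch below mirrors the Defects4J CLI calls that the handlers issue via `subprocess`; the project, version, working directory, and test identifier are placeholders:

```python
import subprocess

def d4j(args, cwd=None):
    """Run a Defects4J CLI command and return its stdout."""
    return subprocess.run(['defects4j', *args], cwd=cwd,
                          capture_output=True, text=True, check=True).stdout

work_dir = '/tmp/Chart_1'                                      # placeholder checkout directory
d4j(['checkout', '-p', 'Chart', '-v', '1b', '-w', work_dir])   # automated project checkout
src_dir = d4j(['export', '-p', 'dir.src.classes'], cwd=work_dir).strip()
d4j(['compile'], cwd=work_dir)                                 # compile sources and generated tests
d4j(['test', '-t', 'org.example.FooTest::testFoo'], cwd=work_dir)  # run a single test method
print(d4j(['coverage'], cwd=work_dir))                         # line/condition coverage report
```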
## Best Practices
- Always specify Java version in configurations
- Use resource-aware parallel execution
- Implement proper error handling
- Maintain comprehensive logging
- Follow systematic evaluation protocols
## License
Part of the PhD research project. All rights reserved.
config.py 0 → 100644
import os
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# OpenAI API configuration
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if not OPENAI_API_KEY:
raise ValueError("API key not found. Please set it in the environment variables.")
# For testing purposes, evaluate only the first X classes. If set to -1, all classes are evaluated.
TEST_JUST_X_CLASSES = -1
# Constants for retries
NUM_RETRIES = 3
PAUSE_DURATION = 30 # in seconds
# Workspace configuration
WORKSPACE_DIR = '/home/coder/workspace/' # Change this for different environments!
#WORKSPACE_DIR = '/home/coder/' # Change this for different environments!
# To run more than one instance of the pipeline, change this value. A separate copy of Defects4J suffixed with this instance number (e.g. defects4j3) is required.
INSTANCE = 3
GENERATED_TEST_DIR = f'{WORKSPACE_DIR}synthetic-data/evaluation_pipeline/generated_tests{INSTANCE}/'
# Output configuration
# Adapt depending on the workspace you are working in
OUTPUT_DIR = f'{WORKSPACE_DIR}synthetic-data/evaluation_pipeline/result'
# Defects4J configuration
DEFECTS4J_HOME = f'{WORKSPACE_DIR}defects4j{INSTANCE}/'
#PROJECTS = ['Lang', 'Chart', 'Cli', 'Csv', 'Gson']
PROJECTS = ['Chart']
# Model configuration
DEFAULT_MODEL_CONFIG = {
'model_id': 'dev-llama-3-small',
'use_4bit': False,
'bnb_4bit_quant_type': 'nf4',
'compute_dtype': 'bfloat16',
'use_nested_quant': True,
'device_map': {"": 0},
'cache_dir': '/home/coder/data/machinelearning/models',
'use_openai_api': False,
'use_localai_api': True,
'openai_model': 'dev-llama-3-small',
'localai_base_url': 'https://aifb-bis-gpu01.aifb.kit.edu:8080/v1'
}
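# Backend selection (evaluated by ModelHandler):
#   use_openai_api  -> calls the OpenAI API with 'openai_model'
#   use_localai_api -> calls an OpenAI-compatible endpoint at 'localai_base_url'
#   both False      -> 'model_id' is loaded locally via transformers (optionally 4-bit quantized)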
# Test generation prompts
START_PROMPT = """
Your task is to create a meaningful JUnit4 test class for the following Java class and method:
Before we start generating unit tests, let's take a step back and consider:
1. What are the general principles of unit testing?
2. What are the specific challenges of testing this type of method?
3. What are the most important edge cases to consider?
{code}
Requirements:
1. Use JUnit4 annotations (@Test, @Before, etc.)
2. Add the appropriate imports:
- org.junit.*;
- Any required classes from the same package.
3. Create test cases for:
- Normal/expected inputs
- Edge cases (null, empty, boundary values)
- Error conditions
4. Use descriptive test method names
5. Add a test setup if necessary
6. Add corresponding assertions
7. Handle all required exception tests
8. Include the same package declaration as the class being tested
"""
END_PROMPT = "\nTest:"
# Number of repair attempts. If set to 1, no repair will be attempted.
REPAIR_ATTEMPTS = 1
REPAIR_PROMPT = """
You are an expert software tester specializing in java apps.\n
Fix the following JUnit4 test with the given error.\n
Return only the fully corrected code file without explanations.\n
Test Code:\n
{validated_code}\n\n
Error:\n
{error}
"""
eval.tex 0 → 100644
\markboth{Evaluation of the Test Framework}{Evaluation of the Test Framework}
\chapter[Evaluation]{Evaluation of the Test Framework}
\label{chap:evaluation}
In line with the quality criteria introduced in Chapter \ref{cha:grundlagensoftwaretests} (Fundamentals of Software Testing), the developed test framework is evaluated systematically according to the \gls{square} framework described in Section \ref{sec:square}.
Starting from the two-phase architecture, the effectiveness of the white-box and black-box components in detecting faults is assessed using standardized metrics (\kapVerw Section \ref{subsubsec:kategorisierungSoftwarefehler}).
The evaluation centers on the functional suitability of the generated test cases according to \cite{ISO25010:2023}, since it directly reflects the system's ability to detect faults and to achieve test coverage.
Performance efficiency according to \cite{ISO25010:2023} was chosen as the second characteristic because resource consumption plays a critical role in \gls{AI}-based systems.
Maintainability and reliability of the framework were defined as further evaluation criteria owing to their importance for practical use in development environments.
The quality of the training data set for the \gls{LLM} is assessed against the data quality aspects of correctness, consistency, currentness, and credibility in accordance with \cite{ISO25024:2015}.
In addition, it is analyzed how effectively the implemented components address the fault types defined in Section \ref{item:GL_Fehlerbezeichnungen}.
Following the \gls{square} framework introduced in Section \ref{sec:square}, the evaluation concept is based on the quality models of \cite{ISO25023:2016} and \cite{ISO25024:2015}.
A systematic evaluation is carried out on the basis of the architecture described in Chapter \ref{chap:architektur} and its implementation presented in Chapter \ref{chap:implementierung}.\\
In accordance with \cite{ISO25040:2011}, the evaluation process is divided into five main phases: \\
establishing the evaluation requirements, specifying the evaluation, designing the evaluation activities, conducting the evaluation, and concluding the evaluation.
\section{Evaluation Requirements}
Based on the objectives defined in Section \ref{sec:ein_ziel}, \textit{functional} and \textit{non-functional} evaluation requirements were determined in accordance with \cite{ISO25010:2023}.
To assess functional suitability, the ability to identify faults according to the classification introduced in Section \ref{item:GL_Fehlerbezeichnungen} as well as the attainment of the test coverage metrics defined in Section \ref{subsec:TestCoverage} are evaluated.
Comprehensive test coverage is to be achieved by integrating \wbox and \blbox as described in Section \ref{subsubsec:klassifikation_testobjekt}.
To assess performance efficiency, the time behavior and the resource utilization during test execution are analyzed.
The focus lies in particular on the execution times of the different test generation strategies and on the resource consumption of the \gls{AI} models.
With regard to the non-functional requirements, maintainability is ensured by the reproducibility of the containerized test environment and the modifiability of the generated test cases.
To assess the reliability of the system, the fault tolerance of the test execution and the recoverability of the test environment are evaluated.
Systematically recording and analyzing the evaluation results according to the quality models of the \gls{square} standards makes it possible to assess both the achievement of the objectives and the successful integration of the white-box and black-box test components into a coherent test system.
\subsubsection{Implementation of the Evaluation Pipeline}\label{subsubsec:implementierung_evalpipeline}
To assess the \gls{LLM}-based test generation, the evaluation pipeline implements a systematic approach based on the Defects4J framework (\kapVerw Section \ref{subsubsec:grundlagenSQdefects4j}).
Building on the phases and processes of the white-box test component described in Section \ref{subsec:operationsphase}, the implementation was extended to cover the specific requirements of the systematic evaluation.
The focus is on the automated evaluation of the generated unit tests with respect to their syntactic correctness, executability, and achieved code coverage.
The architecture of the pipeline is based on a modular design with specialized components.
A central \texttt{PipelineOrchestrator} coordinates the entire evaluation process and controls the interaction between the individual components.
The \texttt{Defects4JHandler} processes the input code base from the repository and extracts the relevant Java classes while preserving the project structure and dependencies.
A \texttt{ModelHandler} realizes the interaction with the \gls{LLM}, supporting both locally trained and cloud-based models.
A specialized \texttt{TestEvaluator} handles the compilation and execution of the generated tests as well as the collection of the evaluation metrics.
The code coverage is computed by a \texttt{CoverageCalculator}.
The initialization of the pipeline comprises several consecutive steps.
After configuring the logging mechanisms, the model and evaluation parameters are loaded from the file \texttt{config.py}.
The subsequent initialization of the handler components establishes the necessary connections to the Defects4J repository and the selected \gls{LLM}.
When an evaluation is resumed, the system checks for existing evaluation results in \texttt{result\_output/} to enable an efficient continuation and to avoid redundant computations.
The actual test generation and evaluation follows a systematic process.
For each Java class to be tested, the pipeline first extracts the relevant Java version from the project's \texttt{pom.xml} via the method \texttt{extract\_java\_version\_from\_pom}.
This information feeds into the generation of the unit tests, which is performed either by a class-based or a method-based strategy.
The class-based strategy generates tests for the entire class in a single pass, while the method-based strategy considers individual methods separately and either merges the resulting tests via the \texttt{TestCombiner} using the prompt in \ref{prompt:combine_prompt} or executes them individually.
The generated tests pass through the multi-stage validation and repair process described above in the \texttt{validate\_and\_repair} method before they are compiled and executed in the context of the Defects4J project.
The implementation supports flexible configuration of the evaluation parameters via the \texttt{config} structures.
For the \texttt{gpt-4o-mini} model, all prompt configurations are evaluated systematically, including different combinations of single-method generation (\texttt{use\_single\_methods}), \gls{CoT} reasoning (\texttt{use\_cot}), and the \gls{TaS} principle (\texttt{use\_tas}).
Local models operate with an adapted default prompt configuration.
The execution parameters allow controlling the number of repair attempts (\texttt{REPAIR\_ATTEMPTS}) and running several evaluations in parallel by means of unique \texttt{INSTANCE} IDs.
The pipeline implements a comprehensive logging system for detailed documentation of the evaluation process.
All intermediate and final evaluation results are stored in structured JSON files.
These contain both detailed metrics on test quality and aggregated evaluation results, which enable a systematic analysis of the quality of the generated tests.
The standardized structure of the result files facilitates the comparison of different configurations and forms the basis for the continuous improvement of the approach.
The implemented evaluation pipeline thus enables a systematic and reproducible assessment of the \gls{LLM}-based test generation.
Thanks to its modular architecture and flexible configuration options, the pipeline can be adapted to different evaluation scenarios.
The integrated logging system ensures the traceability of the results and supports the scientific analysis of the test generation quality.
The automated configuration generation and parallel execution of the evaluation pipeline are handled by two specialized Python modules.
The class \texttt{ConfigGenerator} implements a systematic approach to creating test configurations based on pairwise testing, while the \texttt{MultiInstanceRunner} coordinates the parallel execution of the evaluations.
The configuration generation uses the \texttt{AllPairs} library for the systematic combination of the test parameters.
The relevant parameters comprise the number of repair attempts (\texttt{repair\_attempts}), the method generation strategy (\texttt{use\_single\_methods}), and prompt configuration options such as \texttt{n\_shots}, \texttt{use\_cot}, \texttt{use\_tas}, and \texttt{use\_source\_code}.
For each generated parameter combination, a separate configuration file is created and assigned a unique \texttt{INSTANCE} ID.
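The following simplified sketch illustrates the principle of the pairwise parameter combination; the value sets shown are exemplary and do not necessarily correspond to the generated configuration files:
\begin{verbatim}
from allpairspy import AllPairs

parameters = [
    [1, 3],            # repair_attempts
    [True, False],     # use_single_methods
    [0, 1, 3],         # n_shots
    [True, False],     # use_cot
    [True, False],     # use_tas
    [True, False],     # use_source_code
]

# One configuration file with a unique INSTANCE ID per generated combination.
for instance_id, combination in enumerate(AllPairs(parameters), start=1):
    print(instance_id, list(combination))
\end{verbatim}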
The \texttt{MultiInstanceRunner} implements a resource-efficient parallel execution of the evaluations.
The system distinguishes between local models and cloud-based services.
For local models, the number of concurrently running instances is limited to a maximum of two in order to avoid overloading the GPU resources.
Cloud-based models are not subject to such a restriction.
Execution takes place in separate processes that are coordinated by a \texttt{ProcessManager}.
An integrated logging system records the execution status of each instance.
This automated configuration generation and parallel execution enables an efficient and systematic evaluation of different model and prompt combinations.
The implemented resource management ensures optimal utilization of the available hardware resources.
\paragraph{Analysis of the Evaluation Results}
The evaluation results are analyzed by the\linebreak\texttt{ResultAnalyzer}, which implements a systematic analysis of the generated JSON files.
The implementation uses the pandas framework for efficient data processing and aggregation as well as the Defects4J framework for validating the test results.
For each evaluation configuration, the \texttt{ResultAnalyzer} extracts the relevant metrics for assessing test quality.
The primary metrics comprise the number of focal methods (\texttt{Focal\_Methods}), the syntactic correctness of the tests, the pass rate of the test cases (\texttt{Test\_Cases\_Passing}), and occurring build errors (\texttt{Test\_Cases\_Build\_Error}).
In addition, the achieved line and condition coverage is determined, which allows an objective comparison of test quality.
An integrated\linebreak\texttt{CoverageCalculator} computes the actual code coverage of the generated tests through automated execution within the Defects4J framework.
The analysis results are prepared in a structured tabular form.
In addition to the quantitative test quality metrics, the analysis also captures qualitative aspects such as the runtime of the evaluations and the prompt configurations used.
The \texttt{ResultAnalyzer} furthermore computes aggregated metrics such as the overall success rate of the test generation and the average test coverage for different model configurations.
The implemented analysis logic supports both the detailed analysis of individual evaluation runs and the systematic comparison of different model and prompt configurations.
The standardized preparation and storage of the results enables a scientifically sound analysis of the \gls{LLM}-based test generation.
The insights gained form the basis for the continuous improvement of the approach and the adaptation of the test generation strategies.
handlers/defects4j_handler.py 0 → 100644
import logging
import os
import subprocess
import shutil
from utils.java_utils import get_package_name
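# Wraps the Defects4J CLI: verifies that the 'defects4j' command is on the PATH, checks out
# project versions into a working directory, removes the original test suites, and collects
# the Java source classes that serve as input for test generation.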
class Defects4JHandler:
def __init__(self, defects4j_home):
self.defects4j_home = defects4j_home
self._setup_defects4j()
def _setup_defects4j(self):
logging.info("Setting up Defects4J...")
try:
if not shutil.which("defects4j"):
defects4j_bin = os.path.join(self.defects4j_home, 'framework', 'bin')
os.environ["PATH"] += os.pathsep + defects4j_bin
if not shutil.which("defects4j"):
raise EnvironmentError("Defects4J not found.")
result = subprocess.run(['defects4j', 'pids'], capture_output=True, text=True, check=True)
logging.info("Defects4J is installed and accessible.")
#logging.info(f"Available projects:\n{result.stdout.strip()}")
except Exception as e:
logging.error(f"Error setting up Defects4J: {e}")
raise
# remove all original test files from the projects
def remove_test_files(self, projects):
project_versions = {project: 1 for project in projects} # Start with version 1 for all projects
working_dir_base = os.path.join(self.defects4j_home, 'working_dir')
for project in projects:
version = project_versions[project]
checkout_dir = os.path.join(working_dir_base, f'{project}_{version}')
logging.info(f"Removing test files from {checkout_dir}")
if not os.path.exists(checkout_dir):
logging.warning(f"Test path does not exist: {checkout_dir}")
continue
removed_count = 0
failed_count = 0
try:
for root, _, files in os.walk(checkout_dir):
for file in files:
if file.endswith('Test.java') or file.endswith('Tests.java'):
file_path = os.path.join(root, file)
try:
os.remove(file_path)
removed_count += 1
except OSError as e:
logging.warning(f"Failed to remove test file {file_path}: {e}")
failed_count += 1
logging.info(f"Project {project}: Removed {removed_count} test files, {failed_count} failures")
except Exception as e:
logging.error(f"Error while processing project {project}: {e}")
raise
def read_code_samples(self, projects):
file_contents = []
project_versions = {project: 1 for project in projects} # Start with version 1 for all projects
working_dir_base = os.path.join(self.defects4j_home, 'working_dir')
os.makedirs(working_dir_base, exist_ok=True)
for project in projects:
version = project_versions[project]
checkout_dir = os.path.join(working_dir_base, f'{project}_{version}')
# Checkout project if needed
if not os.path.exists(checkout_dir):
try:
subprocess.run(
['defects4j', 'checkout', '-p', project, '-v', f'{version}b', '-w', checkout_dir],
check=True
)
except subprocess.CalledProcessError as e:
logging.error(f"Failed to checkout {project}: {e}")
continue
# Get source directory
src_dir = self._get_source_directory(checkout_dir)
if not src_dir:
continue
# Collect Java files
for root, _, files in os.walk(src_dir):
for file in files:
if file.endswith('.java'):
file_path = os.path.join(root, file)
try:
with open(file_path, 'r') as f:
content = f.read()
package = get_package_name(content)
class_name = file[:-5] # Remove .java
full_class_name = f"{package}.{class_name}" if package else class_name
file_contents.append((project, full_class_name, content, checkout_dir))
except Exception as e:
logging.error(f"Error reading {file_path}: {e}")
logging.info(f"Collected {len(file_contents)} Java classes")
return file_contents
def _get_source_directory(self, checkout_dir):
try:
result = subprocess.run(
['defects4j', 'export', '-p', 'dir.src.classes'],
cwd=checkout_dir,
capture_output=True,
text=True
)
if result.returncode == 0:
src_dir = os.path.join(checkout_dir, result.stdout.strip())
if os.path.exists(src_dir):
return src_dir
# Fallback paths
possible_dirs = [
os.path.join(checkout_dir, 'src', 'main', 'java'),
os.path.join(checkout_dir, 'src', 'java'),
os.path.join(checkout_dir, 'src')
]
for dir_path in possible_dirs:
if os.path.exists(dir_path):
return dir_path
raise FileNotFoundError("No valid source directory found")
except Exception as e:
logging.error(f"Error finding source directory: {e}")
raise
handlers/model_handler.py 0 → 100644
import logging
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GenerationConfig
from openai import OpenAI
from utils.java_utils import extract_imports_from_source
import re
import sys
import time
from config import NUM_RETRIES, PAUSE_DURATION, OPENAI_API_KEY, WORKSPACE_DIR
sys.path.append(f'{WORKSPACE_DIR}testframeworkwhiteboxmodul')
from LLMTests.unitTestGenerator.test_code_processor import TestCodeProcessor
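# Handles all interaction with the LLM: either via the OpenAI API, via an OpenAI-compatible
# LocalAI endpoint, or by loading a local Hugging Face model with optional 4-bit quantization.
# Model output is post-processed with TestCodeProcessor to extract a valid JUnit4 test class.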
class ModelHandler:
def __init__(self, config):
self.config = config
if config['use_openai_api']:
self.client = OpenAI(api_key=OPENAI_API_KEY)
if config['use_localai_api']:
self.client = OpenAI(api_key='test', base_url=config['localai_base_url'])
if not config['use_openai_api'] and not config['use_localai_api']:
self.client = None
self.processor = TestCodeProcessor() # Initialize the test code processor
if not config['use_openai_api'] and not config['use_localai_api']:
self.model, self.tokenizer = self._load_model_and_tokenizer()
else:
self.model, self.tokenizer = None, None
def _load_model_and_tokenizer(self):
try:
logging.info("Loading tokenizer...")
cache_dir = self.config.get('cache_dir', './model_cache')
tokenizer = AutoTokenizer.from_pretrained(
self.config['model_id'],
trust_remote_code=True,
cache_dir=cache_dir
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
logging.info("Loading model...")
device_map = self.config.get('device_map', {"": 0})
bnb_config = BitsAndBytesConfig(
load_in_4bit=self.config['use_4bit'],
bnb_4bit_quant_type=self.config['bnb_4bit_quant_type'],
bnb_4bit_compute_dtype=getattr(torch, self.config['compute_dtype']),
bnb_4bit_use_double_quant=self.config['use_nested_quant'],
)
model = AutoModelForCausalLM.from_pretrained(
self.config['model_id'],
quantization_config=bnb_config,
device_map=device_map,
cache_dir=cache_dir,
trust_remote_code=True
)
model.config.use_cache = False
return model, tokenizer
except Exception as e:
logging.error(f"Error loading model and tokenizer: {e}")
raise
def generate_tests(self, prompt):
"""Generate test code using either OpenAI API or local model."""
try:
if self.config['use_openai_api'] or self.config['use_localai_api']:
return self._generate_with_openai(prompt)
else:
return self._generate_from_local_model(prompt)
except Exception as e:
logging.error(f"Error in generate_tests: {e}")
return None
def _generate_with_openai(self, prompt, temperature=0.3):
retry_count = 0
while True:
try:
response = self.client.chat.completions.create(
model=self.config.get('openai_model', 'gpt-4o-mini'),
messages=[
{"role": "system", "content": "You are an experienced software tester specializing in Android Applications."},
{"role": "user", "content": prompt}
],
temperature=temperature
)
return response.choices[0].message.content
except Exception as e:
logging.error(f"Error during OpenAI API call: {e}")
retry_count += 1
if retry_count >= NUM_RETRIES:
raise
logging.info(f"Retrying OpenAI API call... ({retry_count}/{NUM_RETRIES})")
time.sleep(5)
def _generate_from_local_model(self, prompt):
try:
encoded_input = self.tokenizer(
prompt,
return_tensors='pt',
truncation=True
).to(self.model.device)
generation_config = GenerationConfig(
max_new_tokens=1000,
temperature=0.3,
top_p=0.95,
do_sample=True,
pad_token_id=self.tokenizer.pad_token_id,
eos_token_id=self.tokenizer.eos_token_id,
num_return_sequences=1,
)
output_sequences = self.model.generate(
input_ids=encoded_input['input_ids'],
attention_mask=encoded_input['attention_mask'],
generation_config=generation_config,
)
return self.tokenizer.decode(output_sequences[0], skip_special_tokens=True)
except Exception as e:
logging.error(f"Error during text generation: {e}")
raise
def postprocess(self, output_text, source_code):
"""Extract, validate, and process the generated test code."""
try:
# Use TestCodeProcessor to extract and validate test code
test_code = self.processor.extract_test_code(output_text)
if not test_code:
logging.warning("No valid test code found in the output.")
return ""
validated_code = self.processor.validate_test_code(test_code, 'java')
if not validated_code:
logging.warning("Extracted code is invalid or missing required elements.")
return ""
return validated_code
except Exception as e:
logging.error(f"Error during postprocessing: {e}")
return ""
def repair_test(self, prompt):
"""Generate repaired test code using the repair prompt."""
try:
output_text = self.generate_tests(prompt)
repaired_code = self.postprocess(output_text, source_code=None)
return repaired_code
except Exception as e:
logging.error(f"Error in repair_test: {e}")
return None
handlers/test_evaluator.py 0 → 100644
import logging
import os
import subprocess
import json
import re
from utils.java_utils import extract_test_class_name, extract_test_methods, get_package_name
from .model_handler import ModelHandler
from config import DEFAULT_MODEL_CONFIG, REPAIR_PROMPT, WORKSPACE_DIR, REPAIR_ATTEMPTS, GENERATED_TEST_DIR
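# Compiles and runs the generated JUnit4 tests inside the Defects4J checkout, optionally
# triggering LLM-based repair on compilation errors, and collects pass/fail and coverage metrics.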
class TestEvaluator:
def __init__(self):
self.generated_tests_dir = GENERATED_TEST_DIR
os.makedirs(self.generated_tests_dir, exist_ok=True)
logging.info("TestEvaluator initialized with generated_tests_dir: %s", self.generated_tests_dir)
self.model_handler = ModelHandler(config=DEFAULT_MODEL_CONFIG) # Initialize the ModelHandler
def evaluate(self, code_under_test, unit_test_code, project_name, class_name, defects4j_home, checkout_dir):
evaluation_results = {
"syntactic_correctness": False,
"runtime_correctness": False,
"total_tests": 0,
"passed_tests": 0,
"failed_tests": 0,
"passing_rate": 0.0,
"coverage": 0.0
}
logging.info(f"Evaluating unit test for {class_name} in {checkout_dir}")
try:
logging.info("Starting evaluation process for project: %s, class: %s", project_name, class_name)
# Get project build system
logging.debug("Detecting build system for project %s", project_name)
project_info_cmd = ['defects4j', 'info', '-p', project_name]
project_info_result = subprocess.run(project_info_cmd, cwd=checkout_dir, capture_output=True, text=True)
build_system = "ant" if "ant" in project_info_result.stdout.lower() else "maven"
logging.info("Detected build system: %s", build_system)
# Get project paths
logging.debug("Retrieving project source and test directories")
src_dir_rel = subprocess.check_output(
['defects4j', 'export', '-p', 'dir.src.classes'],
cwd=checkout_dir, text=True
).strip()
test_dir_rel = subprocess.check_output(
['defects4j', 'export', '-p', 'dir.src.tests'],
cwd=checkout_dir, text=True
).strip()
logging.info("Project directories - src: %s, test: %s", src_dir_rel, test_dir_rel)
# Construct full test directory path
test_dir = os.path.join(checkout_dir, test_dir_rel)
# Get classpaths
logging.debug("Retrieving project classpaths")
compile_cp = subprocess.check_output(
['defects4j', 'export', '-p', 'cp.compile'],
cwd=checkout_dir, text=True
).strip()
test_cp = subprocess.check_output(
['defects4j', 'export', '-p', 'cp.test'],
cwd=checkout_dir, text=True
).strip()
logging.info("Project classpaths - compile: %s, test: %s", compile_cp, test_cp)
# Save test file
package_name = get_package_name(unit_test_code)
test_class_name = extract_test_class_name(unit_test_code)
test_package_dir = os.path.join(self.generated_tests_dir, *package_name.split('.')) if package_name else self.generated_tests_dir
os.makedirs(test_package_dir, exist_ok=True)
test_file_path = os.path.join(test_package_dir, f"{test_class_name}.java")
logging.info("Saving test file with package: %s, class: %s", package_name, test_class_name)
with open(test_file_path, 'w') as f:
f.write(unit_test_code)
# Copy test to project's test directory
project_test_file = self._copy_test_to_project(
unit_test_code,
package_name,
test_class_name,
test_dir
)
if not project_test_file:
logging.error("Failed to copy test file to project directory")
return evaluation_results
# Compile test
logging.info("Attempting to compile test file: %s", test_file_path)
# Compilation loop with repair attempts
for attempt in range(REPAIR_ATTEMPTS):
logging.info(f"Compilation attempt {attempt + 1}/{REPAIR_ATTEMPTS}")
compilation_success, compile_result = self._compile_test(
test_file_path, build_system, checkout_dir, compile_cp, test_cp
)
if compilation_success:
break
else:
logging.error(f"Compilation failed on attempt {attempt + 1}. Attempting to repair...")
# Repair the test file
if REPAIR_ATTEMPTS > 1:
unit_test_code = self._repair_test(
test_file_path, checkout_dir, defects4j_home, compile_result.stderr, unit_test_code, attempt + 1
)
if not unit_test_code:
logging.error(f"Test repair failed on attempt {attempt + 1}")
break # Exit if repair failed
# Overwrite the test file with the repaired code
for file_path in [test_file_path, project_test_file]:
with open(file_path, 'w') as f:
f.write(unit_test_code)
if not os.path.exists(file_path):
logging.error(f"Failed to write repaired test to {file_path}")
return evaluation_results
if not compilation_success:
logging.error(f"Test failed after {attempt + 1} repair attempts. Cleaning up and skipping test.")
logging.error("Compilation output: %s", compile_result.stderr)
self._cleanup_test_files(test_file_path, project_test_file)
evaluation_results.update({
"syntactic_correctness": False,
"runtime_correctness": False,
"skipped": True,
"skip_reason": f"Failed to compile after {attempt + 1} repair attempts"
})
return evaluation_results
logging.info("Compilation %s", "successful" if compilation_success else "failed")
if compilation_success:
# Run tests
test_methods = extract_test_methods(unit_test_code)
test_results = self._run_tests(
test_methods, package_name, test_class_name,
checkout_dir, defects4j_home
)
evaluation_results.update({
"total_tests": len(test_methods),
"passed_tests": test_results["passed"],
"failed_tests": test_results["failed"],
"passing_rate": (test_results["passed"] / len(test_methods) * 100) if test_methods else 0.0,
"runtime_correctness": test_results["failed"] == 0 and len(test_methods) > 0,
"skipped": False,
"compilation_attempts": attempt + 1,
"repair_attempts": REPAIR_ATTEMPTS,
"syntactic_correctness": True,
"runtime_correctness": True,
})
# Collect coverage data if tests passed
if evaluation_results["runtime_correctness"]:
logging.info("Collecting coverage data...")
coverage_cmd = ['defects4j', 'coverage'] # Default to changed classes
# Option to instrument all classes (you can make this configurable)
instrument_all = True # Or get this from a config setting
if instrument_all:
coverage_cmd.append('-i')
# Create a file listing all source classes to instrument
src_dir_rel = subprocess.check_output(
['defects4j', 'export', '-p', 'dir.src.classes'],
cwd=checkout_dir, text=True
).strip()
src_dir = os.path.join(checkout_dir, src_dir_rel)
all_classes_file = os.path.join(self.generated_tests_dir, "all_classes.txt")
with open(all_classes_file, 'w') as f:
for root, _, files in os.walk(src_dir):
for file in files:
if file.endswith(".java"):
package_path = os.path.relpath(root, src_dir).replace(os.sep, '.')
if package_path and package_path != '.':
f.write(f"{package_path}.{file[:-5]}\n")
else:
f.write(f"{file[:-5]}\n")
coverage_cmd.append(all_classes_file)
coverage_result = subprocess.run(coverage_cmd, cwd=checkout_dir, capture_output=True, text=True)
if coverage_result.returncode == 0:
lines_total_match = re.search(r"Lines total: (\d+)", coverage_result.stdout)
lines_covered_match = re.search(r"Lines covered: (\d+)", coverage_result.stdout)
if lines_total_match and lines_covered_match:
total = int(lines_total_match.group(1))
covered = int(lines_covered_match.group(1))
evaluation_results["coverage"] = (covered / total * 100) if total > 0 else 0.0
except Exception as e:
logging.exception("Unexpected error during evaluation")
self._cleanup_test_files(test_file_path, project_test_file)
raise
finally:
logging.info(f"Evaluation Results: {json.dumps(evaluation_results, indent=2)}")
return evaluation_results
def _repair_test(self, test_file_path, checkout_dir, defects4j_home, error_log, unit_test_code, attempt):
logging.info(f"Attempting test repair (attempt {attempt}/3)")
# Use ModelHandler to repair the test file
repair_prompt = REPAIR_PROMPT.format(validated_code=unit_test_code, error=error_log)
repaired_code = self.model_handler.repair_test(repair_prompt)
if not repaired_code:
logging.error("Model returned no repaired code")
return None
return repaired_code
def _compile_test(self, test_file_path, build_system, checkout_dir, compile_cp, test_cp):
compilation_success = False
logging.info("Compiling test with %s build system", build_system)
logging.info("Compilation command parameters - file: %s, checkout_dir: %s", test_file_path, checkout_dir)
compile_cmd = [
'defects4j', 'compile',
'-w', f"{checkout_dir}",
test_file_path
]
result = subprocess.run(compile_cmd, cwd=checkout_dir, capture_output=True, text=True)
logging.info("Compilation return code: %d", result.returncode)
if result.returncode != 0:
logging.error("Compilation stderr: %s", result.stderr)
else:
compilation_success = True
return compilation_success, result
def _run_tests(self, test_methods, package_name, test_class_name, checkout_dir, defects4j_home):
logging.info("Running %d test methods for %s.%s", len(test_methods), package_name, test_class_name)
results = {
"passed": 0,
"failed": 0,
"failed_tests": []
}
for test_method in test_methods:
test_identifier = f"{package_name}.{test_class_name}::{test_method}"
logging.info("Executing test: %s", test_identifier)
cmd = ['defects4j', 'test', '-t', test_identifier]
env = os.environ.copy()
env['DEFECTS4J_HOME'] = defects4j_home
try:
result = subprocess.run(cmd, cwd=checkout_dir, capture_output=True, text=True, env=env)
if result.returncode == 0 and "Failing tests: 0" in result.stdout:
logging.info("Test passed: %s", test_identifier)
results["passed"] += 1
else:
logging.error("Test failed: %s", test_identifier)
logging.error("Test output: %s", result.stdout)
results["failed"] += 1
results["failed_tests"].append(test_identifier)
except Exception as e:
logging.exception("Error executing test: %s", test_identifier)
results["failed"] += 1
results["failed_tests"].append(test_identifier)
logging.info("Test execution complete. Results: %s", json.dumps(results, indent=2))
return results
def _copy_test_to_project(self, unit_test_code, package_name, test_class_name, test_dir):
logging.info("Copying test file to project directory - package: %s, class: %s", package_name, test_class_name)
"""
Copies the generated test to the project's test directory structure.
"""
try:
# Create package directory structure in test dir
if package_name:
package_path = os.path.join(test_dir, *package_name.split('.'))
else:
package_path = test_dir
os.makedirs(package_path, exist_ok=True)
# Write test file to the correct package directory
test_file_path = os.path.join(package_path, f"{test_class_name}.java")
with open(test_file_path, 'w') as f:
f.write(unit_test_code)
logging.info(f"Test file copied to project test directory: {test_file_path}")
return test_file_path
except Exception as e:
logging.error(f"Error copying test to project directory: {e}")
return None
def _cleanup_test_files(self, *file_paths):
"""Remove test files from both locations."""
for file_path in file_paths:
try:
if file_path and os.path.exists(file_path):
os.remove(file_path)
logging.info(f"Cleaned up test file: {file_path}")
except Exception as e:
logging.error(f"Error cleaning up file {file_path}: {e}")
def calculate_overall_coverage(self, checkout_dir):
logging.info("Calculating overall coverage...")
coverage_results = {
"lines_total": 0,
"lines_covered": 0,
"line_coverage": 0.0,
"conditions_total": 0,
"conditions_covered": 0,
"condition_coverage": 0.0
}
try:
src_dir_rel = subprocess.check_output(
['defects4j', 'export', '-p', 'dir.src.classes'],
cwd=checkout_dir, text=True
).strip()
src_dir = os.path.join(checkout_dir, src_dir_rel)
all_classes_file = os.path.join(self.generated_tests_dir, "all_classes_final.txt")
os.makedirs(self.generated_tests_dir, exist_ok=True)
with open(all_classes_file, 'w') as f:
for root, _, files in os.walk(src_dir):
for file in files:
if file.endswith(".java"):
package_path = os.path.relpath(root, src_dir).replace(os.sep, '.')
if package_path and package_path != '.':
f.write(f"{package_path}.{file[:-5]}\n")
else:
f.write(f"{file[:-5]}\n")
coverage_cmd = ['defects4j', 'coverage', '-i', all_classes_file]
coverage_result = subprocess.run(coverage_cmd, cwd=checkout_dir, capture_output=True, text=True)
if coverage_result.returncode == 0:
lines_total_match = re.search(r"Lines total: (\d+)", coverage_result.stdout)
lines_covered_match = re.search(r"Lines covered: (\d+)", coverage_result.stdout)
conditions_total_match = re.search(r"Conditions total: (\d+)", coverage_result.stdout)
conditions_covered_match = re.search(r"Conditions covered: (\d+)", coverage_result.stdout)
if lines_total_match and lines_covered_match:
coverage_results["lines_total"] = int(lines_total_match.group(1))
coverage_results["lines_covered"] = int(lines_covered_match.group(1))
coverage_results["line_coverage"] = (coverage_results["lines_covered"] / coverage_results["lines_total"] * 100) if coverage_results["lines_total"] > 0 else 0.0
if conditions_total_match and conditions_covered_match:
coverage_results["conditions_total"] = int(conditions_total_match.group(1))
coverage_results["conditions_covered"] = int(conditions_covered_match.group(1))
coverage_results["condition_coverage"] = (coverage_results["conditions_covered"] / coverage_results["conditions_total"] * 100) if coverage_results["conditions_total"] > 0 else 0.0
else:
logging.error(f"Final coverage command failed: {coverage_result.stderr}")
except Exception as e:
logging.error(f"Error calculating final coverage: {e}", exc_info=True)
finally:
return coverage_results
main.py 0 → 100644
import logging
import sys
import json
import os
from datetime import datetime
from config import (
DEFAULT_MODEL_CONFIG,
DEFECTS4J_HOME,
PROJECTS,
START_PROMPT,
END_PROMPT,
REPAIR_PROMPT,
OUTPUT_DIR,
TEST_JUST_X_CLASSES,
WORKSPACE_DIR,
REPAIR_ATTEMPTS,
NUM_RETRIES,
PAUSE_DURATION
)
from handlers.model_handler import ModelHandler
from handlers.defects4j_handler import Defects4JHandler
from handlers.test_evaluator import TestEvaluator
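# Entry point of the evaluation pipeline: reads the source classes of the configured Defects4J
# projects, generates a JUnit4 test class per source class via the ModelHandler, evaluates it with
# the TestEvaluator, and writes detailed and summary JSON results to OUTPUT_DIR.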
def setup_logging():
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s [%(levelname)s] %(message)s',
handlers=[logging.StreamHandler(sys.stdout)]
)
def generate_output_filename(base_name, config, timestamp, is_summary=False):
os.makedirs(OUTPUT_DIR, exist_ok=True)
model_name = config.get('openai_model', 'unknown') if config.get('use_openai_api') else \
config.get('model_id', 'unknown').split('/')[-1]
api_suffix = "openai" if config.get('use_openai_api') else "local"
suffix = "summary" if is_summary else "detailed"
return os.path.join(OUTPUT_DIR, f"{base_name}_{model_name}_{api_suffix}_{suffix}_{timestamp}.json")
def save_outputs(data, filename):
with open(filename, "w") as f:
json.dump(data, f, indent=4)
def generate_summary_output(model_outputs, final_coverage):
summary = {
"configuration": {
"model_config": DEFAULT_MODEL_CONFIG,
"projects": PROJECTS,
"test_limit": TEST_JUST_X_CLASSES,
"workspace_dir": WORKSPACE_DIR,
"output_dir": OUTPUT_DIR,
"defects4j_home": DEFECTS4J_HOME,
"repair_config": {
"repair_attempts": REPAIR_ATTEMPTS,
"num_retries": NUM_RETRIES,
"pause_duration": PAUSE_DURATION
},
"prompts": {
"start_prompt": START_PROMPT,
"end_prompt": END_PROMPT,
"repair_prompt": REPAIR_PROMPT
}
},
"overall_final_coverage": final_coverage,
"project_summaries": []
}
for project_data in model_outputs:
project_summary = {
"project_name": project_data["project_name"],
"class_summaries": []
}
for class_output in project_data.get("content_outputs", []):
class_summary = {
"class_name": class_output["class_name"],
"evaluation_summary": class_output.get("evaluation", {})
}
project_summary["class_summaries"].append(class_summary)
summary["project_summaries"].append(project_summary)
return summary
def main():
"""
Main function to execute the evaluation pipeline.
"""
setup_logging()
logging.info("Starting the evaluation pipeline.")
try:
# Create a single timestamp for this run
run_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
# Initialize handlers
model_handler = ModelHandler(DEFAULT_MODEL_CONFIG)
defects4j_handler = Defects4JHandler(DEFECTS4J_HOME)
evaluator = TestEvaluator()
# Generate base filename for this run
base_filename = "model_outputs_evaluation" # Base name for both detailed and summary outputs
common_filename = generate_output_filename(base_filename, DEFAULT_MODEL_CONFIG, run_timestamp)
# Read code samples
file_contents = defects4j_handler.read_code_samples(PROJECTS)
#remove original tests
defects4j_handler.remove_test_files(PROJECTS)
# Initialize model outputs list
model_outputs = []
test_just_x_classes=TEST_JUST_X_CLASSES
# Process each code sample
for project_name, class_name, content, checkout_dir in file_contents:
logging.info(f"Processing class '{class_name}' from project '{project_name}'.")
content_outputs = []
try:
# Generate prompt and test
full_prompt = START_PROMPT.format(code=content) + END_PROMPT
output = model_handler.generate_tests(full_prompt)
unit_test_code = model_handler.postprocess(output, content)
if not unit_test_code:
logging.warning(f"No valid unit test code generated for class '{class_name}'.")
continue
# Evaluate the test
evaluation_results = evaluator.evaluate(
code_under_test=content,
unit_test_code=unit_test_code,
project_name=project_name,
class_name=class_name,
defects4j_home=DEFECTS4J_HOME,
checkout_dir=checkout_dir
)
content_outputs.append({
"project_name": project_name,
"class_name": class_name,
"evaluation": evaluation_results
})
logging.info(f"Successfully processed class '{class_name}'.")
except Exception as e:
logging.error(f"Error processing class '{class_name}': {e}", exc_info=True)
if content_outputs:
model_outputs.append({
"project_name": project_name,
"class_name": class_name,
"content_outputs": content_outputs
})
# Save intermediate detailed results
detailed_filename = generate_output_filename(base_filename, DEFAULT_MODEL_CONFIG, run_timestamp)
save_outputs(model_outputs, detailed_filename)
logging.info(f"Saved intermediate detailed results after processing '{class_name}'")
if test_just_x_classes>0:
logging.info(f"Test just x classes: {test_just_x_classes}.")
test_just_x_classes-=1
if test_just_x_classes==0:
break
logging.info("Processing each code sample completed. Calculating final coverage.")
final_coverage_results = {}
# Calculate final coverage if any projects were processed
if model_outputs:
project_names = list(set(item['project_name'] for item in model_outputs))
if len(project_names) == 1:
final_project_name = project_names[0]
checkout_dir = None
for project, cls, _, chk_dir in file_contents:
if project == final_project_name:
checkout_dir = chk_dir
break
if checkout_dir:
logging.info(f"Calculating final coverage for project: {final_project_name} in {checkout_dir}")
final_coverage_results = evaluator.calculate_overall_coverage(checkout_dir)
logging.info(f"Final Coverage Results: {json.dumps(final_coverage_results, indent=2)}")
else:
logging.warning("Checkout directory not found for the processed project. Skipping final coverage calculation.")
elif len(project_names) > 1:
logging.warning("Multiple projects processed. Final coverage calculation for multiple projects is not implemented in this version.")
else:
logging.warning("No projects were processed. Skipping final coverage calculation.")
else:
logging.warning("No projects were processed. Skipping final coverage calculation.")
# Generate and save summary output
summary_output = generate_summary_output(model_outputs, final_coverage_results)
summary_filename = generate_output_filename(base_filename, DEFAULT_MODEL_CONFIG, run_timestamp, is_summary=True)
save_outputs(summary_output, summary_filename)
logging.info(f"Saved final summary output to {summary_filename}")
# Save the final detailed output with the coverage results
if model_outputs:
model_outputs[-1]['overall_final_coverage'] = final_coverage_results # Add coverage to the last project's data
detailed_filename = generate_output_filename(base_filename, DEFAULT_MODEL_CONFIG, run_timestamp)
save_outputs(model_outputs, detailed_filename)
logging.info(f"Saved final detailed results with coverage to {detailed_filename}")
logging.info("Evaluation pipeline completed successfully.")
except Exception as e:
logging.critical(f"An unexpected error occurred: {e}", exc_info=True)
# Save whatever results we have before exiting
if model_outputs:
detailed_filename = generate_output_filename(base_filename, DEFAULT_MODEL_CONFIG, run_timestamp)
save_outputs(model_outputs, detailed_filename)
logging.info(f"Saved partial detailed results before exit to {detailed_filename}")
sys.exit(1)
if __name__ == "__main__":
main()
[
{
"project_name": "Chart",
"class_name": "org.jfree.chart.Drawable",
"content_outputs": [
{
"project_name": "Chart",
"class_name": "org.jfree.chart.Drawable",
"evaluation": {
"syntactic_correctness": false,
"runtime_correctness": false,
"total_tests": 0,
"passed_tests": 0,
"failed_tests": 0,
"passing_rate": 0.0,
"coverage": 0.0,
"skipped": true,
"skip_reason": "Failed to compile after 3 repair attempts"
}
}
]
}
]
[
{
"project_name": "Chart",
"class_name": "org.jfree.chart.Drawable",
"content_outputs": [
{
"project_name": "Chart",
"class_name": "org.jfree.chart.Drawable",
"evaluation": {
"syntactic_correctness": false,
"runtime_correctness": false,
"total_tests": 0,
"passed_tests": 0,
"failed_tests": 0,
"passing_rate": 0.0,
"coverage": 0.0,
"skipped": true,
"skip_reason": "Failed to compile after 1 repair attempts"
}
}
]
},
{
"project_name": "Chart",
"class_name": "org.jfree.chart.RenderingSource",
"content_outputs": [
{
"project_name": "Chart",
"class_name": "org.jfree.chart.RenderingSource",
"evaluation": {
"syntactic_correctness": false,
"runtime_correctness": false,
"total_tests": 0,
"passed_tests": 0,
"failed_tests": 0,
"passing_rate": 0.0,
"coverage": 0.0
}
}
]
}
]