Mermaid Diagram Processor - Requirements Document (Current State)

Project Goal

To create a Python script that processes an HTML file, identifies all Mermaid diagram code blocks, and attempts to convert each into a PNG image using a specified external command-line tool. If the initial conversion fails due to errors in the Mermaid code, the script will leverage a Generative AI (currently Google Gemini) to attempt to fix the code. The script will iteratively try to render the AI-corrected code up to a user-specified number of retries. Successfully generated PNGs will replace the original Mermaid code blocks in the HTML with appropriate <img> tags. The entire process, including AI interactions, must be thoroughly logged, and the script must be configurable via command-line options.

Mermaid Diagram Processor - Requirements Specification

Version: 1.0 (Reflecting current script capabilities as of last update)

Date: July 15, 2024

1. Project Goal

To create a Python script that automates the processing of HTML files containing Mermaid diagram code blocks. The script will:

Identify all Mermaid diagrams within an input HTML file.
Attempt to convert each diagram into a PNG image using a user-specified external command-line tool.
If PNG generation is successful, replace the original Mermaid code block in the HTML with an <img> tag pointing to the generated PNG.
If PNG generation fails due to errors in the Mermaid code:
- The script will capture the complete error output from the external tool.
- It will then send the failing Mermaid code, the captured error message, and a history of previous AI attempts for that diagram to a designated Generative AI (currently Google Gemini) to request a fix.
- This AI-assisted correction and rendering process will be attempted for a user-specified maximum number of retries.
The entire process, including successes, failures, AI interactions, and detailed error messages, must be logged to both a file and the console with configurable verbosity.
The script must be runnable from the command line with various options for input/output paths, executable configuration, and operational parameters.

2. Overall Process Flow


graph TD
    A[Start] --> B{Parse Command-Line Arguments};
    B --> C{Read Input HTML File};
    C --> D{"Parse HTML (BeautifulSoup)"};
    D --> E{Find All 'div.mermaid' Blocks};
    E -- No Diagrams Found --> F[Save Original HTML to Output];
    E -- Diagrams Found --> G{Loop Through Each Diagram};
    F --> Z[End];

    G --> H[Extract Mermaid Code & Determine Name];
    H --> I[Attempt 1: Render PNG with Executable];
    I -- Success --> J[Replace 'div.mermaid' with 'img' Tag];
    I -- "Failure (Exec Error or PNG Not Found)" --> K{"AI Retries Enabled? (ai_retries > 0 AND API Key Set?)"};

    K -- No --> L[Log Failure, Keep Original 'div.mermaid'];
    L --> M{More Diagrams?};
    J --> M;

    K -- Yes --> N["Start AI Retry Loop (Max: ai_retries)"];
    N --> O{"Current AI Attempt < Max Retries?"};
    O -- Yes --> P["Prepare Prompt for AI (Code, Error, History)"];
    P --> Q[Call Gemini AI API];
    Q --> R{AI Suggests New, Different Code?};
    R -- Yes --> S[Update .mmd with AI Code, Increment AI Attempt];
    %% Re-attempt rendering with new code
    S --> I;
    R -- "No (No Suggestion / Same Code / AI Error)" --> T[Log AI Failure, Revert .mmd to Original];
    %% Mark as failed for this diagram
    T --> L;
    O -- "No (Max Retries Reached)" --> T;

    M -- Yes --> G;
    M -- No --> U[Save Modified HTML to Output File];
    U --> V[Log Processing Summary];
    V --> Z;

3. Detailed Step-by-Step Requirements

3.1. Setup & Configuration

The script must accept the following command-line arguments:

-i / --input-html (Required): Path to the input HTML file.
-o / --output-html (Required): Path to save the modified HTML file.
-d / --image-dir (Required): Directory to store generated PNG images and intermediate .mmd files. The script will create this directory if it doesn't exist.
--executable (Required): Full path to the external command-line executable (e.g., mmdc , genmermaid.sh ) that converts a .mmd file to a .png file.
--executable-args (Optional, List): Additional arguments to pass to the external executable. Supports placeholders:
- {input_file} : Absolute path to the temporary .mmd file.
- {output_dir} : Absolute path to the image output directory.
- {base_name} : The generated base name for the diagram (e.g., "my_diagram_1"). This allows forming output filenames like {output_dir}/{base_name}.png .
If not provided, the script assumes the executable takes the input .mmd file path as its sole primary argument and outputs the PNG in the same directory as the .mmd file with the same base name.
--log-file (Required): Path for the detailed log file.
--log-level (Optional, Choices: INFO, DEBUG, Default: INFO): Sets the logging verbosity for console output. The log file will always capture DEBUG level and above.

Core Functionality Overview

Parse an input HTML file to find <div class="mermaid"> elements.
For each diagram:
- Extract the Mermaid code and a descriptive name.
- Save the code to a temporary .mmd file.
- Attempt to generate a PNG using a user-specified external executable.
- If successful, replace the <div> with an <img> tag in the HTML.
- If unsuccessful:
  - Capture the full error output (stdout and stderr) from the executable.
  - If AI retries are enabled, send the failing code, full error output, and a history of previous AI attempts for this diagram to the Gemini AI for correction.
  - If the AI provides a new, valid code suggestion, save it to the .mmd file and re-attempt PNG generation.
  - Repeat the AI-fix and render attempt cycle up to the maximum configured retries.
  - If AI fixes fail or retries are exhausted, the original <div class="mermaid"> block is retained in the HTML, and the .mmd file is reverted to the original code if AI modifications were made.
Save the modified HTML content to an output file.
Provide detailed logging to a file and summary status/errors to the console.

Command-Line Interface (CLI)

The script accepts the following command-line arguments:

-i, --input-html FILE_PATH ( Required ): Path to the input HTML file.
-o, --output-html FILE_PATH ( Required ): Path to save the modified HTML file.
-d, --image-dir DIRECTORY_PATH ( Required ): Directory to store generated PNGs and intermediate .mmd files.
--executable EXECUTABLE_PATH ( Required ): Path to the external executable (e.g., mmdc ) that converts .mmd to .png .
--executable-args [ARG ...] (Optional): A list of arguments to pass to the external executable. Placeholders {input_file} , {output_dir} , and {base_name} will be substituted.
- Example: --executable-args -i {input_file} -o {output_dir}/{base_name}.png -w 1024
- If not provided, the script assumes the executable takes the input .mmd file path as its sole primary argument and outputs a .png file (with the same base name as the input) in the directory specified by -d (which is also the CWD for the executable).
--log-file FILE_PATH ( Required ): Path for the detailed log file.
--log-level {INFO,DEBUG} (Optional, Default: INFO ): Console logging detail level. The log file always captures DEBUG level.
--ai-retries INTEGER (Optional, Default: 0 , Range: 0-10 ): Maximum number of attempts for the AI to fix a broken Mermaid diagram. 0 disables AI fixing. Requires GOOGLE_API_KEY environment variable to be set.
--image-src-prefix PREFIX_STRING (Optional, Default: ""): Prefix for the src attribute of generated <img> tags.
- Can be a relative path (e.g., images/ ) or an absolute URL (e.g., https://cdn.example.com/ ).
- If empty, the script calculates a relative path from the output HTML file's directory to the image.

Environment Variable: GOOGLE_API_KEY must be set if --ai-retries is greater than 0 for AI functionality.
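A minimal sketch of how the options listed above could be declared with argparse. The function name build_parser and the help strings are illustrative assumptions, not taken from the script. One caveat: with plain nargs="*", argparse reads dash-prefixed values such as the -i in the --executable-args example as the script's own options, and this document does not specify how the actual script handles that case.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented command-line options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Render Mermaid diagrams found in an HTML file to PNG images."
    )
    parser.add_argument("-i", "--input-html", required=True,
                        help="Path to the input HTML file.")
    parser.add_argument("-o", "--output-html", required=True,
                        help="Path to save the modified HTML file.")
    parser.add_argument("-d", "--image-dir", required=True,
                        help="Directory for generated PNGs and .mmd files.")
    parser.add_argument("--executable", required=True,
                        help="External tool that converts .mmd to .png.")
    # Assumption: values are collected as a plain list; dash-prefixed values
    # would need extra handling that is not shown here.
    parser.add_argument("--executable-args", nargs="*", default=None,
                        help="Extra arguments; supports {input_file}, "
                             "{output_dir} and {base_name} placeholders.")
    parser.add_argument("--log-file", required=True,
                        help="Path for the detailed log file.")
    parser.add_argument("--log-level", choices=["INFO", "DEBUG"], default="INFO",
                        help="Console logging level.")
    # The documented range is 0-10; 0 disables AI fixing.
    parser.add_argument("--ai-retries", type=int, choices=range(0, 11), default=0,
                        metavar="N", help="Maximum AI fix attempts (0-10).")
    parser.add_argument("--image-src-prefix", default="",
                        help="Prefix for the src attribute of generated images.")
    return parser
```

Calling build_parser().parse_args() with only the required options yields ai_retries == 0, log_level == "INFO", and an empty image_src_prefix, matching the documented defaults.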

AI Model Note: The script currently defaults to using the gemini-1.5-pro-latest model. The target model is gemini-2.5-pro-preview-05-06 , which can be configured in the script if available and preferred.

Detailed Workflow

1. Setup & Configuration

Parse command-line arguments.
Initialize logging:
- File logger (always DEBUG level).
- Console logger ( INFO or DEBUG as per --log-level ).
Verify GOOGLE_API_KEY if AI retries are enabled.
Create the image output directory ( -d ) if it doesn't exist.
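The dual logging setup described above could look roughly like the following sketch. The logger name, the function name setup_logging, and the format strings are assumptions made for illustration; only the levels (file always DEBUG, console INFO or DEBUG) come from this document.

```python
import logging
import sys


def setup_logging(log_file: str, console_level: str = "INFO") -> logging.Logger:
    """Create a logger that writes DEBUG+ to a file and console_level+ to stdout."""
    logger = logging.getLogger("mermaid_processor")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    logger.handlers.clear()         # avoid duplicate handlers on repeated calls
    logger.propagate = False

    # File handler: always captures everything from DEBUG upwards.
    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(funcName)s: %(message)s")
    )

    # Console handler: verbosity follows --log-level (INFO or DEBUG).
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(getattr(logging, console_level))
    console_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
```

With console_level="INFO", a logger.debug(...) call reaches only the log file, while logger.info(...) reaches both destinations.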

2. HTML Parsing & Diagram Extraction

Read the content of the input HTML file ( -i ).
Parse the HTML using BeautifulSoup.
Find all elements with the class mermaid .
For each found <div>:
1. Extract its text content (the Mermaid diagram code). Skip if empty.
2. Attempt to find a caption from an immediately following caption tag (class diagram-caption).
3. Generate a unique, filesystem-safe base name for the diagram (e.g., sanitized_caption_1 or diagram_1 ) using the caption or a default naming scheme.

3. Processing Each Diagram (Iterative Loop)

For each extracted Mermaid diagram and its generated base name:

Prepare Files:
- Define .mmd filename (e.g., {base_name}.mmd ) and .png filename (e.g., {base_name}.png ) in the image output directory.
- Set current_mermaid_code to the original extracted code.
Render Attempt Loop (includes initial attempt and AI retries):
1. Save to .mmd : Write the current_mermaid_code (which might be original or AI-modified) to its .mmd file.
2. Execute External Tool:
  - Delete any pre-existing .png file for this diagram to ensure a fresh generation check.
  - Construct the command using --executable and processed --executable-args (or default behavior if args not provided).
  - --ai-retries (Optional, Integer, Range: 0-10, Default: 0): Maximum number of times to ask the AI to fix a broken Mermaid diagram. 0 disables AI intervention.
  - --image-src-prefix (Optional, String, Default: ""): A prefix for the src attribute of generated <img> tags.
    - If empty, the script calculates a relative path from the output HTML file to the image.
    - Can be a relative path (e.g., "images/" ) or an absolute URL (e.g., "https://cdn.example.com/diagrams/" ).
  AI Model: The script currently uses gemini-1.5-pro-latest (defined by AI_MODEL_NAME internally). While the initial request specified gemini-2.5-pro-preview-05-06 , a more generally available model is used for broader compatibility. This can be adjusted in the script if the specific preview model is accessible and preferred.
  
  API Key: For AI functionality (if --ai-retries > 0 ), the GOOGLE_API_KEY environment variable must be set. If not set, AI retries will be disabled even if specified.
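The API-key rule above amounts to a small guard, sketched here. The function name effective_ai_retries and the injectable env mapping are hypothetical, added only so the behavior can be shown without touching the real environment.

```python
import os
from typing import Mapping, Optional


def effective_ai_retries(requested: int,
                         env: Optional[Mapping[str, str]] = None) -> int:
    """Return the retry count to use, disabling AI fixing when no key is set."""
    env = os.environ if env is None else env
    # AI retries require GOOGLE_API_KEY; without it they are switched off
    # even if --ai-retries was given a positive value.
    if requested > 0 and not env.get("GOOGLE_API_KEY"):
        return 0
    return requested
```

For example, effective_ai_retries(3, {}) returns 0, while effective_ai_retries(3, {"GOOGLE_API_KEY": "k"}) returns 3.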
  
  3.2. Logging Implementation
  - Dual logging:
    - Detailed logs (DEBUG level and above), including timestamps, function names, verbose messages, full AI prompts, and raw AI responses, written to the file specified by --log-file .
    - Summarized status updates, major errors, and AI suggestions (when new) printed to the console, with verbosity controlled by --log-level .
  3.3. HTML Parsing & Diagram Extraction
  1. Read the content of the input HTML file specified by --input-html .
  2. Parse the HTML content using the BeautifulSoup library.
  3. Find all elements with the class mermaid . These elements are assumed to directly contain the Mermaid diagram code as their text content.
  4. For each found Mermaid <div>:
    1. Extract its raw text content, stripping leading/trailing whitespace. This is the Mermaid diagram code.
    2. Attempt to find a descriptive caption: Look for a tag with the class diagram-caption that immediately follows the <div>. Use its text content as the diagram name idea.
    3. If no caption is found, or if the caption is empty, generate a default name (e.g., "diagram_1", "diagram_2", using a counter for uniqueness).
    4. Sanitize the diagram name idea to make it safe for use in filenames (remove special characters, replace spaces with underscores).
    5. Ensure the final base filename (before .mmd or .png extension) is unique within the current script run, appending counters if necessary (e.g., "my_diagram_1", "my_diagram_2").
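The naming steps above can be sketched as follows. The function name make_base_name, the exact set of allowed characters, and the choice to append _1, _2, ... only on a collision are assumptions; the document states the goals (sanitized, unique within a run) but not the precise rules.

```python
import re
from typing import Optional, Set


def make_base_name(caption: Optional[str], index: int, used: Set[str]) -> str:
    """Build a filesystem-safe base name that is unique within this run."""
    idea = (caption or "").strip()
    # Replace runs of whitespace with underscores, then drop special characters.
    name = re.sub(r"\s+", "_", idea)
    name = re.sub(r"[^A-Za-z0-9_-]", "", name).strip("_")
    if not name:
        # No usable caption: fall back to the default counter-based name.
        name = f"diagram_{index}"
    candidate = name
    counter = 1
    while candidate in used:
        # Append a counter until the name no longer collides.
        candidate = f"{name}_{counter}"
        counter += 1
    used.add(candidate)
    return candidate
```

The used set is shared across all diagrams in one run, so two diagrams with the same caption end up with distinct .mmd and .png filenames.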
  3.4. Processing Each Diagram (Loop)
  
  For each extracted Mermaid diagram and its generated base name:
  1. Save to .mmd file:
    - Save the current Mermaid code (either original or AI-modified) to a file named {base_name}.mmd in the directory specified by --image-dir .
  2. Generate PNG Image (Initial Attempt or Retry):
    1. Construct the full command to run the external executable:
      - Use the path from --executable .
      - Incorporate arguments from --executable-args, replacing placeholders ({input_file}, {output_dir}, {base_name}) with their actual values for the current diagram.
      - Run the command, capturing its exit code, standard output (stdout), and standard error (stderr).
      - Log the full command executed at DEBUG level.
    2. Check for Success:
      - Condition 1: Executable exit code is 0 (success).
      - Condition 2: The expected .png file exists in the image output directory.
    3. If PNG Generation Successful (both conditions met):
      - Log success.
      - If this was an AI-assisted success, log that too.
      - Mark diagram as successfully rendered and break from this render attempt loop.
    4. If PNG Generation Fails:
      - Log the failure, including executable's return code, and a summary of why (e.g., non-zero exit, PNG not found).
      - Combine the complete stdout and stderr from the executable into a single last_error_message_from_executable string. Log this detailed error.
      - Check if AI retries (max_ai_retries) are exhausted for this diagram OR if AI retries are disabled (max_ai_retries == 0). If yes, log final failure for this diagram, ensure the .mmd file is reverted to original_mermaid_code if AI had modified it, and break from this render attempt loop.
      - AI Fixing Attempt (if retries remain and are enabled):
        - Log the AI retry attempt number.
        - Construct a prompt for the Gemini AI, including:
          - Clear instructions to fix Mermaid syntax and return only code.
          - The history of previous AI attempts for this specific diagram (previous AI-suggested code and the error it produced).
          - The currently failing current_mermaid_code.
          - The complete last_error_message_from_executable.
        - Log the full prompt sent to the AI at DEBUG level in the log file.
        - Call the Gemini API.
        - Extract the corrected Mermaid code from the AI's response (stripping any extraneous text or markdown).
        - Log the raw AI response at DEBUG level.
        - If the AI provides a new, different, and valid-looking code suggestion:
          - Log the AI's suggested code to the console (INFO) and log file (DEBUG).
          - Update current_mermaid_code with the AI's suggestion.
          - Add the (code that failed, error for that code, AI suggestion) to ai_attempts_history.
          - Continue to the next iteration of the render attempt loop (back to step 3.b.i - Save to .mmd). This counts as one AI retry.
        - If AI fails to provide a new/different/useful suggestion:
          - Log this outcome.
          - Ensure the .mmd file is reverted to original_mermaid_code.
          - Break from this render attempt loop (this diagram failed).
        When constructing the command, if --executable-args is not provided, assume the command is the executable followed by the .mmd file path as its sole argument.
    5. Execute the command using subprocess.run() . The script must wait for the command to complete.
    6. Capture stdout , stderr , and the exit code from the executable.
    7. Check for Success:
      - The executable must exit with a code of 0.
      - The expected PNG file ( {image_dir}/{base_name}.png ) must exist after the command finishes.
  3. If PNG Generation is Successful:
    - Mark the diagram as successfully rendered.
    - Proceed to HTML Modification (Section 3.4.2).
  4. If PNG Generation Fails:
    - Log the failure, including the executable's exit code, full stdout , and full stderr .
    - If --ai-retries is greater than 0 and the GOOGLE_API_KEY is set, initiate the AI-Assisted Correction Loop. Otherwise, mark the diagram as failed and proceed to the next diagram, leaving the original in the HTML.
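The command construction, execution, and success check described in this section can be sketched as below. The helper names build_command and render_png are hypothetical, and using the image directory as the working directory follows the CLI description earlier in this document; treat this as an illustration under those assumptions rather than the script's actual code.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple


def build_command(executable: str, extra_args: Optional[List[str]],
                  mmd_path: Path, image_dir: Path, base_name: str) -> List[str]:
    """Substitute the documented placeholders into the executable arguments."""
    if not extra_args:
        # Default behavior: the .mmd path is the sole argument.
        return [executable, str(mmd_path)]
    values = {
        "{input_file}": str(mmd_path.resolve()),
        "{output_dir}": str(image_dir.resolve()),
        "{base_name}": base_name,
    }
    command = [executable]
    for arg in extra_args:
        # Plain string replacement leaves any other braces untouched.
        for placeholder, value in values.items():
            arg = arg.replace(placeholder, value)
        command.append(arg)
    return command


def render_png(command: List[str], image_dir: Path,
               base_name: str) -> Tuple[bool, str]:
    """Run the tool and report (success, combined stdout + stderr)."""
    png_path = image_dir / f"{base_name}.png"
    png_path.unlink(missing_ok=True)  # stale PNGs must not count as success
    try:
        result = subprocess.run(command, cwd=image_dir,
                                capture_output=True, text=True)
    except OSError as exc:  # e.g. the executable does not exist
        return False, str(exc)
    # Both conditions are required: exit code 0 and the PNG on disk.
    success = result.returncode == 0 and png_path.exists()
    return success, (result.stdout or "") + (result.stderr or "")
```

Deleting any pre-existing PNG before the run is what makes the file-exists check meaningful: a leftover image from an earlier run cannot mask a failed render.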
  3.4.1. AI-Assisted Correction Loop
  
  This loop is entered if the initial PNG generation fails and AI retries are enabled. It repeats up to --ai-retries times for the current diagram.
```
sequenceDiagram
    participant Script
    participant ExternalTool as External Diagram Tool
    participant GeminiAI as Gemini AI

    Note over Script: PNG generation failed for current Mermaid code.
    Script->>Script: Prepare error message (stdout + stderr from ExternalTool).
    Script->>Script: Construct prompt for AI (instructions, failing code, error message, AI attempt history for this diagram).
    Script->>GeminiAI: Call API with prompt.
    GeminiAI-->>Script: Return suggested Mermaid code / No suggestion / Error.
    
    alt AI provides new, different, valid code
        Script->>Script: Log AI suggestion (console & file).
        Script->>Script: Overwrite .mmd file with AI's code.
        Script->>Script: Add (original_failed_code, error, ai_suggestion) to AI history.
        Script->>ExternalTool: Re-run with new .mmd file.
        ExternalTool-->>Script: Return result (success/failure, output).
        Note over Script: Loop continues or exits based on this result or retry count.
    else AI fails or no useful/new suggestion
        Script->>Script: Log AI failure.
        Script->>Script: Revert .mmd to original code (if modified by AI).
        Note over Script: Mark diagram as "failed to render," stop retries for this diagram.
    end
```

HTML Modification:

If PNG was successfully generated:
- Create a new HTML <img> tag.
- Set its src attribute:
  - If --image-src-prefix is provided, use it (joined correctly with the PNG filename, handling URL or path cases).
  - Otherwise, calculate a relative path from the output HTML file's directory to the PNG file.
- Set its alt attribute using the diagram's sanitized caption/name.
- In the parsed HTML (BeautifulSoup object), replace the original <div class="mermaid"> with this new <img> tag.
- Log the replacement.
If PNG generation ultimately failed:
- Do not modify the HTML; the original <div class="mermaid"> block is retained.

Overall Process Flowchart

graph TD
    A[Start Script] --> B(Parse CLI Args);
    B --> C{Valid Args?};
    C -- Yes --> D[Initialize Logging & Env];
    C -- No --> Z[Exit with Error];
    D --> E[Read Input HTML];
    E --> F[Parse HTML with BeautifulSoup];
    F --> G[Find All 'div.mermaid' elements];
    G --> H{Diagrams Found?};
    H -- No --> I[Save Unchanged HTML];
    H -- Yes --> J[Loop Each Diagram];
    I --> Y[Log Summary & End];
    subgraph Diagram Loop [J]
        K[Extract Code & Name];
        K --> L[Set current_mermaid_code = original];
        L --> M["Render Attempt Loop (Max Retries + 1)"];
        subgraph Render Attempt Loop [M]
            N[Save current_mermaid_code to .mmd];
            N --> O[Run External Executable on .mmd];
            O --> P{PNG Generated & Exit Code 0?};
            P -- Yes --> Q[Mark Success, Break Render Loop];
            P -- No --> R[Log Failure, Capture Full Error];
            R --> S{AI Retries Left & Enabled?};
            S -- No --> T[Mark Final Failure, Revert .mmd if AI used, Break Render Loop];
            S -- Yes --> U[AI Fix Sub-Process];
            U --> V{New AI Suggestion?};
            V -- Yes --> W[Update current_mermaid_code = AI suggestion, Log Suggestion, Add to History];
            %% Retry render with new code
            W --> N;
            V -- No --> X[Log AI Failure, Revert .mmd if AI used, Break Render Loop];
        end
        M --> AA
        %% [...]
        AB -- No --> AE[Keep Original 'div.mermaid'];
    end
    J -- All Diagrams Processed --> XF["Save Final (Modified) HTML"];
    XF --> Y;
    Z --> Y;

Capture the complete error output (stdout + stderr) from the external tool.

Prepare for AI:

Take the current failing Mermaid code (which could be the original or a previously AI-suggested version that also failed).
Take the complete error output (stdout + stderr) from the *most recent* failed execution of the external tool.

Send to AI (Gemini):

Construct a detailed prompt for the Gemini model. The prompt must include:
- Clear instructions: "You are an expert in Mermaid.js syntax. Your task is to fix..." and "Respond ONLY with the corrected Mermaid code block..."
- Context about the error source: "Analyze the complete error message (this may include stdout and stderr from the rendering tool)..."
- The history of previous AI attempts for *this specific diagram*, showing:
  - The code previously suggested by the AI.
  - The error message that resulted when that AI-suggested code was rendered.
- The current failing Mermaid code block.
- The complete error message (stdout + stderr) from the executable for the current failing code.
Call the Gemini API with this prompt. Log the full prompt sent to the AI (at DEBUG level in the log file).

Get AI Suggestion:

Receive the response from the Gemini API. Log the raw AI response (at DEBUG level).
Attempt to extract only the corrected Mermaid code block from the AI's response, stripping any surrounding text, explanations, or markdown formatting (e.g., ```mermaid ... ``` or ``` ... ```).

Log AI's Suggestion:

If a valid code block is extracted:
- Log the full suggested code to the log file (DEBUG level).
- If the suggestion is new and different from the code that was just sent for fixing, log the suggestion concisely to the console (INFO level).

If AI provides a new, different, and valid code suggestion:

Add the tuple `(code_that_just_failed, error_for_that_code, ai_suggestion)` to the AI attempt history for this diagram.

Overwrite the .mmd file with this new AI-suggested code.

Increment the AI retry counter for this diagram.

Go back to step 2 of Section 3.4 (Generate PNG Image) to re-run the external executable with the new code. This counts as one AI retry attempt.

If AI fails to provide a useful/different suggestion, or if max AI retries are reached for this diagram:

Log this outcome (e.g., "AI did not provide a new suggestion," "Max AI retries reached").
Mark this diagram as "failed to render."
If the .mmd file was modified by any AI suggestion during the attempts for this diagram, revert its content back to the original Mermaid code extracted from the HTML.
Stop processing this particular diagram; move to the next one.
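The correction loop described in this section can be sketched with the renderer and the AI call passed in as plain callables, so no particular Gemini client API is assumed. The names extract_mermaid_code, render_with_ai_fixes, render, and ask_ai are hypothetical; render is expected to return (success, error_output) and ask_ai to return the raw model response text.

```python
import re
from typing import Callable, List, Tuple

# (code that failed, error output for that code, AI suggestion)
Attempt = Tuple[str, str, str]


def extract_mermaid_code(response: str) -> str:
    """Return the first fenced block's contents, or the whole reply if unfenced."""
    match = re.search(r"```(?:mermaid)?[ \t]*\n(.*?)```", response, re.DOTALL)
    return (match.group(1) if match else response).strip()


def render_with_ai_fixes(
    original_code: str,
    render: Callable[[str], Tuple[bool, str]],
    ask_ai: Callable[[str, str, List[Attempt]], str],
    max_ai_retries: int,
) -> Tuple[bool, str, List[Attempt]]:
    """Return (success, code to keep in the .mmd file, AI attempt history)."""
    current_code = original_code
    history: List[Attempt] = []
    ai_attempts = 0
    while True:
        ok, error = render(current_code)
        if ok:
            return True, current_code, history
        if ai_attempts >= max_ai_retries:
            # Retries disabled or exhausted: revert to the original code.
            return False, original_code, history
        raw = ask_ai(current_code, error, history)
        suggestion = extract_mermaid_code(raw) if raw else ""
        if not suggestion or suggestion == current_code.strip():
            # No suggestion, or the same code again: give up on this diagram.
            return False, original_code, history
        history.append((current_code, error, suggestion))
        current_code = suggestion
        ai_attempts += 1  # each accepted suggestion counts as one AI retry
```

With max_ai_retries set to N, the renderer runs at most N + 1 times per diagram: the initial attempt plus one re-render per accepted AI suggestion.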

3.4.2. HTML Modification (Post-Rendering)

If PNG was successfully generated (either on the first try or after AI fixes):
- Create a new HTML <img> tag.
- Set its src attribute:
  - If --image-src-prefix is provided, use it combined with the PNG filename (e.g., {image_src_prefix}/{base_name}.png ). Handle URL joining correctly if the prefix is a URL.
  - If --image-src-prefix is empty, calculate the relative path from the location of the output HTML file to the generated PNG file.
- Set its alt attribute using the diagram's sanitized caption/name.
- In the parsed HTML (the BeautifulSoup object), find the original <div class="mermaid"> for this diagram and replace it entirely with the new <img> tag.
If PNG generation ultimately failed for this diagram (after all attempts, including AI if applicable):
- Do NOT modify this part of the HTML. Leave the original block intact.
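The src rule above (use the prefix when given, otherwise a path relative to the output HTML file) can be sketched as follows. The function name image_src is hypothetical, and joining the prefix with exactly one slash is an assumption about what "handle URL joining correctly" means here.

```python
import os
from pathlib import Path


def image_src(png_path: Path, output_html: Path, prefix: str = "") -> str:
    """Compute the src attribute value for a generated PNG."""
    if prefix:
        # Works for relative prefixes ("images/") and URLs alike:
        # exactly one slash separates the prefix from the filename.
        return prefix.rstrip("/") + "/" + png_path.name
    # No prefix: path from the output HTML file's directory to the PNG.
    relative = os.path.relpath(png_path, start=output_html.parent)
    return Path(relative).as_posix()  # forward slashes for use in HTML
```

For instance, a PNG at build/img/d1.png referenced from build/html/page.html gets the src "../img/d1.png" when no prefix is set.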

3.5. Save Final HTML

After attempting to process all found Mermaid diagrams:

Take the (potentially modified) BeautifulSoup HTML object.
Write its content (prettified for readability) to the file path specified by --output-html .

3.6. Summary Logging

At the end of the script execution, log a summary to the console and log file, including:

Total Mermaid <div>s found.
Number of diagrams attempted for processing (non-empty).
