Mermaid Diagram Processor - Requirements Document (Current State)

Project Goal

To create a Python script that processes an HTML file, identifies all Mermaid diagram code blocks, and attempts to convert each into a PNG image using a specified external command-line tool. If the initial conversion fails due to errors in the Mermaid code, the script uses a generative AI model (currently Google Gemini) to attempt to fix the code, then re-renders the AI-corrected code up to a user-specified number of retries. Successfully generated PNGs replace the original Mermaid blocks in the HTML.

Mermaid Diagram Processor - Requirements Specification

Version: 1.0 (Reflecting current script capabilities as of last update)

Date: July 15, 2024

1. Project Goal

To create a Python script that automates the processing of HTML files containing Mermaid diagram code blocks.

2. Overall Process Flow


graph TD
    A[Start] --> B{Parse Command-Line Arguments};
    B --> C{Read Input HTML File};
    C --> D{"Parse HTML (BeautifulSoup)"};
    D --> E{Find All 'div.mermaid' Blocks};
    E -- No Diagrams Found --> F[Save Original HTML to Output];
    E -- Diagrams Found --> G{Loop Through Each Diagram};
    F --> Z[End];
    
    G --> H[Extract Mermaid Code & Determine Name];
    H --> I[Attempt 1: Render PNG with Executable];
    I -- Success --> J["Replace 'div.mermaid' with 'img' Tag"];
    I -- "Failure (Exec Error or PNG Not Found)" --> K{"AI Retries Enabled? (ai_retries > 0 AND API Key Set?)"};
    
    K -- No --> L[Log Failure, Keep Original 'div.mermaid'];
    L --> M{More Diagrams?};
    J --> M;

    K -- Yes --> N["Start AI Retry Loop (Max: ai_retries)"];
    N --> O{"Current AI Attempt < Max Retries?"};
    O -- Yes --> P["Prepare Prompt for AI (Code, Error, History)"];
    P --> Q[Call Gemini AI API];
    Q --> R{"AI Suggests New, Different Code?"};
    R -- Yes --> S["Update .mmd with AI Code, Increment AI Attempt"];
    %% Re-attempt rendering with new code
    S --> I;
    R -- "No (No Suggestion / Same Code / AI Error)" --> T["Log AI Failure, Revert .mmd to Original"];
    %% Mark as failed for this diagram
    T --> L;
    O -- "No (Max Retries Reached)" --> T;

    M -- Yes --> G;
    M -- No --> U[Save Modified HTML to Output File];
    U --> V[Log Processing Summary];
    V --> Z;
            

3. Detailed Step-by-Step Requirements

3.1. Setup & Configuration

The script must accept the following command-line arguments:

Core Functionality Overview

  1. Parse an input HTML file to find <div class="mermaid"> elements.
  2. For each diagram:
    • Extract the Mermaid code and a descriptive name.
    • Save the code to a temporary .mmd file.
    • Attempt to generate a PNG using a user-specified external executable.
    • If successful, replace the <div> with an <img> tag in the HTML.
    • If unsuccessful:
      • Capture the full error output (stdout and stderr) from the executable.
      • If AI retries are enabled, send the failing code, full error output, and a history of previous AI attempts for this diagram to the Gemini AI for correction.
      • If the AI provides a new, valid code suggestion, save it to the .mmd file and re-attempt PNG generation.
      • Repeat the AI-fix and render attempt cycle up to the maximum configured retries.
      • If AI fixes fail or retries are exhausted, the original <div> block is retained in the HTML, and the .mmd file is reverted to the original code if AI modifications were made.
  3. Save the modified HTML content to an output file.
  4. Provide detailed logging to a file and summary status/errors to the console.

Command-Line Interface (CLI)

The script accepts the following command-line arguments:

Environment Variable: GOOGLE_API_KEY must be set if --ai-retries is greater than 0 for AI functionality.

AI Model Note: The script currently defaults to the gemini-1.5-pro-latest model. The target model is gemini-2.5-pro-preview-05-06, which can be configured in the script if available and preferred.

Detailed Workflow

1. Setup & Configuration

  1. Parse command-line arguments.
  2. Initialize logging:
    • File logger (always DEBUG level).
    • Console logger (INFO or DEBUG, as set by --log-level).
  3. Verify GOOGLE_API_KEY if AI retries are enabled.
  4. Create the image output directory (-d) if it doesn't exist.
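
The argument parsing step above could be sketched with argparse roughly as follows. The option names are the ones this document mentions; pairing the short forms (-i, -o, -d) with the long forms, which options are required, the default log file name, and the helper names build_parser and ai_retries_type are all assumptions made for illustration, not confirmed details of the script.

```python
import argparse


def ai_retries_type(value: str) -> int:
    """Parse --ai-retries and enforce the documented 0-10 range."""
    retries = int(value)
    if not 0 <= retries <= 10:
        raise argparse.ArgumentTypeError("--ai-retries must be between 0 and 10")
    return retries


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the options named in this document (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Render Mermaid blocks in an HTML file to PNG images."
    )
    # Assumption: these four options are required.
    parser.add_argument("-i", "--input-html", required=True)
    parser.add_argument("-o", "--output-html", required=True)
    parser.add_argument("-d", "--image-dir", required=True)
    parser.add_argument("--executable", required=True)
    parser.add_argument("--executable-args", default=None)
    # Assumption: the default log file name is illustrative only.
    parser.add_argument("--log-file", default="mermaid_processor.log")
    parser.add_argument("--log-level", choices=["INFO", "DEBUG"], default="INFO")
    parser.add_argument("--ai-retries", type=ai_retries_type, default=0)
    parser.add_argument("--image-src-prefix", default="")
    return parser
```

Validating the range in a type function lets argparse report an out-of-range value as a normal usage error instead of failing later in the run.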

2. HTML Parsing & Diagram Extraction

  1. Read the content of the input HTML file (-i).
  2. Parse the HTML using BeautifulSoup.
  3. Find all <div> elements with the class mermaid.
  4. For each found <div>:
    1. Extract its text content (the Mermaid diagram code). Skip if empty.
    2. Attempt to find a caption from the tag that immediately follows it.
    3. Generate a unique, filesystem-safe base name for the diagram (e.g., sanitized_caption_1 or diagram_1) using the caption or a default naming scheme.

3. Processing Each Diagram (Iterative Loop)

For each extracted Mermaid diagram and its generated base name:

  1. Prepare Files:
    • Define .mmd filename (e.g., {base_name}.mmd) and .png filename (e.g., {base_name}.png) in the image output directory.
    • Set current_mermaid_code to the original extracted code.
  2. Render Attempt Loop (includes initial attempt and AI retries):
    1. Save to .mmd: Write the current_mermaid_code (which might be original or AI-modified) to its .mmd file.
    2. Execute External Tool:
      • Delete any pre-existing .png file for this diagram to ensure a fresh generation check.
      • Construct the command using --executable and the processed --executable-args (or default behavior if args are not provided).
      • --ai-retries (Optional, Integer, Range: 0-10, Default: 0): Maximum number of times to ask the AI to fix a broken Mermaid diagram. 0 disables AI intervention.
      • --image-src-prefix (Optional, String, Default: ""): A prefix for the src attribute of generated <img> tags.
        • If empty, the script calculates a relative path from the output HTML file to the image.
        • Can be a relative path (e.g., "images/") or an absolute URL (e.g., "https://cdn.example.com/diagrams/").

      AI Model: The script currently uses gemini-1.5-pro-latest (defined by AI_MODEL_NAME internally). While the initial request specified gemini-2.5-pro-preview-05-06, a more generally available model is used for broader compatibility. This can be adjusted in the script if the specific preview model is accessible and preferred.

      API Key: For AI functionality (if --ai-retries > 0), the GOOGLE_API_KEY environment variable must be set. If it is not set, AI retries are disabled even if specified.
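
A minimal sketch of the API key rule above, assuming a hypothetical helper named effective_ai_retries; the optional env parameter exists only so the rule can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def effective_ai_retries(requested: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Return the AI retry count to use, or 0 when GOOGLE_API_KEY is missing."""
    env = os.environ if env is None else env
    if requested > 0 and not env.get("GOOGLE_API_KEY"):
        # Per the requirement: AI retries are disabled when the key is not set.
        return 0
    return requested
```

For example, effective_ai_retries(3, {}) yields 0, while the same call with a key present in the mapping yields 3.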

      3.2. Logging Implementation

      • Dual logging:
        • Detailed logs (DEBUG level and above), including timestamps, function names, verbose messages, full AI prompts, and raw AI responses, written to the file specified by --log-file.
        • Summarized status updates, major errors, and AI suggestions (when new) printed to the console, with verbosity controlled by --log-level.
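
The dual logging requirement could be met with two handlers on one logger, sketched below. The logger name, the format strings, and the function name setup_logging are assumptions for illustration; only the handler levels follow the text above.

```python
import logging


def setup_logging(log_file: str, console_level: str = "INFO") -> logging.Logger:
    """Attach a DEBUG-level file handler and a configurable console handler."""
    logger = logging.getLogger("mermaid_processor")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)  # pass everything on; handlers filter
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)  # the file always receives DEBUG and above
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(funcName)s: %(message)s")
    )

    console_handler = logging.StreamHandler()
    console_handler.setLevel(getattr(logging, console_level.upper()))
    console_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
```

With this layout, a full AI prompt logged at DEBUG reaches the file but stays off the console unless --log-level is DEBUG.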

      3.3. HTML Parsing & Diagram Extraction

      1. Read the content of the input HTML file specified by --input-html .
      2. Parse the HTML content using the BeautifulSoup library.
      3. Find all <div> elements with the class mermaid. These elements are assumed to directly contain the Mermaid diagram code as their text content.
      4. For each found Mermaid <div>:
        1. Extract its raw text content, stripping leading/trailing whitespace. This is the Mermaid diagram code.
        2. Attempt to find a descriptive caption: look for a tag with the class diagram-caption that immediately follows the <div>. Use its text content as the candidate diagram name.
        3. If no caption is found, or if the caption is empty, generate a default name (e.g., "diagram_1", "diagram_2", using a counter for uniqueness).
        4. Sanitize the diagram name idea to make it safe for use in filenames (remove special characters, replace spaces with underscores).
        5. Ensure the final base filename (before .mmd or .png extension) is unique within the current script run, appending counters if necessary (e.g., "my_diagram_1", "my_diagram_2").
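
The naming rules in the steps above can be sketched as two small helpers. The function names, the exact set of characters kept, and the choice to append a counter only when a name collides are assumptions; the script may number names differently.

```python
import re
from typing import Optional, Set


def sanitize_name(name: str) -> str:
    """Replace whitespace with underscores and drop other special characters."""
    name = re.sub(r"\s+", "_", name.strip())
    return re.sub(r"[^A-Za-z0-9_-]", "", name)


def unique_base_name(caption: Optional[str], index: int, used: Set[str]) -> str:
    """Return a filesystem-safe base name that is unique within this run."""
    # Fall back to a default name when the caption is missing or sanitizes to nothing.
    base = sanitize_name(caption or "") or f"diagram_{index}"
    candidate, counter = base, 1
    while candidate in used:
        candidate = f"{base}_{counter}"
        counter += 1
    used.add(candidate)
    return candidate
```

The caller keeps one `used` set for the whole run, so two diagrams with the same caption still map to different .mmd and .png files.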

      3.4. Processing Each Diagram (Loop)

      For each extracted Mermaid diagram and its generated base name:

      1. Save to .mmd file:
        • Save the current Mermaid code (either original or AI-modified) to a file named {base_name}.mmd in the directory specified by --image-dir.
      2. Generate PNG Image (Initial Attempt or Retry):
        1. Construct the full command to run the external executable:
          • Use the path from --executable.
          • Incorporate arguments from --executable-args, replacing placeholders ({input_file}, {output_dir}, {base_name}) with their actual values for the current diagram.
          • Run the command, capturing its exit code, standard output (stdout), and standard error (stderr).
          • Log the full command executed at DEBUG level.
        2. Check for Success:
          • Condition 1: Executable exit code is 0 (success).
          • Condition 2: The expected .png file exists in the image output directory.
        3. If PNG Generation Successful (both conditions met):
          • Log success.
          • If this was an AI-assisted success, log that too.
          • Mark diagram as successfully rendered and break from this render attempt loop.
        4. If PNG Generation Fails:
          • Log the failure, including executable's return code, and a summary of why (e.g., non-zero exit, PNG not found).
          • Combine the complete stdout and stderr from the executable into a single last_error_message_from_executable string. Log this detailed error.
          • Check if AI retries (max_ai_retries) are exhausted for this diagram OR if AI retries are disabled (max_ai_retries == 0).
            • If yes, log final failure for this diagram, ensure the .mmd file is reverted to original_mermaid_code if AI had modified it, and break from this render attempt loop.
          • AI Fixing Attempt (if retries remain and are enabled):
            1. Log the AI retry attempt number.
            2. Construct a prompt for the Gemini AI, including:
              • Clear instructions to fix Mermaid syntax and return only code.
              • The history of previous AI attempts for this specific diagram (previous AI-suggested code and the error it produced).
              • The currently failing current_mermaid_code.
              • The complete last_error_message_from_executable.
            3. Log the full prompt sent to the AI at DEBUG level in the log file.
            4. Call the Gemini API.
            5. Extract the corrected Mermaid code from the AI's response (stripping any extraneous text or markdown).
            6. Log the raw AI response at DEBUG level.
            7. If the AI provides a new, different, and valid-looking code suggestion:
              • Log the AI's suggested code to the console (INFO) and log file (DEBUG).
              • Update current_mermaid_code with the AI's suggestion.
              • Add the (code that failed, error for that code, AI suggestion) to ai_attempts_history.
              • Continue to the next iteration of the render attempt loop (back to step 3.b.i - Save to .mmd). This counts as one AI retry.
            8. If AI fails to provide a new/different/useful suggestion:
              • Log this outcome.
              • Ensure the .mmd file is reverted to original_mermaid_code .
              • Break from this render attempt loop (this diagram failed).
            9. If --executable-args is not provided, assume the command is .
        5. Execute the command using subprocess.run(). The script must wait for the command to complete.
        6. Capture stdout, stderr, and the exit code from the executable.
        7. Check for Success:
          • The executable must exit with a code of 0.
          • The expected PNG file ({image_dir}/{base_name}.png) must exist after the command finishes.
      3. If PNG Generation is Successful:
        • Mark the diagram as successfully rendered.
        • Proceed to HTML Modification (Section 3.4.2).
      4. If PNG Generation Fails:
        • Log the failure, including the executable's exit code, full stdout, and full stderr.
        • If --ai-retries is greater than 0 and the GOOGLE_API_KEY is set, initiate the AI-Assisted Correction Loop (Section 3.4.1). Otherwise, mark the diagram as failed and proceed to the next diagram, leaving the original <div> in the HTML.
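
The command construction and success check described in this section might look like the sketch below. The helper names build_command and render_png are hypothetical, and splitting the argument template with shlex before substituting placeholders is an assumption (it keeps paths that contain spaces in a single argument); the script may process --executable-args differently.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Tuple


def build_command(executable: str, args_template: str, mmd_path: Path,
                  image_dir: Path, base_name: str) -> List[str]:
    """Split the argument template, then substitute the documented placeholders."""
    values = {
        "{input_file}": str(mmd_path),
        "{output_dir}": str(image_dir),
        "{base_name}": base_name,
    }
    command = [executable]
    for token in shlex.split(args_template):
        for placeholder, value in values.items():
            token = token.replace(placeholder, value)
        command.append(token)
    return command


def render_png(command: List[str], png_path: Path) -> Tuple[bool, str]:
    """Run the tool; succeed only if it exits with 0 and the PNG exists."""
    png_path.unlink(missing_ok=True)  # remove stale output before the check
    result = subprocess.run(command, capture_output=True, text=True)
    # Combined stdout and stderr, kept for logging and for the AI prompt.
    error_output = f"{result.stdout}\n{result.stderr}".strip()
    success = result.returncode == 0 and png_path.exists()
    return success, error_output
```

Deleting any earlier PNG first matters: without it, a leftover file from a previous run would satisfy the existence check even when the tool produced nothing.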

      3.4.1. AI-Assisted Correction Loop

      This loop is entered if the initial PNG generation fails and AI retries are enabled. It repeats up to --ai-retries times for the current diagram.
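
The control flow of this loop can be sketched independently of any particular renderer or AI client by passing both in as callables. Everything here is illustrative: render_with_ai_retries, render, and ask_ai are hypothetical names, and the Gemini call itself is deliberately left out.

```python
from typing import Callable, List, Optional, Tuple

# (code that failed, error it produced, AI suggestion that followed)
Attempt = Tuple[str, str, str]


def render_with_ai_retries(
    original_code: str,
    render: Callable[[str], Tuple[bool, str]],
    ask_ai: Callable[[str, str, List[Attempt]], Optional[str]],
    max_ai_retries: int,
) -> Tuple[bool, str]:
    """Return (success, code); on failure the original code is returned."""
    current_code = original_code
    history: List[Attempt] = []
    ai_attempts = 0
    while True:
        success, error = render(current_code)
        if success:
            return True, current_code
        if ai_attempts >= max_ai_retries:
            return False, original_code  # retries disabled or exhausted
        suggestion = ask_ai(current_code, error, history)
        if not suggestion or suggestion.strip() == current_code.strip():
            return False, original_code  # nothing new from the AI: give up
        history.append((current_code, error, suggestion))
        current_code = suggestion
        ai_attempts += 1
```

Returning the original code on failure mirrors the requirement that the .mmd file be reverted when AI fixes do not succeed.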

      
      sequenceDiagram
          participant Script
          participant ExternalTool as External Diagram Tool
          participant GeminiAI as Gemini AI
      
          Note over Script: PNG generation failed for current Mermaid code.
          Script->>Script: Prepare error message (stdout + stderr from ExternalTool).
          Script->>Script: Construct prompt for AI (instructions, failing code, error message, AI attempt history for this diagram).
          Script->>GeminiAI: Call API with prompt.
          GeminiAI-->>Script: Return suggested Mermaid code / No suggestion / Error.
          
          alt AI provides new, different, valid code
              Script->>Script: Log AI suggestion (console & file).
              Script->>Script: Overwrite .mmd file with AI's code.
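              Script->>Script: Add (failed code, error, AI suggestion) to the AI attempt history.
              Script->>ExternalTool: Re-run PNG generation with the AI's code.
          else No suggestion / same code / AI error
              Script->>Script: Log outcome and revert .mmd file to the original code.
          end
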
  • HTML Modification:
    1. On failure, log the current failure and the detailed error message (combined stdout and stderr), and do NOT modify this part of the HTML; leave the original <div> block.
    2. Log that the original block is being kept.

    4. Save Final HTML

    1. After attempting to process all found Mermaid diagrams, take the (potentially modified) BeautifulSoup HTML object.
    2. Write its content (prettified) to the specified output HTML file (-o).

    5. Logging and Output Summary

    1. Throughout the process, detailed logs (timestamps, function names, actions, errors, full AI prompts, raw AI responses, etc.) are written to the log file ( --log-file ) at DEBUG level.
    2. Concise status updates, summaries of successes/failures, and major errors are printed to the console, respecting the --log-level argument.
    3. A final summary of processing (total diagrams found, attempted, successfully rendered, failed) is logged.

    Overall Process Flowchart

    [Flowchart lost in extraction. Surviving node labels: Start Script; Parse CLI Args; Valid Args?; Create <img> tag; Replace <div class="mermaid"> with <img>; Keep Original <div>; Save Final (Modified) HTML.]

  • Log Failure: Log the current failure and the detailed error message (combined stdout and stderr) from the external tool.
  • Prepare for AI:
  • Send to AI (Gemini):
  • Get AI Suggestion:
  • Log AI's Suggestion:
  • If AI provides a new, different, and valid code suggestion:
    1. Add the tuple `(code_that_just_failed, error_for_that_code, ai_suggestion)` to the AI attempt history for this diagram.
    2. Overwrite the .mmd file with this new AI-suggested code.
    3. Increment the AI retry counter for this diagram.
    4. Go back to step 3.4.b (Generate PNG Image) to re-run the external executable with the new code. This counts as one AI retry attempt.
  • If AI fails to provide a useful/different suggestion, or if max AI retries are reached for this diagram:
    1. Log this outcome (e.g., "AI did not provide a new suggestion," "Max AI retries reached").
    2. Mark this diagram as "failed to render."
    3. If the .mmd file was modified by any AI suggestion during the attempts for this diagram, revert its content back to the original Mermaid code extracted from the HTML.
    4. Stop processing this particular diagram; move to the next one.
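
Deciding whether the AI returned a "new, different" suggestion first requires pulling the code out of the response. The sketch below assumes the response may wrap the code in a Markdown-style fence and treats a suggestion that matches the current code or any earlier suggestion as not new; both helper names and the comparison against earlier suggestions are assumptions.

```python
import re
from typing import Iterable, Optional

FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable
_FENCED_BLOCK = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_mermaid_code(response_text: str) -> str:
    """Return the first fenced block if present, else the stripped text."""
    match = _FENCED_BLOCK.search(response_text)
    return (match.group(1) if match else response_text).strip()


def pick_new_suggestion(response_text: Optional[str], current_code: str,
                        previous_suggestions: Iterable[str]) -> Optional[str]:
    """Return usable new code, or None when the AI gave nothing new."""
    if not response_text:
        return None
    candidate = extract_mermaid_code(response_text)
    seen = {current_code.strip(), *(s.strip() for s in previous_suggestions)}
    if not candidate or candidate in seen:
        return None
    return candidate
```

A None result corresponds to the "AI did not provide a new suggestion" outcome, after which the diagram is marked as failed and the .mmd file is reverted.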
    3.4.2. HTML Modification (Post-Rendering)

    1. If PNG was successfully generated (either on the first try or after AI fixes):
      • Create a new HTML <img> tag.
      • Set its src attribute:
        • If --image-src-prefix is provided, combine it with the PNG filename (e.g., {image_src_prefix}/{base_name}.png). Handle URL joining correctly if the prefix is a URL.
        • If --image-src-prefix is empty, calculate the relative path from the location of the output HTML file to the generated PNG file.
      • Set its alt attribute using the diagram's sanitized caption/name.
      • In the parsed HTML (the BeautifulSoup object), find the original <div> for this diagram and replace it entirely with the new <img> tag.
    2. If PNG generation ultimately failed for this diagram (after all attempts, including AI if applicable):
      • Do NOT modify this part of the HTML. Leave the original <div> block intact.
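
The src rules above can be sketched with the standard library alone. The function name image_src is hypothetical, and joining the prefix and filename with exactly one slash is an assumed reading of "handle URL joining correctly".

```python
import os
from pathlib import Path


def image_src(png_path: Path, output_html: Path, image_src_prefix: str = "") -> str:
    """Compute the src attribute for a generated PNG."""
    if image_src_prefix:
        # Join with exactly one slash; works for "images/" and for URL prefixes.
        return image_src_prefix.rstrip("/") + "/" + png_path.name
    # No prefix: path relative to the directory that contains the output HTML.
    relative = os.path.relpath(png_path, start=output_html.parent)
    return Path(relative).as_posix()  # forward slashes for use in HTML
```

For instance, a PNG at out/images/d1.png referenced from out/page.html gets the src images/d1.png, and a prefix of https://cdn.example.com/diagrams/ yields https://cdn.example.com/diagrams/d1.png.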

    3.5. Save Final HTML

    After attempting to process all found Mermaid diagrams:

    1. Take the (potentially modified) BeautifulSoup HTML object.
    2. Write its content (prettified for readability) to the file path specified by --output-html.

    3.6. Summary Logging

    At the end of the script execution, log a summary to the console and log file, including the total number of diagrams found, attempted, successfully rendered, and failed.
