DFA Analysis Glossary

Reference guide for terms used in DFA analysis reports

This glossary provides detailed explanations of the technical terms and metrics used throughout the DFA (Document Format Architecture) analysis reports. Understanding these terms will help you interpret analysis results and make informed decisions about code maintenance and optimization.

The DFA analysis system uses a multi-phase approach: structural analysis examines code organization and patterns, semantic analysis evaluates business logic and data flow, and documentation generation produces human-readable explanations with pseudo-code representations.

1. Complexity Metrics

Complexity Level

Possible values: SIMPLE, MEDIUM, COMPLEX

An overall assessment of code complexity based on multiple structural factors. This metric provides a quick indication of how difficult a module may be to understand, maintain, or modify.

SIMPLE
Straightforward code with minimal branching, low nesting depth (0-1), and few dependencies. Easy to understand and maintain.
MEDIUM
Moderate complexity with some conditional logic, nesting depth of 2-3, and several dependencies. Requires careful attention during modifications.
COMPLEX
High complexity with extensive branching, deep nesting (4+), multiple loops, and numerous dependencies. May benefit from refactoring.
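As a rough illustration, the classification above can be expressed as a threshold function. The nesting-depth cut-offs follow the definitions in this section; the other thresholds and the function name are assumptions made for the sketch (Python):

```python
def complexity_level(nesting_depth: int, conditions: int,
                     loops: int, dependencies: int) -> str:
    """Classify overall complexity from structural metrics.

    Nesting-depth thresholds follow the glossary (0-1 SIMPLE, 2-3 MEDIUM,
    4+ COMPLEX); the remaining cut-offs are illustrative assumptions.
    """
    if nesting_depth >= 4 or loops >= 3 or dependencies >= 10:
        return "COMPLEX"
    if nesting_depth >= 2 or conditions >= 5 or dependencies >= 4:
        return "MEDIUM"
    return "SIMPLE"
```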

Nesting Depth

The maximum level of nested control structures (IF/THEN/ELSE, loops, SELECT/CASE) found in the code. Higher nesting depths indicate more complex decision trees and can make code harder to follow and test.

Example
IF condition1 THEN // Depth 1
  IF condition2 THEN // Depth 2
    IF condition3 THEN // Depth 3
      ... // Max nesting = 3
    ENDIF
  ENDIF
ENDIF
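A maximum nesting depth like the one above can be computed with a simple counter over block keywords. The keyword sets below are assumptions about which DFA statements open and close blocks; a real scanner would follow the full grammar (Python sketch):

```python
OPENERS = {"IF", "DO", "FOR", "SELECT"}               # assumed block openers
CLOSERS = {"ENDIF", "ENDDO", "ENDFOR", "ENDSELECT"}   # assumed block closers

def max_nesting_depth(lines: list) -> int:
    """Return the deepest level of nested control structures in the source."""
    depth = max_depth = 0
    for line in lines:
        stripped = line.strip()
        first = stripped.split(" ", 1)[0].upper() if stripped else ""
        if first in OPENERS:
            depth += 1
            max_depth = max(max_depth, depth)
        elif first in CLOSERS:
            depth = max(depth - 1, 0)
    return max_depth
```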

Total Conditions

The count of all conditional statements (IF/THEN/ELSE blocks) in a module. A high number of conditions increases the number of possible execution paths, making testing and validation more challenging.

Complexity Drivers

Specific factors that contribute to the overall complexity assessment. These are identified during structural analysis and help pinpoint areas that may need attention.

High condition count
Many IF/THEN/ELSE statements increase decision complexity
Deep nesting
Multiple levels of nested control structures
Multiple loops
Several iteration structures that may interact
Many dependencies
Numerous references to external modules or formats
Complex variable patterns
Dynamic variable assignment or complex data structures

2. Patterns

Control Flow Patterns

Recurring structures that govern how code execution flows through a module. The analysis identifies these patterns to understand the logical structure and decision-making processes within the code.

Conditional Logic (IF/THEN/ELSE)
Decision branching based on conditions. The analysis counts occurrences and identifies the main condition patterns used.
Selection Structures (SELECT/CASE)
Multi-way branching based on variable values. Common for handling different document types or states.
Loops (FOR/WHILE)
Iteration structures for processing collections, repeating sections, or batch operations.

Variable Organization Patterns

How variables are categorized and organized within the code. This classification helps understand the purpose and scope of different data elements.

Multilingual Variables
Variables that hold language-specific content, enabling internationalization (i18n) of documents. Examples: LABEL_IT, LABEL_EN, TEXT_[LANG]
Configuration Variables
Variables that control behavior, formatting, or system settings; often prefixed with ~ or &. Examples: ~FONT_SIZE, &RUN_ENV, ~MARGIN
Business Data Variables
Variables containing actual business data from external sources or calculations. Examples: CUSTOMER_NAME, INVOICE_TOTAL, DATE_ISSUE

Assignment Patterns

Types of value assignments found in the code. These patterns reveal how data is manipulated and transformed.

String concatenation
Combining strings using the ! operator (e.g., A ! " " ! B)
Arithmetic operations
Mathematical calculations (addition, division, increment)
Substring extraction
Using SUBSTR to extract portions of strings
Dynamic variable creation
Creating variable names at runtime using {expression}=value
Array indexing
Accessing array elements with [index] notation
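For illustration, several of these patterns can be spotted with simple regular expressions. The regexes below are rough approximations for a single source line, not a full DFA parser (Python):

```python
import re

# Rough, illustrative patterns -- the real DFA grammar is richer than this.
ASSIGNMENT_PATTERNS = {
    "string_concatenation": re.compile(r"\s!\s"),          # A ! " " ! B
    "substring_extraction": re.compile(r"\bSUBSTR\b", re.IGNORECASE),
    "dynamic_variable":     re.compile(r"\{[^}]+\}\s*="),  # {expression}=value
    "array_indexing":       re.compile(r"\w+\[[^\]]+\]"),  # VAR[index]
}

def detect_assignment_patterns(line: str) -> list:
    """Return the names of assignment patterns detected in one source line."""
    return [name for name, pattern in ASSIGNMENT_PATTERNS.items()
            if pattern.search(line)]
```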

Architectural Patterns

High-level design approaches identified in the code that reveal the overall architecture and design philosophy.

Localization Strategy
How the code handles multiple languages (variable naming conventions, language selection logic)
Configuration Approach
How settings and parameters are managed (centralized vs. distributed, hardcoded vs. dynamic)
Separation of Concerns
How well the code separates different responsibilities (layout, logic, data)
Modularity Indicators
Evidence of modular design (reusable components, clear interfaces)

3. Dependency Architecture

Coupling Level

Possible values: LOW, MEDIUM, HIGH

The degree of interdependence between a module and other parts of the system. Lower coupling generally indicates more maintainable and flexible code.

LOW Coupling
The module has few external dependencies and can be modified or tested in isolation. Changes are unlikely to affect other modules.
MEDIUM Coupling
The module has moderate dependencies on other components. Changes may require coordination with related modules.
HIGH Coupling
The module is heavily dependent on other components. Changes can have ripple effects across the system. Consider refactoring to reduce dependencies.

External References

Links from a module to other components in the system. These references create the dependency graph and affect how changes propagate through the codebase.

Format Dependencies
References to DOCFORMAT modules. These are document layout definitions that this module uses or includes.
FormatGroup Dependencies
References to FORMATGROUP modules. These define page structures, sheets, and logical page layouts.
Include Dependencies
References to INCLUDE modules. These are reusable code blocks that are imported into the module.

Dependency Complexity

Possible values: SIMPLE, MEDIUM, COMPLEX

An assessment of how complex the dependency graph is for a module. This considers not just the number of dependencies, but also their depth and interconnection.

SIMPLE
Linear or flat dependency structure with few references
MEDIUM
Moderate branching in dependencies with some shared components
COMPLEX
Deep dependency trees, circular references, or many interconnected modules
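Circular references, one of the drivers of a COMPLEX rating, can be detected with a depth-first search over the dependency graph. The input shape and module names below are assumptions made for the sketch (Python):

```python
def find_cycles(dependencies: dict) -> list:
    """Detect circular references in a module dependency graph.

    `dependencies` maps a module name to the modules it references,
    e.g. {"DF_BODY": ["FG_PORTRAIT"], ...} (names illustrative).
    """
    visiting, visited, cycles = set(), set(), []

    def dfs(node, path):
        visiting.add(node)
        for dep in dependencies.get(node, []):
            if dep in visiting:                       # back-edge -> cycle
                cycles.append(path[path.index(dep):] + [dep])
            elif dep not in visited:
                dfs(dep, path + [dep])
        visiting.discard(node)
        visited.add(node)

    for module in dependencies:
        if module not in visited:
            dfs(module, [module])
    return cycles
```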

4. Module Types

DOCFORMAT

Document Format definitions that specify how content is laid out and formatted within a document. DOCFORMATs contain the actual document structure, text placement, variable references, and formatting instructions.

Primary Purpose
Define the visual layout and content arrangement of documents or document sections
Common Contents
Text blocks, variable substitutions, conditional content, formatting rules, positioning instructions
Naming Convention
Typically prefixed with DF_ (e.g., DF_HEADER, DF_INVOICE_BODY)

FORMATGROUP

Format Groups define the physical page structure including sheets, logical pages, and how DOCFORMATs are arranged. They act as containers that organize multiple formats into a cohesive document structure.

Primary Purpose
Define page layout, sheet dimensions, and the relationship between logical and physical pages
Common Contents
Sheet definitions, logical page layouts, header/footer references, page orientation settings
Naming Convention
Typically prefixed with FG_ (e.g., FG_SIMPLEX_PORTRAIT, FG_DUPLEX_LANDSCAPE)

INCLUDE

Reusable code modules that can be incorporated into other modules. INCLUDEs promote code reuse by centralizing common functionality, variable definitions, or initialization routines.

Primary Purpose
Provide reusable code blocks that can be shared across multiple documents or modules
Common Contents
Variable initializations, common functions, shared calculations, configuration settings
Usage Pattern
Referenced via INCLUDE statements in DOCFORMATs or other modules

5. Analysis Outputs

Business Purpose

A clear, non-technical description of what the module accomplishes from a business perspective. This helps stakeholders understand the role of each module without needing to read the code.

Example
"Generates customer invoice letters with itemized charges, payment due dates, and company contact information for the billing department."

Workflow Role

The module's position and function within the overall document generation pipeline. This describes how the module interacts with other components and when it is invoked.

Example
"Called by the main document controller after data loading. Formats the body section before footer generation."

Data Flow

A description of how data moves through the module, from input to output. This includes data sources, transformations applied, and where results are stored or passed.

Example
Input: Customer data array [I] → Length Calculation → Loop Processing → String Concatenation → Output: Formatted address block

Business Rules

Specific business logic constraints and requirements implemented in the code. These rules govern how data is processed, validated, or formatted according to business requirements.

Examples
• Invoice amounts must be formatted with 2 decimal places
• Customer names exceeding 40 characters must be truncated
• Payment due date must be at least 30 days from issue date

Integration Points

Specific locations where the module connects with other components, external systems, or data sources. Understanding integration points is crucial for impact analysis and testing.

Consumer
This module uses/calls the referenced component
Provider
This module provides functionality used by others
Source
External data source (file, database, system variable)

Pseudo-code

A simplified, human-readable algorithmic representation of the module's logic. Pseudo-code abstracts away DFA syntax details while preserving the essential logic flow, making it easier for non-DFA developers to understand the code.

Format
// MODULE_NAME - Brief description

// 1. First step
Initialize variables
  If CONDITION Then
    Perform action
  EndIf

// 2. Second step
Call OTHER_MODULE for specific purpose

// Note: Business context explanation

6. Issues & Recommendations

Potential Issues

Problems or concerns identified during analysis that may affect code quality, maintainability, or reliability. Issues are categorized by priority level.

HIGH
Critical issues that may cause errors, data corruption, or security vulnerabilities. Immediate attention recommended.
MEDIUM
Issues affecting maintainability, performance, or code quality. Address in the next maintenance cycle.
LOW
Minor improvements, style issues, or documentation gaps. Consider when time permits.

Technical Recommendations

Specific suggestions for improving code quality, structure, or maintainability. These are actionable items that developers can implement to enhance the module.

Common Recommendations
• Replace magic numbers with named constants
• Add input validation for external data
• Extract repeated logic into reusable includes
• Add error handling for edge cases
• Reduce nesting depth through early returns

Performance Considerations

Observations about code patterns that may impact execution speed, memory usage, or resource consumption. These help identify optimization opportunities.

Common Performance Issues
• String concatenation inside loops (consider pre-allocation)
• Repeated data structure access (consider caching)
• Unnecessary nested loops (consider restructuring)
• Large variable scope (consider localizing)

Maintenance Notes

Important information for developers who will maintain or modify the code in the future. These notes highlight non-obvious behaviors, dependencies, or historical context.

What's Included
• External variable dependencies (variables defined elsewhere)
• Legacy code patterns that may seem unusual
• Implicit assumptions about input data
• Known limitations or workarounds
• Version-specific behaviors

7. Quality Assessment

Quality Score

A composite score from 0 to 100 that evaluates the quality of LLM-generated analysis for each section. It combines four weighted dimensions: Depth (25%), Fidelity (30%), Specificity (20%), and Completeness (25%). Higher scores indicate more thorough, accurate, and detailed analysis outputs.
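The composite follows directly from the stated weights. This is a minimal sketch of the weighting; the dictionary keys are assumptions about how the dimension scores are stored (Python):

```python
WEIGHTS = {"depth": 0.25, "fidelity": 0.30,
           "specificity": 0.20, "completeness": 0.25}

def quality_score(dimensions: dict) -> float:
    """Combine the four 0-100 dimension scores into the 0-100 composite."""
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)
```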

Quality Grade

Possible values: A, B, C, D, F

A letter grade derived from the quality score, providing a quick visual indicator of analysis quality.

A (>= 90)
Excellent analysis quality. Deep, accurate, specific, and complete.
B (>= 75)
Good analysis quality with minor gaps or generalities.
C (>= 60)
Adequate analysis. May lack depth or contain some generic descriptions.
D (>= 40)
Below average. Significant gaps in analysis depth or accuracy.
F (< 40)
Poor quality. Analysis is shallow, generic, or largely incomplete.
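The grade boundaries above map directly to a threshold function (a sketch that mirrors the listed cut-offs):

```python
def quality_grade(score: float) -> str:
    """Map a 0-100 quality score to its letter grade."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    if score >= 40:
        return "D"
    return "F"
```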

Depth (25%)

Measures the depth of the analysis relative to the complexity of the source code. Evaluates the ratio of pseudo-code length to source code, description detail, and business logic coverage. A shallow analysis of complex code scores low, while a thorough analysis of simple code scores high.

Fidelity (30%)

Measures factual accuracy of the analysis based on post-validation confidence. Cross-references LLM outputs against the actual source code to verify dependencies, variables, and complexity claims. The highest-weighted dimension because accuracy is critical for documentation trustworthiness.

Specificity (20%)

Detects the absence of generic or boilerplate descriptions. Scans analysis outputs for common generic patterns (e.g., "standard implementation", "typical approach") and penalizes their presence. Higher specificity means the analysis provides concrete, module-specific insights rather than generic observations.

Completeness (25%)

Measures the percentage of analysis fields that are substantively filled. Checks 9 key fields: description, primary_function, business_logic, pseudo_code, data_flow, dependencies, potential_issues, recommendations, and integration_points. Empty or placeholder values reduce the score.
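A minimal sketch of the completeness check, assuming the analysis is held as a dict and that empty strings or simple placeholder markers count as unfilled (the placeholder list is an assumption):

```python
KEY_FIELDS = [
    "description", "primary_function", "business_logic", "pseudo_code",
    "data_flow", "dependencies", "potential_issues", "recommendations",
    "integration_points",
]
PLACEHOLDERS = {"", "n/a", "tbd", "none"}  # assumed placeholder markers

def completeness_score(analysis: dict) -> float:
    """Percentage of the nine key fields that are substantively filled."""
    filled = sum(
        1 for field in KEY_FIELDS
        if str(analysis.get(field, "")).strip().lower() not in PLACEHOLDERS
    )
    return 100.0 * filled / len(KEY_FIELDS)
```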

Quality Flags

Automatic alerts triggered when specific quality thresholds are not met. These flags help quickly identify sections that may need re-analysis or manual review.

shallow_analysis
Depth score below 40%. The analysis is too brief relative to the code complexity.
generic_description
Specificity score below 40%. The analysis contains too many generic or boilerplate descriptions.
many_missing_fields
Completeness score below 50%. Too many analysis fields are empty or contain placeholder values.
low_factual_confidence
Fidelity score below 50%. Post-validation found significant discrepancies between the analysis and the source code.
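The four flags follow mechanically from the thresholds above (a sketch; the dimension keys are assumptions about how the scores are stored):

```python
FLAG_THRESHOLDS = {
    "shallow_analysis":       ("depth", 40),
    "generic_description":    ("specificity", 40),
    "many_missing_fields":    ("completeness", 50),
    "low_factual_confidence": ("fidelity", 50),
}

def quality_flags(dimensions: dict) -> list:
    """Return the flags whose dimension score falls below its threshold."""
    return [flag for flag, (dim, threshold) in FLAG_THRESHOLDS.items()
            if dimensions[dim] < threshold]
```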

Coherence Score

A cross-section coherence score (0-100%) that evaluates consistency across all analyzed sections. Checks for orphan dependencies (references to non-existent sections), missing back-references, and include consistency. A high coherence score indicates that cross-section relationships are accurately documented.