API Reference

PipelineDebugger

class dataprobe.PipelineDebugger(name: str = 'Pipeline', track_memory: bool = True, track_lineage: bool = True, auto_save: bool = True, save_path: Path | None = None, memory_threshold_mb: float = 100.0)

Bases: object

A comprehensive debugging tool for data pipelines that tracks operations, memory usage, data lineage, and provides visual debugging capabilities.

analyze_dataframe(df: DataFrame | DataFrame, name: str = 'DataFrame')

Analyze a DataFrame and provide detailed statistics.

create_3d_pipeline_visualization(save_path: Path | None = None)

Create a 3D visualization of the pipeline network.

export_lineage(format: str = 'json') → str | Dict

Export data lineage information.

Parameters:
  • format – Export format ('json' or 'dict')

generate_executive_report(save_path: Path | None = None)

Generate an executive-level visual report.

generate_report() → Dict[str, Any]

Generate a comprehensive pipeline report.

print_summary()

Print a summary of the pipeline execution.

profile_memory(func: Callable) → Callable

Decorator for detailed memory profiling of a function.

save_checkpoint()

Save current debugging state to disk.

suggest_optimizations() → List[Dict[str, Any]]

Analyze the pipeline and suggest optimizations.

track_operation(operation_name: str, **metadata)

Decorator to track an operation in the pipeline.

Parameters:
  • operation_name – Name of the operation

  • **metadata – Additional metadata to store

visualize_pipeline(save_path: Path | None = None)

Create a dashboard visualization of the pipeline execution, presented as a single comprehensive visual report.

Main Methods

PipelineDebugger.track_operation(operation_name: str, **metadata)

Decorator to track an operation in the pipeline.

Parameters:
  • operation_name – Name of the operation

  • **metadata – Additional metadata to store
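To illustrate the decorator-with-arguments pattern that this signature implies, here is a minimal standard-library sketch. It is not dataprobe's implementation: the module-level `operation_log` list and the recorded fields are assumptions chosen for the example.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the debugger's internal record of operations.
operation_log: List[Dict[str, Any]] = []


def track_operation(operation_name: str, **metadata: Any) -> Callable:
    """Decorator factory: record name, duration, and metadata for each call."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: Dict[str, Any] = {
                "operation_name": operation_name,
                "metadata": dict(metadata),
                "error": None,
            }
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Keep the failure on the record, then re-raise unchanged.
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so every call is logged.
                record["duration"] = time.perf_counter() - start
                operation_log.append(record)

        return wrapper

    return decorator


@track_operation("double_values", stage="transform")
def double_values(values: List[int]) -> List[int]:
    return [v * 2 for v in values]


result = double_values([1, 2, 3])
```

Because the decorator takes arguments, it is applied with a call (`@track_operation("name", key=value)`), and any extra keyword arguments travel with the record as metadata.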

PipelineDebugger.profile_memory(func: Callable) → Callable

Decorator for detailed memory profiling of a function.
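One way a memory-profiling decorator of this shape could work is sketched below with the standard library's `tracemalloc`. This is an assumption for illustration only: the reference does not say how dataprobe measures memory, and the `memory_profiles` list and its keys are hypothetical.

```python
import functools
import tracemalloc
from typing import Any, Callable, Dict, List

# Hypothetical store for profiling results; not part of dataprobe.
memory_profiles: List[Dict[str, Any]] = []


def profile_memory(func: Callable) -> Callable:
    """Record the change in traced allocations and the peak around one call."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Only start (and later stop) tracing if nobody else already did.
        already_tracing = tracemalloc.is_tracing()
        if not already_tracing:
            tracemalloc.start()
        before, _ = tracemalloc.get_traced_memory()
        try:
            return func(*args, **kwargs)
        finally:
            after, peak = tracemalloc.get_traced_memory()
            if not already_tracing:
                tracemalloc.stop()
            memory_profiles.append({
                "function": func.__name__,
                "delta_bytes": after - before,
                # Peak since tracing started, not only within this call.
                "peak_bytes": peak,
            })

    return wrapper


@profile_memory
def build_list(n: int) -> List[int]:
    return list(range(n))


data = build_list(100_000)
```

Unlike `track_operation`, this decorator takes the function directly, so it is applied without parentheses.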

PipelineDebugger.analyze_dataframe(df: DataFrame | DataFrame, name: str = 'DataFrame')

Analyze a DataFrame and provide detailed statistics.

PipelineDebugger.print_summary()

Print a summary of the pipeline execution.

PipelineDebugger.visualize_pipeline(save_path: Path | None = None)

Create a dashboard visualization of the pipeline execution, presented as a single comprehensive visual report.

PipelineDebugger.generate_report() → Dict[str, Any]

Generate a comprehensive pipeline report.

PipelineDebugger.export_lineage(format: str = 'json') → str | Dict

Export data lineage information.

Parameters:
  • format – Export format ('json' or 'dict')
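The difference between the two export formats can be sketched as follows. `LineageRecord` is a stand-in that mirrors the fields documented for `DataLineage`. Keying the export by `data_id` and raising `ValueError` for an unknown format are assumptions made for this example, not documented behavior.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional, Tuple, Union


@dataclass
class LineageRecord:
    """Stand-in mirroring the documented DataLineage fields."""
    data_id: str
    source: str
    transformations: List[Dict[str, Any]] = field(default_factory=list)
    current_shape: Optional[Tuple] = None
    data_type: str = "unknown"
    column_changes: List[Dict[str, Any]] = field(default_factory=list)


def export_lineage(
    records: Dict[str, LineageRecord], format: str = "json"
) -> Union[str, Dict]:
    """Return lineage as a plain dict or as a JSON string."""
    as_dict = {data_id: asdict(rec) for data_id, rec in records.items()}
    if format == "dict":
        return as_dict
    if format == "json":
        # Tuples such as current_shape are serialized as JSON arrays.
        return json.dumps(as_dict, indent=2)
    # Assumed behavior for anything other than 'json' or 'dict'.
    raise ValueError(f"Unsupported format: {format!r}")


records = {
    "orders": LineageRecord(
        data_id="orders", source="orders.csv", current_shape=(100, 4)
    )
}
records["orders"].transformations.append({"operation": "drop_nulls"})

lineage_json = export_lineage(records, "json")
lineage_dict = export_lineage(records, "dict")
```

The 'dict' form keeps Python types such as tuples, while the 'json' form is a string suitable for writing to disk or sending to another service.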

Classes

class dataprobe.debugger.OperationMetrics(operation_id: str, operation_name: str, start_time: float, end_time: float = 0.0, duration: float = 0.0, memory_before: float = 0.0, memory_after: float = 0.0, memory_delta: float = 0.0, input_shape: Tuple | None = None, output_shape: Tuple | None = None, error: str | None = None, traceback: str | None = None, parent_id: str | None = None, children_ids: List[str] = <factory>, metadata: Dict[str, Any] = <factory>)

Store metrics for a single operation.
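The `<factory>` defaults in the signature indicate dataclass fields built with `default_factory`, so each instance gets its own list and dict. The sketch below mirrors the documented fields in a stand-in class. The `finish` helper and the relationships it encodes (duration as end minus start, memory delta as after minus before) are assumptions, since the reference does not state how those fields are filled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class OperationMetricsSketch:
    """Stand-in with the same fields as the documented OperationMetrics."""
    operation_id: str
    operation_name: str
    start_time: float
    end_time: float = 0.0
    duration: float = 0.0
    memory_before: float = 0.0
    memory_after: float = 0.0
    memory_delta: float = 0.0
    input_shape: Optional[Tuple] = None
    output_shape: Optional[Tuple] = None
    error: Optional[str] = None
    traceback: Optional[str] = None
    parent_id: Optional[str] = None
    children_ids: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)

    def finish(self, end_time: float, memory_after: float) -> None:
        # Hypothetical helper: assumed relationships between the fields.
        self.end_time = end_time
        self.duration = end_time - self.start_time
        self.memory_after = memory_after
        self.memory_delta = memory_after - self.memory_before


# Link a nested operation to its parent through the id fields.
parent = OperationMetricsSketch("op-1", "load", start_time=10.0, memory_before=50.0)
child = OperationMetricsSketch("op-2", "clean", start_time=10.5, parent_id="op-1")
parent.children_ids.append(child.operation_id)
parent.finish(end_time=12.5, memory_after=80.0)
```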

class dataprobe.debugger.DataLineage(data_id: str, source: str, transformations: List[Dict[str, Any]] = <factory>, current_shape: Tuple | None = None, data_type: str = 'unknown', column_changes: List[Dict[str, Any]] = <factory>)

Track data lineage information.

Utilities

dataprobe.utils.setup_logger(name: str = 'dataprobe', log_file: Path | None = None, level: str = 'INFO') → Logger

Set up a logger with rich formatting.

Parameters:
  • name – Logger name

  • log_file – Optional log file path

  • level – Logging level

Returns:

Configured logger
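As a rough picture of what a function with this signature typically does, the sketch below builds a logger with the standard `logging` module only. It is not dataprobe's code: the name `setup_logger_sketch`, the plain text formatter (used here in place of the rich formatting the docstring mentions), and the handler cleanup are all assumptions.

```python
import logging
from pathlib import Path
from typing import Optional


def setup_logger_sketch(
    name: str = "dataprobe",
    log_file: Optional[Path] = None,
    level: str = "INFO",
) -> logging.Logger:
    """Return a logger with a console handler and an optional file handler."""
    logger = logging.getLogger(name)
    # setLevel accepts level names such as "INFO" or "DEBUG".
    logger.setLevel(level.upper())

    # Remove old handlers so repeated calls do not duplicate output.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)

    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger


logger = setup_logger_sketch("sketch_logger", level="debug")
logger.debug("pipeline started")
```

Passing `log_file` adds a second handler, so the same messages go to both the console and the file.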