API Reference
PipelineDebugger
- class dataprobe.PipelineDebugger(name: str = 'Pipeline', track_memory: bool = True, track_lineage: bool = True, auto_save: bool = True, save_path: Path | None = None, memory_threshold_mb: float = 100.0)
Bases: object
A comprehensive debugging tool for data pipelines that tracks operations, memory usage, and data lineage, and provides visual debugging capabilities.
- analyze_dataframe(df: DataFrame | DataFrame, name: str = 'DataFrame')
Analyze a DataFrame and provide detailed statistics.
- create_3d_pipeline_visualization(save_path: Path | None = None)
Create an advanced 3D visualization of the pipeline network.
- export_lineage(format: str = 'json') → str | Dict
Export data lineage information.
- Parameters:
format – Export format (‘json’ or ‘dict’)
- generate_executive_report(save_path: Path | None = None)
Generate an executive-level visual report.
- profile_memory(func: Callable) → Callable
Decorator for detailed memory profiling of a function.
- suggest_optimizations() → List[Dict[str, Any]]
Analyze the pipeline and suggest optimizations.
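As a rough illustration of the decorator pattern behind a method such as profile_memory, the sketch below measures peak memory with the standard library's tracemalloc. It is not dataprobe's implementation: the name profile_memory_sketch and the last_profile attribute are hypothetical, chosen only for this example.

```python
import functools
import tracemalloc
from typing import Any, Callable


def profile_memory_sketch(func: Callable) -> Callable:
    """Hypothetical stand-in: record traced memory for each call of func."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        tracemalloc.start()
        try:
            return func(*args, **kwargs)
        finally:
            # Read the counters before stopping, then store them on the wrapper.
            current, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            wrapper.last_profile = {
                "current_mb": current / 1024**2,
                "peak_mb": peak / 1024**2,
            }

    wrapper.last_profile = {}
    return wrapper


@profile_memory_sketch
def build_rows(n: int) -> list:
    # Allocate enough objects for the profiler to register.
    return [{"id": i, "value": i * 2} for i in range(n)]


rows = build_rows(10_000)
profile = build_rows.last_profile
print(f"peak: {profile['peak_mb']:.3f} MB")
```

Because the wrapper is built with functools.wraps, the decorated function keeps its original name and docstring, which matters when operations are later reported by name.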
Main Methods
- PipelineDebugger.track_operation(operation_name: str, **metadata)
Decorator to track an operation in the pipeline.
- Parameters:
operation_name – Name of the operation
**metadata – Additional metadata to store
- PipelineDebugger.profile_memory(func: Callable) → Callable
Decorator for detailed memory profiling of a function.
- PipelineDebugger.analyze_dataframe(df: DataFrame | DataFrame, name: str = 'DataFrame')
Analyze a DataFrame and provide detailed statistics.
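The track_operation signature above (a name plus arbitrary keyword metadata) is the shape of a decorator factory. The following sketch shows how such a factory could record one entry per call. TrackerSketch is a hypothetical stand-in written for this example, not the library's class; the record keys borrow field names from the OperationMetrics signature documented on this page.

```python
import functools
import time
import uuid
from typing import Any, Callable, Dict, List


class TrackerSketch:
    """Hypothetical stand-in that stores one record per tracked call."""

    def __init__(self) -> None:
        self.operations: List[Dict[str, Any]] = []

    def track_operation(self, operation_name: str, **metadata: Any) -> Callable:
        def decorator(func: Callable) -> Callable:
            @functools.wraps(func)
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                record: Dict[str, Any] = {
                    "operation_id": uuid.uuid4().hex,
                    "operation_name": operation_name,
                    "start_time": time.perf_counter(),
                    "error": None,
                    "metadata": dict(metadata),
                }
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Keep the failure on the record, then let it propagate.
                    record["error"] = str(exc)
                    raise
                finally:
                    record["end_time"] = time.perf_counter()
                    record["duration"] = record["end_time"] - record["start_time"]
                    self.operations.append(record)

            return wrapper

        return decorator


tracker = TrackerSketch()


@tracker.track_operation("double_values", stage="transform")
def double_values(values):
    return [v * 2 for v in values]


result = double_values([1, 2, 3])
op = tracker.operations[0]
print(op["operation_name"], op["metadata"])
```

Appending the record in the finally clause means failed calls are logged as well as successful ones, with the error message attached.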
Classes
- class dataprobe.debugger.OperationMetrics(operation_id: str, operation_name: str, start_time: float, end_time: float = 0.0, duration: float = 0.0, memory_before: float = 0.0, memory_after: float = 0.0, memory_delta: float = 0.0, input_shape: Tuple | None = None, output_shape: Tuple | None = None, error: str | None = None, traceback: str | None = None, parent_id: str | None = None, children_ids: List[str] = <factory>, metadata: Dict[str, Any] = <factory>)
Store metrics for a single operation.
- class dataprobe.debugger.DataLineage(data_id: str, source: str, transformations: List[Dict[str, Any]] = <factory>, current_shape: Tuple | None = None, data_type: str = 'unknown', column_changes: List[Dict[str, Any]] = <factory>)
Track data lineage information.
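The `<factory>` defaults in these signatures are how dataclass fields with a default factory are rendered, so each instance gets its own list or dict. The sketch below mirrors the documented DataLineage fields in a local dataclass and shows one way a 'json' or 'dict' export could be produced from such records. DataLineageSketch and export_lineage_sketch are hypothetical names, and keying the output by data_id is an assumption made for this example rather than the library's documented layout.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional, Tuple, Union


@dataclass
class DataLineageSketch:
    """Mirrors the documented DataLineage fields; not the library's class."""

    data_id: str
    source: str
    transformations: List[Dict[str, Any]] = field(default_factory=list)
    current_shape: Optional[Tuple] = None
    data_type: str = "unknown"
    column_changes: List[Dict[str, Any]] = field(default_factory=list)


def export_lineage_sketch(
    records: List[DataLineageSketch], format: str = "json"
) -> Union[str, Dict[str, Any]]:
    # Assumed layout: one entry per record, keyed by data_id.
    payload = {r.data_id: asdict(r) for r in records}
    if format == "dict":
        return payload
    if format == "json":
        return json.dumps(payload, indent=2)
    raise ValueError(f"unsupported format: {format!r}")


a = DataLineageSketch(data_id="df_1", source="orders.csv", current_shape=(100, 3))
b = DataLineageSketch(data_id="df_2", source="customers.csv")
a.transformations.append({"operation": "dropna", "rows_removed": 4})

as_dict = export_lineage_sketch([a, b], format="dict")
as_json = export_lineage_sketch([a, b], format="json")
print(as_json)
```

Note that the JSON form turns the shape tuple into a list, so a round trip through 'json' is not type-identical to the 'dict' form.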