
[Bug] The serialized model is larger than the 2GiB limit imposed by the protobuf library #2768

Open
maximefuchs opened this issue May 16, 2024 · 0 comments
Checklist

  • I have searched related issues but cannot get the expected help.
  • I have read the FAQ documentation but cannot get the expected help.
  • The bug has not been fixed in the latest version.

Describe the bug

I trained an MViT with mmaction2 and would like to deploy the trained model.
However, the following command:

python mmdeploy/tools/deploy.py mmdeploy/configs/mmaction/video-recognition/video-recognition_3d_tensorrt_static-224x224.py work_dirs/test_mvit_sequence/test_mvit_sequence.py  work_dirs/test_mvit_sequence/best_acc_top1_epoch_6.pth  mmpretrain/demo/demo.JPEG --work-dir work_dirs/test_mvit_sequence/output_trt --device cuda --dump-info

fails with the following error:

Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/mmdeploy/apis/pytorch2onnx.py", line 98, in torch2onnx
    export(
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/mmdeploy/apis/onnx/export.py", line 138, in export
    torch.onnx.export(
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1613, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/mmdeploy/apis/onnx/optimizer.py", line 27, in model_to_graph__custom_optimizer
    graph, params_dict, torch_out = ctx.origin_func(*args, **kwargs)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1139, in _model_to_graph
    graph = _optimize_graph(
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 677, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1957, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/torch/onnx/symbolic_opset9.py", line 7153, in onnx_placeholder
    return torch._C._jit_onnx_convert_pattern_from_subblock(block, node, env)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1957, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/torch/onnx/symbolic_opset11.py", line 236, in index_put
    broadcast_index_shape = g.op("Shape", index)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/torch/onnx/_internal/jit_utils.py", line 87, in op
    return _add_op(self, opname, *raw_args, outputs=outputs, **kwargs)
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/torch/onnx/_internal/jit_utils.py", line 246, in _add_op
    node = _create_node(
  File "/home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/torch/onnx/_internal/jit_utils.py", line 307, in _create_node
    _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
05/16 11:04:22 - mmengine - ERROR - /home/maxime/Documents/classification/.venv/lib/python3.10/site-packages/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.
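For context, the "2GiB limit" in the error is protobuf's ceiling on a single serialized message (2**31 − 1 bytes), which a monolithic `.onnx` file inherits; export-time constant folding can also duplicate weights and push a graph over the limit even when the checkpoint itself is smaller. A rough stdlib-only sketch of the size check (the parameter counts below are illustrative assumptions, not measured from this model):

```python
# The "2GiB limit" in the error is protobuf's maximum serialized message
# size (2**31 - 1 bytes), which ONNX inherits for a single .onnx file.
PROTOBUF_LIMIT = 2**31 - 1  # bytes

def exceeds_protobuf_limit(num_params: int, bytes_per_param: int = 4) -> bool:
    """Rough check: do the weights alone (fp32 by default) cross the limit?

    Ignores graph/proto overhead and any weight duplication that export-time
    constant folding may introduce, so this is a lower bound on file size.
    """
    return num_params * bytes_per_param > PROTOBUF_LIMIT

# Illustrative numbers, not taken from this model:
print(exceeds_protobuf_limit(600_000_000))  # ~2.4 GB of fp32 weights -> True
print(exceeds_protobuf_limit(51_000_000))   # ~0.2 GB -> False
```

As the error message suggests, models over the limit must be serialized with ONNX external data (weights stored in sibling files next to the `.onnx` path) rather than as one protobuf message.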

Reproduction

This is the config file for the MViT, test_mvit_sequence.py:

_base_ = [
    "../mmaction2/configs/_base_/models/mvit_small.py",
    "../mmaction2/configs/_base_/default_runtime.py",
]
# dataset settings
classes = (
    "nothing",
    "Liver",
    "artefacts",
    "head",
    "true_negatif",
    "body",
    "other",
    "tail",
)
num_class = len(classes)
dataset_type = "RawframeDataset"
data_root = "/home/maxime/Documents/DATA/dataset_classifier_5fps/"
ann_file_train = "train.txt"
ann_file_val = "val.txt"
ann_file_test = "test.txt"
# hyperparameters
clip_len = 4  # 16 in former (Adrien) model
batch_size = 2
num_workers = 1
num_clips = 1

metainfo = dict(classes=classes)
model = dict(
    backbone=dict(
        arch="base",
        temporal_size=clip_len,
        drop_path_rate=0.3,
    ),
    data_preprocessor=dict(
        type="ActionDataPreprocessor",
        mean=[114.75, 114.75, 114.75],
        std=[57.375, 57.375, 57.375],
        blending=dict(
            type="RandomBatchAugment",
            augments=[
                dict(type="MixupBlending", alpha=0.8, num_classes=num_class),
                dict(type="CutmixBlending", alpha=1, num_classes=num_class),
            ],
        ),
        format_shape="NCTHW",
    ),
    cls_head=dict(num_classes=num_class),
)


train_pipeline = [
    dict(type="SampleFrames", clip_len=clip_len, frame_interval=1, num_clips=num_clips),
    dict(type="RawFrameDecode"),
    dict(type="Resize", scale=(-1, 256)),
    dict(type="RandomResizedCrop"),
    dict(type="Resize", scale=(224, 224), keep_ratio=False),
    dict(type="Flip", flip_ratio=0.5),
    dict(type="FormatShape", input_format="NCTHW"),
    dict(type="PackActionInputs"),
]
val_pipeline = [
    dict(
        type="SampleFrames",
        clip_len=clip_len,
        frame_interval=1,
        num_clips=num_clips,
        test_mode=True,
    ),
    dict(type="RawFrameDecode"),
    dict(type="Resize", scale=(-1, 256)),
    dict(type="CenterCrop", crop_size=224),
    dict(type="FormatShape", input_format="NCTHW"),
    dict(type="PackActionInputs"),
]
test_pipeline = [
    dict(
        type="SampleFrames",
        clip_len=clip_len,
        frame_interval=1,
        num_clips=25,
        test_mode=True,
    ),
    dict(type="RawFrameDecode"),
    dict(type="Resize", scale=(-1, 256)),
    dict(type="TenCrop", crop_size=224),
    dict(type="FormatShape", input_format="NCTHW"),
    dict(type="PackActionInputs"),
]

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=num_workers,
    persistent_workers=True,
    sampler=dict(type="DefaultSampler", shuffle=True),
    dataset=dict(
        type=dataset_type,
        metainfo=metainfo,
        ann_file=data_root + ann_file_train,
        filename_tmpl="img_{:05}.png",  # id of images has to start at 1
        # modality="Flow",
        data_prefix=dict(img=data_root),
        pipeline=train_pipeline,
    ),
)
val_dataloader = dict(
    batch_size=batch_size,
    num_workers=num_workers,
    persistent_workers=True,
    sampler=dict(type="DefaultSampler", shuffle=False),
    dataset=dict(
        type=dataset_type,
        metainfo=metainfo,
        ann_file=data_root + ann_file_val,
        filename_tmpl="img_{:05}.png",  # id of images has to start at 1
        # modality="Flow",
        data_prefix=dict(img=data_root),
        pipeline=val_pipeline,
        test_mode=True,
    ),
)
test_dataloader = dict(
    batch_size=1,
    num_workers=num_workers,
    persistent_workers=True,
    sampler=dict(type="DefaultSampler", shuffle=False),
    dataset=dict(
        type=dataset_type,
        metainfo=metainfo,
        ann_file=data_root + ann_file_test,
        filename_tmpl="img_{:05}.png",  # id of images has to start at 1
        # modality="Flow",
        data_prefix=dict(img=data_root),
        pipeline=test_pipeline,
        test_mode=True,
    ),
)

val_evaluator = dict(type="AccMetric")
test_evaluator = val_evaluator

train_cfg = dict(
    type="EpochBasedTrainLoop", max_epochs=200, val_begin=1, val_interval=1
)
val_cfg = dict(type="ValLoop")
test_cfg = dict(type="TestLoop")

base_lr = 1.6e-3
optim_wrapper = dict(
    optimizer=dict(type="AdamW", lr=base_lr, betas=(0.9, 0.999), weight_decay=0.05),
    paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0),
    clip_grad=dict(max_norm=1, norm_type=2),
)

param_scheduler = [
    dict(
        type="LinearLR",
        start_factor=0.01,
        by_epoch=True,
        begin=0,
        end=30,
        convert_to_iter_based=True,
    ),
    dict(
        type="CosineAnnealingLR",
        T_max=200,
        eta_min=base_lr / 100,
        by_epoch=True,
        begin=30,
        end=200,
        convert_to_iter_based=True,
    ),
]

default_hooks = dict(
    checkpoint=dict(interval=1, max_keep_ckpts=5), logger=dict(interval=100)
)

# Default setting for scaling LR automatically
#   - `enable` means enable scaling LR automatically
#       or not by default.
#   - `base_batch_size` = (8 GPUs) x (8 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=256)
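For reference, the `param_scheduler` above chains a LinearLR warmup (epochs 0–30, `start_factor=0.01`) into cosine annealing. An epoch-level sketch of the resulting learning rate, under my simplifying assumption that the cosine half-period spans epochs 30–200 (ignoring `convert_to_iter_based` and the exact `T_max` semantics):

```python
import math

# Epoch-level sketch of the schedule above: LinearLR warmup then cosine decay.
# Assumption: the cosine half-period runs from warmup_end to max_epoch.
base_lr = 1.6e-3
warmup_end, max_epoch = 30, 200
eta_min = base_lr / 100

def lr_at(epoch: float) -> float:
    if epoch < warmup_end:
        # linear ramp from 0.01 * base_lr up to base_lr
        factor = 0.01 + (1.0 - 0.01) * epoch / warmup_end
        return base_lr * factor
    # cosine decay from base_lr down to eta_min
    t = (epoch - warmup_end) / (max_epoch - warmup_end)
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t)) / 2

print(lr_at(0))    # base_lr * 0.01 = 1.6e-05
print(lr_at(200))  # eta_min       = 1.6e-05
```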

And this is the deploy config, video-recognition_3d_tensorrt_static-224x224.py:

_base_ = ["./video-recognition_static.py", "../../_base_/backends/tensorrt.py"]

onnx_config = dict(input_shape=[224, 224])

backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 1, 3, 4, 224, 224],
                    opt_shape=[1, 1, 3, 4, 224, 224],
                    max_shape=[1, 1, 3, 4, 224, 224],
                )
            )
        )
    ],
)
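Reading the static shape above as (batch, num_clips, channels, clip_len, height, width) — my interpretation of mmaction2's NCTHW packing with an extra clip dimension, not something the config states — the input tensor itself is tiny, so the 2 GiB overflow comes from the serialized weights, not the input:

```python
from math import prod

# Static input shape from the TensorRT config above, interpreted as
# (batch, num_clips, channels, clip_len, height, width) -- an assumption.
shape = (1, 1, 3, 4, 224, 224)

n_elems = prod(shape)           # total elements in one input tensor
size_mib = n_elems * 4 / 2**20  # fp32 bytes -> MiB
print(n_elems, round(size_mib, 2))  # 602112 2.3
```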

Environment

05/16 11:25:12 - mmengine - INFO - 

05/16 11:25:12 - mmengine - INFO - **********Environmental information**********
05/16 11:25:13 - mmengine - INFO - sys.platform: linux
05/16 11:25:13 - mmengine - INFO - Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
05/16 11:25:13 - mmengine - INFO - CUDA available: True
05/16 11:25:13 - mmengine - INFO - MUSA available: False
05/16 11:25:13 - mmengine - INFO - numpy_random_seed: 2147483648
05/16 11:25:13 - mmengine - INFO - GPU 0: NVIDIA RTX A4000
05/16 11:25:13 - mmengine - INFO - CUDA_HOME: /usr
05/16 11:25:13 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.5, V11.5.119
05/16 11:25:13 - mmengine - INFO - GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
05/16 11:25:13 - mmengine - INFO - PyTorch: 2.2.0+cu121
05/16 11:25:13 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

05/16 11:25:13 - mmengine - INFO - TorchVision: 0.17.0+cu121
05/16 11:25:13 - mmengine - INFO - OpenCV: 4.9.0
05/16 11:25:13 - mmengine - INFO - MMEngine: 0.10.4
05/16 11:25:13 - mmengine - INFO - MMCV: 2.2.0
05/16 11:25:13 - mmengine - INFO - MMCV Compiler: GCC 9.3
05/16 11:25:13 - mmengine - INFO - MMCV CUDA Compiler: 12.1
05/16 11:25:13 - mmengine - INFO - MMDeploy: 1.3.1+87395c5
05/16 11:25:13 - mmengine - INFO - 

05/16 11:25:13 - mmengine - INFO - **********Backend information**********
05/16 11:25:13 - mmengine - INFO - tensorrt:    8.5.2.2
05/16 11:25:13 - mmengine - INFO - tensorrt custom ops: Available
05/16 11:25:13 - mmengine - INFO - ONNXRuntime: 1.17.3
05/16 11:25:13 - mmengine - INFO - ONNXRuntime-gpu:     1.17.1
05/16 11:25:13 - mmengine - INFO - ONNXRuntime custom ops:      Available
05/16 11:25:13 - mmengine - INFO - pplnn:       None
05/16 11:25:13 - mmengine - INFO - ncnn:        None
05/16 11:25:13 - mmengine - INFO - snpe:        None
05/16 11:25:13 - mmengine - INFO - openvino:    None
05/16 11:25:13 - mmengine - INFO - torchscript: 2.2.0
05/16 11:25:13 - mmengine - INFO - torchscript custom ops:      NotAvailable
05/16 11:25:13 - mmengine - INFO - rknn-toolkit:        None
05/16 11:25:13 - mmengine - INFO - rknn-toolkit2:       None
05/16 11:25:13 - mmengine - INFO - ascend:      None
05/16 11:25:13 - mmengine - INFO - coreml:      None
05/16 11:25:13 - mmengine - INFO - tvm: None
05/16 11:25:13 - mmengine - INFO - vacc:        None
05/16 11:25:13 - mmengine - INFO - 

05/16 11:25:13 - mmengine - INFO - **********Codebase information**********
05/16 11:25:13 - mmengine - INFO - mmdet:       3.3.0
05/16 11:25:13 - mmengine - INFO - mmseg:       None
05/16 11:25:13 - mmengine - INFO - mmpretrain:  1.2.0
05/16 11:25:13 - mmengine - INFO - mmocr:       None
05/16 11:25:13 - mmengine - INFO - mmagic:      None
05/16 11:25:13 - mmengine - INFO - mmdet3d:     None
05/16 11:25:13 - mmengine - INFO - mmpose:      None
05/16 11:25:13 - mmengine - INFO - mmrotate:    None
05/16 11:25:13 - mmengine - INFO - mmaction:    1.2.0
05/16 11:25:13 - mmengine - INFO - mmrazor:     None
05/16 11:25:13 - mmengine - INFO - mmyolo:      None

Error traceback

No response
