vllm.config.profiler ¶
ProfilerConfig ¶
Dataclass containing the profiler configuration for the engine.
Source code in vllm/config/profiler.py
delay_iterations class-attribute instance-attribute ¶
delay_iterations: int = Field(default=0, ge=0)
Number of engine iterations to skip before starting profiling. Defaults to 0, meaning profiling starts immediately after receiving /start_profile.
ignore_frontend class-attribute instance-attribute ¶
ignore_frontend: bool = False
If True, disables the front-end profiling of AsyncLLM when using the 'torch' profiler. This is needed to reduce overhead when using delay/limit options, since the front-end profiling does not track iterations and will capture the entire range.
max_iterations class-attribute instance-attribute ¶
max_iterations: int = Field(default=0, ge=0)
Maximum number of engine iterations to profile after starting profiling. Defaults to 0, meaning no limit.
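Together, `delay_iterations` and `max_iterations` define a profiling window over engine iterations. A minimal sketch of that windowing logic (the helper `profiling_active` is hypothetical and not part of vLLM; iterations are counted from the `/start_profile` request):

```python
def profiling_active(iteration: int,
                     delay_iterations: int = 0,
                     max_iterations: int = 0) -> bool:
    """Return True if the given 0-based engine iteration should be profiled.

    delay_iterations: iterations skipped before profiling starts.
    max_iterations:   cap on profiled iterations; 0 means no limit.
    """
    if iteration < delay_iterations:
        return False  # still inside the skipped warm-up window
    if max_iterations == 0:
        return True   # no upper bound on profiled iterations
    return iteration < delay_iterations + max_iterations
```

For example, with `delay_iterations=2` and `max_iterations=2`, only iterations 2 and 3 are profiled.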
profiler class-attribute instance-attribute ¶
profiler: ProfilerKind | None = None
Which profiler to use. Defaults to None. Options are:
- 'torch': Use the PyTorch profiler.
- 'cuda': Use the CUDA profiler.
torch_profiler_dir class-attribute instance-attribute ¶
torch_profiler_dir: str = ''
Directory to save torch profiler traces. Both AsyncLLM's CPU traces and worker's traces (CPU & GPU) will be saved under this directory. Note that it must be an absolute path.
torch_profiler_dump_cuda_time_total class-attribute instance-attribute ¶
torch_profiler_dump_cuda_time_total: bool = True
If True, dumps total CUDA time in torch profiler traces. Enabled by default.
torch_profiler_record_shapes class-attribute instance-attribute ¶
torch_profiler_record_shapes: bool = False
If True, records tensor shapes in the torch profiler. Disabled by default.
torch_profiler_use_gzip class-attribute instance-attribute ¶
torch_profiler_use_gzip: bool = True
If True, saves torch profiler traces in gzip format. Enabled by default.
torch_profiler_with_flops class-attribute instance-attribute ¶
torch_profiler_with_flops: bool = False
If True, enables FLOPS counting in the torch profiler. Disabled by default.
torch_profiler_with_memory class-attribute instance-attribute ¶
torch_profiler_with_memory: bool = False
If True, enables memory profiling in the torch profiler. Disabled by default.
torch_profiler_with_stack class-attribute instance-attribute ¶
torch_profiler_with_stack: bool = True
If True, enables stack tracing in the torch profiler. Enabled by default.
_get_from_env_if_set ¶
Get field from env var if set, with deprecation warning.
Source code in vllm/config/profiler.py
_set_from_env_if_set ¶
_set_from_env_if_set(
field_name: str,
env_var_name: str,
to_bool: bool = True,
to_int: bool = False,
) -> None
Set field from env var if set, with deprecation warning.
Source code in vllm/config/profiler.py
_validate_profiler_config ¶
_validate_profiler_config() -> Self
Source code in vllm/config/profiler.py
compute_hash ¶
compute_hash() -> str
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
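The general pattern behind such a method is to collect the graph-affecting fields into a list of "factors" and hash their serialized form. A generic sketch of that pattern (this is not vLLM's actual implementation; the factor list and hash algorithm are assumptions):

```python
import hashlib

def compute_factor_hash(factors: list) -> str:
    # Serialize the config fields that affect the computation graph and
    # hash them deterministically; equal factor lists yield equal hashes.
    return hashlib.sha256(repr(factors).encode()).hexdigest()
```

Because the hash covers only graph-affecting fields, two configs that differ solely in, say, trace output options would hash identically.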