I know you want to focus on the BJT.
For the BJT, the value of \$g_m\$ tells you immediately the highest possible voltage gain you can achieve by placing a resistor into the collector leg (assuming, of course, that the power supply rail voltages and other circuit details permit it). And for the BJT, unlike for a MOSFET, \$g_m\$ is not set by the device's construction. It depends only on operating conditions that the designer can easily control using discrete parts.
For the BJT, \$g_m\$ is a parameter that depends only on the DC collector current \$I_\text{C}\$: \$g_m=\frac{I_\text{C}}{V_T}\$. (This excludes the emission coefficient, \$n\$, which is almost always just 1 for BJTs.) \$V_T=\frac{k\: T}{q}\$ is a physical parameter based upon the equi-partition law of energy and the application of large population statistics. \$V_T\$ is not adjustable or controllable except by changing the temperature of the device.
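As a quick numeric illustration of \$g_m=\frac{I_\text{C}}{V_T}\$, here is a minimal Python sketch. The function names, the default temperature of 300 K, and the default \$n=1\$ are my own choices for illustration, not anything fixed by the discussion above.

```python
# Physical constants (exact values in the 2019 SI definition).
BOLTZMANN_K = 1.380649e-23    # J/K
ELECTRON_Q = 1.602176634e-19  # C

def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage V_T = k*T/q in volts (300 K default is an assumption)."""
    return BOLTZMANN_K * temp_kelvin / ELECTRON_Q

def bjt_gm(ic_amps, temp_kelvin=300.0, n=1.0):
    """Active-mode BJT transconductance g_m = I_C / (n * V_T) in siemens."""
    return ic_amps / (n * thermal_voltage(temp_kelvin))

if __name__ == "__main__":
    print(f"V_T at 300 K : {thermal_voltage() * 1e3:.2f} mV")
    print(f"g_m at 1 mA  : {bjt_gm(1e-3) * 1e3:.1f} mS")
```

Note that nothing about the transistor itself appears in the calculation: only the collector current and the temperature.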
(\$g_m\$ applies to the BJT in active mode, not in saturation. This is because the collector operates like a current source (a sink, for an NPN) in active mode, but more like a voltage source in saturation.)
For example, the highest voltage gain from a BJT operating in common emitter mode will be \$A_\text{V}=-g_m\cdot R_\text{C}\$. This gain can be reduced (degenerated) by inserting an emitter resistor, and often is. But that's the highest you can expect from a BJT.
The term is quite useful, though. For example, if you intend to operate the BJT at a quiescent collector current of about \$1\:\text{mA}\$, then you know that the magnitude of your maximum possible voltage gain in a common emitter configuration will be \$\approx 39\cdot R_\text{C}\$, with \$R_\text{C}\$ expressed in thousands of Ohms (provided the voltage supply can handle the drop across \$R_\text{C}\$).
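To put numbers on this, here is a small Python sketch of the common emitter gain estimate. The values (\$V_T\approx 25.85\:\text{mV}\$ near 300 K, \$R_\text{C}=4.7\:\text{k}\Omega\$, \$R_\text{E}=100\:\Omega\$) and the function names are assumptions picked for illustration. The degenerated-gain expression is the usual first-order approximation, which neglects base current and the Early Effect.

```python
V_T = 0.025852  # volts, thermal voltage near 300 K (assumed temperature)

def max_ce_gain(ic_amps, rc_ohms, vt=V_T):
    """Undegenerated common emitter gain: A_V = -g_m * R_C."""
    return -(ic_amps / vt) * rc_ohms

def degenerated_ce_gain(ic_amps, rc_ohms, re_ohms, vt=V_T):
    """First-order gain with an emitter resistor:
    A_V ~ -g_m*R_C / (1 + g_m*R_E), ignoring base current and Early Effect."""
    gm = ic_amps / vt
    return -gm * rc_ohms / (1.0 + gm * re_ohms)

def rc_voltage_drop(ic_amps, rc_ohms):
    """DC drop across R_C that the supply rail must be able to provide."""
    return ic_amps * rc_ohms

if __name__ == "__main__":
    ic, rc, re = 1e-3, 4.7e3, 100.0  # hypothetical operating point
    print(f"max gain         : {max_ce_gain(ic, rc):.1f}")
    print(f"with R_E = 100 ohm: {degenerated_ce_gain(ic, rc, re):.1f}")
    print(f"drop across R_C  : {rc_voltage_drop(ic, rc):.2f} V")
```

Since \$g_m\cdot R_\text{C}=\frac{I_\text{C}\cdot R_\text{C}}{V_T}\$, the maximum gain magnitude is just the DC drop across \$R_\text{C}\$ divided by \$V_T\$, which is why the supply voltage ends up limiting the achievable gain.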
For BJTs, \$g_m\$ depends only on the DC collector current, \$I_\text{C}\$. It doesn't care about the construction geometry of the BJT (unlike the MOSFET), except insofar as the base-emitter junction area affects parameters that impact the total collector current. Even then, it is still about the collector current. In the BJT in active mode, excepting modifications like the Early Effect, \$I_\text{C}\$ is determined by \$V_\text{BE}\$.

By comparison, the \$g_m\$ of a MOSFET depends on \$I_\text{D}\$, \$V_{OV}\$, and the ratio \$\frac{W}{L}\$. So there are three different formulas used to express \$g_m\$ for MOSFETs (one probably more comparable, apples to apples, with the BJT than the other two). The \$g_m\$ of a MOSFET is generally considered to be smaller than that of a BJT at the same current, because \$V_{OV}\$ is so much larger than \$V_T\$.
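A rough side-by-side at the same bias current can be sketched in Python. For the MOSFET I'm assuming the textbook square-law form in saturation, \$g_m=\frac{2\: I_\text{D}}{V_{OV}}\$, which is the one that lines up most directly with \$\frac{I_\text{C}}{V_T}\$. The overdrive voltage of \$0.2\:\text{V}\$ and the function names are illustrative assumptions, not values from any particular device.

```python
def bjt_gm(ic_amps, vt_volts=0.025852):
    """BJT transconductance g_m = I_C / V_T (V_T near 300 K assumed)."""
    return ic_amps / vt_volts

def mosfet_gm(id_amps, vov_volts):
    """Square-law MOSFET transconductance in saturation: g_m = 2*I_D / V_OV."""
    return 2.0 * id_amps / vov_volts

if __name__ == "__main__":
    bias = 1e-3  # same 1 mA through both devices (hypothetical)
    vov = 0.2    # assumed overdrive voltage, volts
    gm_b = bjt_gm(bias)
    gm_m = mosfet_gm(bias, vov)
    print(f"BJT    g_m: {gm_b * 1e3:.1f} mS")
    print(f"MOSFET g_m: {gm_m * 1e3:.1f} mS")
    print(f"ratio     : {gm_b / gm_m:.2f}")
```

The ratio works out to \$\frac{V_{OV}}{2\: V_T}\$, so the MOSFET only catches up when its overdrive voltage is pushed down toward a few tens of millivolts, where the square-law model itself stops being a good description.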
You can compute \$g_m\$ for the active mode BJT:
$$\begin{align*}
I_C &= I_{SAT}\cdot\left(e^{V_\text{BE} \over V_T}-1\right)\\\\
\text{D}\left(I_C\right) &= \text{D}\left(I_{SAT}\cdot\left(e^{V_\text{BE} \over V_T}-1\right)\right)\\\\
\text{d}\:I_C &= I_{SAT}\cdot\text{D}\left(e^{V_\text{BE} \over V_T}-1\right)\\\\
\text{d}\:I_C &= I_{SAT}\cdot e^{V_\text{BE} \over V_T}\:\text{D}\left({V_\text{BE} \over V_T}\right)\\\\
\text{d}\:I_C &= I_{SAT}\cdot e^{V_\text{BE} \over V_T}\:{\text{d}\:V_\text{BE} \over V_T}\\\\
\text{d}\:I_C &= {I_{SAT} \over V_T}\cdot e^{V_\text{BE} \over V_T}\:{\text{d}\:V_\text{BE} }\\\\
{\text{d}\:I_C \over \text{d}\:V_\text{BE}} &= {I_{SAT} \over V_T}\cdot e^{V_\text{BE} \over V_T}={I_{SAT}\cdot\: e^{V_\text{BE} \over V_T} \over V_T}
\end{align*}$$
But for almost all possible uses of the BJT, \$e^{V_\text{BE} \over V_T}\$ is vastly larger than 1, so:
$$I_C = I_{SAT}\cdot\left(e^{V_\text{BE} \over V_T}-1\right) \approx I_{SAT}\cdot\:e^{V_\text{BE} \over V_T}$$
Therefore:
$$g_m={\text{d}\:I_C \over \text{d}\:V_\text{BE}} \approx {I_\text{C} \over V_T}$$
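The result above can be sanity-checked numerically. The following Python sketch differentiates the exponential model with a central difference and compares the slope with \$\frac{I_\text{C}}{V_T}\$. The saturation current of \$10^{-14}\:\text{A}\$, the value of \$V_T\$, and the bias point \$V_\text{BE}=0.65\:\text{V}\$ are assumed values chosen only for illustration.

```python
import math

I_SAT = 1e-14   # amps, assumed saturation current (hypothetical device)
V_T = 0.025852  # volts, thermal voltage near 300 K (assumed temperature)

def collector_current(vbe):
    """I_C = I_SAT * (exp(V_BE / V_T) - 1), the model used in the derivation."""
    return I_SAT * (math.exp(vbe / V_T) - 1.0)

def gm_numeric(vbe, dv=1e-6):
    """Central-difference estimate of dI_C/dV_BE."""
    return (collector_current(vbe + dv) - collector_current(vbe - dv)) / (2.0 * dv)

if __name__ == "__main__":
    vbe = 0.65  # volts, assumed bias point
    ic = collector_current(vbe)
    gm_slope = gm_numeric(vbe)
    gm_formula = ic / V_T
    rel_err = abs(gm_slope - gm_formula) / gm_formula
    print(f"I_C            : {ic * 1e3:.3f} mA")
    print(f"numeric slope  : {gm_slope * 1e3:.3f} mS")
    print(f"I_C / V_T      : {gm_formula * 1e3:.3f} mS")
    print(f"relative error : {rel_err:.2e}")
```

The two agree to many digits at any practical forward bias, which is the sense in which dropping the \$-1\$ term is harmless.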