Note: I'm approaching this from a purely syntactic perspective. I'm sure that a deeper answer could be given from the semantic perspective. And apologies in advance for the poor formatting, I'm not at a computer right now.
In all type theories I'm aware of, a universe is a type that classifies types. You mention the idea of a type-term distinction. I'm interpreting this as the presence or absence of a judgment of the form A type$A \ type$, separate from x : A$x : A$. This naturally leads to two styles of adding a universe to a type theory: universes ala Russel (Russel universes) and universes ala Tarski (Tarski universes).
A Russel universe is some type U$U$, whose elements are other types. If we have some type ℕ$ℕ$, then we can derive a the judgment ℕ : U$ℕ : U$. The presence of a Russel universe allows us, if we wish, to completely eliminate any separate A type$A \ type$ judgment, replacing it with A : U$A : U$ (assuming we have some type for U$U$ itself). This solution is conceptually straightforward and is used in languages like Coq and Agda.
A Tarski universe is also some type U$U$, but instead of its elements being types, they are instead terms, which we call type codes. Basically, instead of being able to derive ℕ : U$ℕ : U$, we have some new term, which I will call 'ℕ$`ℕ$, for which we can derive the judgment 'ℕ : U$`ℕ : U$. We do this for each type former. Notably, 'ℕ$`ℕ$ is not a type. We cannot derive 'ℕ type$`ℕ \ type$ or zero : 'ℕ$zero : `ℕ$. Instead, we introduce a special syntax which allows us to transform a type code into a type. This syntax is usually called El$El$ (short for element), and allows us to derive that El(a) type$El(a) \ type$ given that a : U$a : U$. In order to make El$El$ useful, we introduce the "obvious" judgmental equalities: for instance that El('ℕ) = ℕ type$El(`ℕ) = ℕ \ type$. If instead we made these judgmental equalities into mere equivalences, where an explicit coercion is required to transform a code into a type, we would have what are called weak Tarski universes. For a language that actually implements weak Tarski universes (though they are essentially hidden by elaboration), check out cooltt: https://github.com/RedPRL/cooltt.
So to answer your questions with all this in mind:
- Yes
- A type is a universe when its elements classify other types in some meaningful way
- I would say that this is wrong, but I do not have enough knowledge of semantics to back up why it must be
- Tarski universes
- Not necessarily, if we have a cumulative hierarchy of universes, we could have a rule classifying a type in any universe level
- Universes exist globally in all type theories with them that I know of