There are two main types of subtitling and several ways of attaching them to videos.
Types of subtitling
As you've noted in your question, sometimes subtitles include sound cues and sometimes they don't. That's because some subtitles are designed for hearing viewers and some are designed for deaf and hard-of-hearing viewers.
Subtitles designed for hearing people will not include these sound cues because the hearing people can... well... hear them. Generally these are used when translating dialogue from another language. In the US, at least, this is generally just called "subtitling".
Subtitles designed for deaf people will include these descriptions because they add details that explain why someone reacts to certain things. Because they can't hear the audio cues, they need textual versions. They add depth to the movie watching experience. These are usually subtitles written in the same language as the spoken language in the film. This is a specialized form of subtitling often referred to as "captioning".
"Captions" aim to describe to the deaf and hard of hearing all significant audio content - spoken dialogue and non-speech information such as the identity of speakers and, occasionally, their manner of speaking - along with any significant music or sound effects using words or symbols.
Ways of subtitling
Most good subtitles are made manually and are either stored in a separate file marked with timecode cues or hard-coded into the video. The latter is more often done with films that have scenes in a language different from the bulk of the film or, occasionally, with fansubs; usually, though, subtitles live in a separate file. If they are a separate file, that is referred to as "closed captioning"; if they are burned in, that is "open captioning".
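For illustration, here is what such a separate subtitle file looks like in the widely used SubRip (.srt) format; the cue numbers, timestamps, and text below are invented sample content:

```
1
00:00:01,000 --> 00:00:03,500
[door slams]

2
00:00:04,000 --> 00:00:06,200
Did you hear that?
```

Each cue is a sequence number, a start and end timestamp (hours:minutes:seconds,milliseconds), and the text to display during that interval. The `[door slams]` line is the kind of non-speech sound cue that captioning adds and plain subtitling omits.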
The term "closed" (versus "open") indicates that the captions are not visible until activated by the viewer, usually via the remote control or menu option. On the other hand, "open", "burned-in", "baked on", or "hard-coded" captions are visible to all viewers.
Creating these files can be done with a variety of available software options, but they are painstaking and difficult to make, so you should appreciate the people who take the time to make them (assuming they're done well).
Some websites have bots that do subtitling, using voice recognition to approximate the words being spoken. They're usually really bad. YouTube does this: if the creator of a video doesn't include a subtitle track, it will generate one automatically.
YouTube is constantly improving its speech recognition technology. However, automatic captions might misrepresent the spoken content due to mispronunciations, accents, dialects, or background noise. You should always review automatic captions and edit any parts that haven't been properly transcribed.
As to the timing, unless the dialogue is revelatory, precise timing isn't really that important. A half-second lead or delay won't annoy most people, so most of that granularity is simply a "because we can" thing. Multi-second delays or leads are a problem, though, and if you have the software and the subtitles are in a separate file, you can usually re-sync them (or edit them otherwise) yourself.
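Re-syncing a separate subtitle file often amounts to shifting every timestamp by a fixed offset. A minimal sketch of that in Python, assuming SubRip-style `HH:MM:SS,mmm` timestamps (the helper names here are my own, not from any particular tool):

```python
import re

# Matches SubRip timestamps of the form HH:MM:SS,mmm
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def _shift_match(match, offset_ms):
    """Shift one matched timestamp by offset_ms, clamping at zero."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = max(0, (h * 3600 + m * 60 + s) * 1000 + ms + offset_ms)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def shift_srt(text, offset_ms):
    """Shift every timestamp in an .srt file's text.

    Positive offset_ms delays the subtitles; negative makes them earlier.
    """
    return TIMESTAMP.sub(lambda m: _shift_match(m, offset_ms), text)

cue = "1\n00:00:01,000 --> 00:00:03,500\n[door slams]\n"
print(shift_srt(cue, 500))
# Delaying by 500 ms yields 00:00:01,500 --> 00:00:04,000
```

Real tools (and subtitle editors like Aegisub or Subtitle Edit) handle more than this, such as stretching timings to correct frame-rate mismatches, but a constant shift like the one above fixes the common "everything is two seconds off" case.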