
I'm working on a project where I want to generate a set of crypto wallet seed phrases from an existing seed phrase. The reason for this is so that using just the original seed phrase the wallet holder can access multiple connected accounts. My approach involves combining the original seed phrase with a deterministically derived salt. Specifically, the process is as follows:

  1. Take the original seed phrase.
  2. Generate a salt by hashing (SHA-256) the seed phrase with the first word of the seed phrase.
  3. Derive the first new seed phrase using crypto.subtle.deriveBits to get the entropy of the existing key with the generated salt, and passing it to bip39.entropyToMnemonic.
  4. Repeat the process, each time adding the next word of the original seed phrase to the hash input to create a new salt and derive a new seed phrase.

For example:

  • Hash(seed + first word) → Salt1
  • seed + Salt1 → seed1
  • Hash(seed + first word + second word) → Salt2
  • seed + Salt2 → seed2
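
The salt-generation steps above can be sketched roughly as follows. This is only an illustration of the scheme as described, not a recommendation. It uses Node's built-in `node:crypto` module instead of the browser API, and it assumes that "seed + word" means joining the pieces with single spaces, which the question does not specify. `deriveSalts` and the placeholder phrase are hypothetical names for illustration.

```javascript
const crypto = require("node:crypto");

// Salt_n = SHA-256(seed phrase + first n words of the same phrase).
// Assumption: the pieces are joined with single spaces.
function deriveSalts(seedPhrase) {
  const words = seedPhrase.trim().split(/\s+/);
  const normalized = words.join(" ");
  const salts = [];
  for (let n = 1; n <= words.length; n++) {
    const input = [normalized, ...words.slice(0, n)].join(" ");
    salts.push(crypto.createHash("sha256").update(input, "utf8").digest("hex"));
  }
  return salts;
}

// Placeholder phrase for illustration only; not a valid BIP39 mnemonic.
const examplePhrase = "alpha bravo charlie delta";
const exampleSalts = deriveSalts(examplePhrase);
console.log(exampleSalts.length); // one salt per word of the phrase
```

Every salt here is a pure function of the seed phrase, so no input other than the phrase itself enters the derivation.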

While I understand that if an attacker obtains the original seed phrase, they could potentially generate all derived seed phrases if they guess the algorithm, I am concerned about other security risks associated with this approach.

Any insights or recommendations on how to address these risks and improve the security of this approach would be greatly appreciated. Thank you!

  • Welcome! Please clarify which hash function you're planning to use. Also, how do you go from a salt_n to the new seed phrase? Are the words coming from a common word list (bip39)?
    – brynk
    Commented Jul 8 at 12:27
  • Thanks! Added some details, planning on SHA-256 for hashing and using bip39 entropyToMnemonic to derive the new seed phrase
    – jgy
    Commented Jul 9 at 1:37

2 Answers


This scheme has several major issues.

First off, don't roll your own, i.e., don't try to invent your own cryptographic schemes. Many ideas look good on paper but turn out to be fatally flawed. Even experts make mistakes, so the only schemes worth trusting are those that have gone through years of peer review, practical use and revision.

Secondly, you talk about how the attacker first has to “guess” the algorithm before they can “potentially” derive all seeds. Trying to keep the algorithm secret is security through obscurity and a well-known anti-pattern in cryptography. You should always assume that attackers know the algorithm. So in your case, they can in fact derive all seeds when they know the initial seed.

Then there's a misunderstanding of what a salt is. The purpose of salts is that they are an additional parameter besides the hash input, usually random byte sequences. Your “deterministic salts” aren't salts at all – the entire scheme only depends on the hash input and nothing else. You can see that your scheme is fundamentally broken when you imagine that it's applied to general passphrases, not just seed passphrases (which are designed to have very high entropy). An attacker could precalculate the results of your scheme for many possible passphrases and create a lookup table to speed up future attacks – this is exactly what salts are supposed to prevent.
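
To make the lookup-table argument concrete, here is a minimal sketch using Node's built-in `node:crypto`. The function names, the candidate list and the space-joined input format are all assumptions made for illustration. HMAC-SHA-256 keyed with a random salt stands in for "any salted derivation".

```javascript
const crypto = require("node:crypto");

const sha256Hex = (text) =>
  crypto.createHash("sha256").update(text, "utf8").digest("hex");

// Deterministic "salt" as in the question: it depends only on the passphrase.
// Assumption: passphrase and first word are joined with a single space.
function deterministicSalt(passphrase) {
  const firstWord = passphrase.split(" ")[0];
  return sha256Hex(`${passphrase} ${firstWord}`);
}

// The attacker can build this table once, before seeing any victim.
function buildLookupTable(candidates) {
  const table = new Map();
  for (const candidate of candidates) {
    table.set(deterministicSalt(candidate), candidate);
  }
  return table;
}

// With a random salt the output also depends on a value the attacker
// cannot know in advance, so a precomputed table does not help.
function randomSaltedHash(passphrase, salt) {
  return crypto.createHmac("sha256", salt).update(passphrase, "utf8").digest("hex");
}

const guesses = ["correct horse battery staple", "hunter2 hunter2", "letmein please"];
const table = buildLookupTable(guesses);

// A value produced by the deterministic scheme is found in the table.
const recovered = table.get(deterministicSalt("hunter2 hunter2"));

// The same passphrase with a random 16-byte salt is not.
const saltedMiss = table.get(randomSaltedHash("hunter2 hunter2", crypto.randomBytes(16)));
```

For a real BIP39 phrase with 128 bits of entropy no such table is feasible, which is why the answer frames this as a problem for general passphrases.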

Last but not least, a lot of design choices look completely arbitrary. For example, your “salts” are generated by taking the seed phrase, repeating the first n words of the same phrase and then hashing this string – it's entirely unclear what this is supposed to achieve.

Instead of trying to invent your own cryptographic schemes, use standard solutions. In your case, you want a key derivation function which takes high-entropy input (the initial seed passphrase) and then expands it. A suitable choice would be HKDF, which is available in the SubtleCrypto API.

As pseudo-code:

parameters:
  initial_seed_passphrase
  derived_seed_passphrases_count

hkdf_salt = generate_random_bytes(16)

# calculate required length of HKDF output in bits
# (SubtleCrypto's deriveBits takes its length argument in bits)
# I'm assuming 128 bits of initial entropy per seed passphrase
entropy_length_for_derived_seed_passphrases = derived_seed_passphrases_count * 128

# calculate initial entropy for derived seed passphrases with HKDF
entropy_for_derived_seed_passphrases = HKDF(
  input: initial_seed_passphrase,
  hash: "SHA-256",
  salt: hkdf_salt,
  length: entropy_length_for_derived_seed_passphrases
)

derived_seed_passphrases = []
for i from 0 to derived_seed_passphrases_count - 1:
  # use a 128-bit (16-byte) slice of the HKDF result as initial entropy;
  # the slice is written as [byte offset, byte length]
  derived_seed_entropy = entropy_for_derived_seed_passphrases[16 * i, 16]
  derived_seed_passphrase = standard_bip39_procedure(derived_seed_entropy)
  derived_seed_passphrases += [derived_seed_passphrase]

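Note that the random salt isn't secret and can therefore be stored as plaintext in arbitrary locations. It does have to be kept, though: re-deriving the same seed passphrases later requires both the initial seed passphrase and the same salt.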
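
A runnable version of the pseudo-code might look like the sketch below. It makes several assumptions: it uses Node's built-in `crypto.hkdfSync` rather than the browser's SubtleCrypto, it passes an empty HKDF `info` value, and it returns hex-encoded entropy instead of calling the `bip39` package. `deriveSeedEntropies` is a hypothetical helper name.

```javascript
const crypto = require("node:crypto");

const ENTROPY_BYTES = 16; // 128 bits per derived phrase, as in the pseudo-code

// Returns `count` hex strings, each holding 16 bytes of derived entropy.
function deriveSeedEntropies(initialSeedPassphrase, count, salt) {
  // HKDF-SHA-256 can output at most 255 * 32 = 8160 bytes,
  // which caps `count` at 510 slices of 16 bytes.
  const okm = Buffer.from(
    crypto.hkdfSync(
      "sha256",
      Buffer.from(initialSeedPassphrase, "utf8"),
      salt,
      Buffer.alloc(0), // empty "info" parameter (assumption)
      count * ENTROPY_BYTES // node:crypto takes the length in bytes
    )
  );
  const entropies = [];
  for (let i = 0; i < count; i++) {
    const slice = okm.subarray(i * ENTROPY_BYTES, (i + 1) * ENTROPY_BYTES);
    entropies.push(slice.toString("hex"));
  }
  return entropies;
}

// Usage with a placeholder phrase (not a valid BIP39 mnemonic).
const hkdfSalt = crypto.randomBytes(16); // not secret, but must be stored
const entropies = deriveSeedEntropies("alpha bravo charlie delta", 3, hkdfSalt);
console.log(entropies.length); // 3
```

Each entry could then be handed to `bip39.entropyToMnemonic`, the function named in the question, to obtain the derived phrase.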

Using PBKDF2 instead of HKDF wouldn't make sense, because this algorithm is meant for low-entropy input like user-chosen passwords and therefore has a cost factor to slow down brute-force attacks. This doesn't apply here: the seed passphrase is high-entropy input (at least 128 bits).

  • Wow this is exactly what I'm looking for, thank you! Glad I asked 😅
    – jgy
    Commented Jul 11 at 0:00

The scheme as presented seems riskier to me than it needs to be, because words from the mnemonic phrase are being fed back into a plain hash function rather than a password-based key derivation function (pwd-kdf).

Under normal circumstances, deriving the key for the wallet would see the mnemonic secret phrase sent to pbkdf2 to obtain the seed. But if any pair of salt_n hash digests "somehow" leak, my understanding is that they could then be used as an oracle for finding the words that produced the seed (because they are the same words, and only n iterations of SHA2 now separate them).

  • Salt0 <- H(seed)
  • Salt1 <- H(seed).update(word1)
  • SaltN <- H(seed).update(word1)...update(wordN)

How the hash-digests might leak is left to the imagination, but I just want to point out that this possible interaction now exists. The further apart these salts are, the harder it would be to compute... but, in the end it would still be easier than having to do it from the other side of the pwd-kdf!
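
As a rough illustration of the difference in cost, the sketch below compares the two ways an attacker could test a guessed phrase. It assumes Node's built-in `node:crypto`, follows the `H(seed).update(word1)...` notation above literally (no separators between updates), and uses the standard BIP39 seed derivation (PBKDF2-HMAC-SHA512, 2048 iterations, salt "mnemonic", empty passphrase) as the "other side". The function names are hypothetical.

```javascript
const crypto = require("node:crypto");

// Salt_n in the notation above: H(seed).update(word1)...update(wordN)
function saltN(seedPhrase, n) {
  const words = seedPhrase.split(" ");
  const hash = crypto.createHash("sha256").update(seedPhrase, "utf8");
  for (const word of words.slice(0, n)) {
    hash.update(word, "utf8");
  }
  return hash.digest("hex");
}

// Cheap test: a single SHA-256 computation per guessed phrase.
function matchesLeakedSalt(candidatePhrase, n, leakedSaltHex) {
  return saltN(candidatePhrase, n) === leakedSaltHex;
}

// Expensive test: the standard BIP39 mnemonic-to-seed step, which runs
// 2048 iterations of HMAC-SHA512 per guessed phrase.
function bip39Seed(phrase) {
  return crypto
    .pbkdf2Sync(phrase.normalize("NFKD"), "mnemonic", 2048, 64, "sha512")
    .toString("hex");
}

// Placeholder phrase for illustration only; not a valid BIP39 mnemonic.
const phrase = "alpha bravo charlie delta";
const leaked = saltN(phrase, 2);
const hit = matchesLeakedSalt(phrase, 2, leaked); // true
const miss = matchesLeakedSalt("alpha bravo charlie echo", 2, leaked); // false
```

With a full-entropy phrase both searches remain out of reach in practice; the sketch only shows that a leaked digest gives a much cheaper per-guess test than the pwd-kdf does.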

Instead I'd consider sending some derivation of seed to a pwd-kdf, such as window.crypto.subtle.deriveKey.

seed_N <- mnemonic2Seed(
  entropy2Mnemonic(
    pbkdf2_sha512( H("seed-N" + seed) , 545323 , SALT_64BYTE ) 
  )
)
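
The inner part of that expression could be sketched as below, using Node's built-in `node:crypto`. Several details are assumptions because the pseudo-code leaves them open: `"seed-N"` is read as the literal prefix `seed-` followed by the index, `H` is taken to be SHA-256, the PBKDF2 output is set to 16 bytes (128 bits of entropy), and the 64-byte salt is generated randomly. `derivedEntropy` is a hypothetical name, and the `entropy2Mnemonic` and `mnemonic2Seed` steps are left to a BIP39 library.

```javascript
const crypto = require("node:crypto");

// pbkdf2_sha512( H("seed-N" + seed), iterations, salt ) -> hex entropy
function derivedEntropy(seedPhrase, index, salt, iterations = 545323) {
  // Domain-separated hash of the original phrase (assumed to be SHA-256).
  const input = crypto
    .createHash("sha256")
    .update(`seed-${index}${seedPhrase}`, "utf8")
    .digest();
  // Assumption: 16 bytes of output, i.e. entropy for a 12-word mnemonic.
  return crypto.pbkdf2Sync(input, salt, iterations, 16, "sha512").toString("hex");
}

// Usage with a placeholder phrase (not a valid BIP39 mnemonic).
const salt64 = crypto.randomBytes(64); // SALT_64BYTE; must be stored to re-derive
const entropy1 = derivedEntropy("alpha bravo charlie delta", 1, salt64);
console.log(entropy1.length); // 32 hex characters = 16 bytes
```

The iteration count is exposed as a parameter so that it can be lowered for quick experiments.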
  • Using PBKDF2 makes no sense here. Even if “seed passphrase” has the term “passphrase” in it and therefore sounds like a low-entropy password, it has at least 128 bits of entropy according to BIP-0039. So there’s no reason for increasing the computational costs. A much more suitable choice is HKDF.
    – Ja1024
    Commented Jul 9 at 18:14
  • @Ja1024 I'm comfortable with the choice of seed_n input derivation being based on pbkdf2 in this context (although hkdf is suitable cryptographically speaking, I opted not to use it for other reasons)
    – brynk
    Commented Jul 9 at 22:45
  • What other reasons? The only significant difference between PBKDF2 and HKDF is that PBKDF2 repeats the HMAC calculations to deal with low-entropy input which would otherwise be susceptible to brute-force attacks. When there is no low-entropy input, this extra work serves no useful purpose. Sure, it’s not strictly wrong to make an algorithm do useless work (assuming there is no denial-of-service risk). But it’s still an odd choice.
    – Ja1024
    Commented Jul 10 at 3:59
