1

I often end up with data in excel about file sizes where values may include things like:

0
1K
850K
7.12K
19.2M
43.1G
1012G
28T
1.02P

(ok, the last one not so often, but it's a theoretical possibility, eventually)

I often want to identify the large or small outliers, sort them in size order, or use conditional formatting to help me see which ones are larger than those around them.

My first 2 attempts were:

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"T","000000000000"),"M","000000"),"K","000"),"G","000000000")
=IFERROR(LOG(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"T","000000000000"),"M","000000"),"K","000"),"G","000000000"),10),0)

(the log effectively converts to "number of digits", and iferror handles the zero case)

While imprecise in terms of 1000 vs 1024, this doesn't matter to me as I'm only interested in spotting outliers and the second performs perfectly for my purposes, but it totally falls over when there is a decimal point present.

I'm ideally looking for something quick and hacky that I can copy/paste (or even better type in) when spending 5 mins interpreting logs or reports for anomalies, I'm never intending to save the sheets I'm using, just extract infromation, be informed by it, then move onto the next job.

Highlighting red if it contains a "G" and purple if it contains a "T" would almost be good enough for me, but there's a little too much diferent between 1G and 990G.

Any good sugestions?

If it actually were robust in terms of support for 1024 multiples, and be able to handle MB as will as M, this would be great, but not necisary for my purposes.

1
  • As an aside, if you save an excel file with the .xlsb extension you might be able to reduce your file size drastically (a 16MB file dropped to 1.9MB). Works on macro-enabled files too
    – gns100
    Commented Apr 8, 2021 at 22:20

1 Answer 1

1

Basic Answer:

There's a more direct way to do this that's also more easily expanded to include larger file sizes (as if that was actually needed when you're already counting in petabytes). If your data starts in A1, the formula would be:

=IFERROR(VALUE(A1),VALUE(LEFT(A1,LEN(A1)-1)))*1000^IFERROR(SEARCH(RIGHT(A1),"KMGTPEZY"),0)

There are two big pieces of this:

IFERROR(VALUE(A1),VALUE(LEFT(A1,LEN(A1)-1)))

The first extracts the number part of the string by trying a direct conversion from text to number and, if that fails, it drops the right-most character and tries again. This will work so long as your text is always a valid number followed by either 0 or 1 non-numeric character.

1000^IFERROR(SEARCH(RIGHT(A1),"KMGTPEZY"),0)

The second half lets you multiple that extracted number by some power of 10 based on the last character. It does this by extracting the right most character RIGHT(A1) and looking it up in a list SEARCH(~,"KMGTPEZY"). If it can't find the character in that list, it defaults to zero IFERROR(~,0). It then raises 1000 to that power 1000^~. If the right-most character is K, this will be 1000^1 = 1,000. For G, this will be 10^3 = 1,000,000,000. If the value is in bytes, then the right-most character will be a number and, therefore, won't be in the list so, thanks to the IFERROR, it will be 1000^0 = 1.


1,000 vs. 1,024:

To clarify, the above formula is based on a decimal version of counting (K = Kilobyte = 1,000 bytes). If you're working in binary (K = kibibyte = 1,024 bytes) then you need to modify the second half slightly:

=IFERROR(VALUE(A1),VALUE(LEFT(A1,LEN(A1)-1)))*1024^IFERROR(SEARCH(RIGHT(A1),"KMGTPEZY"),0)

Handling a "B" at the end:

As for handling KB, MB, and the like, you can modify the references to A1 to remove B. Rather than make the formula even longer, we can use the LET() function to save the SUBSTITUTE() result as a single character for reference later.

=LET(s,SUBSTITUTE(A1,"B",""),IFERROR(VALUE(s),VALUE(LEFT(s,LEN(s)-1)))*1000^IFERROR(SEARCH(RIGHT(s),"KMGTPEZY"),0))

If you don't have the LET() function (I think it's just for Office 365), then you can stick with the really long version OR (what I would recommend) is making a helper column to remove any instance of B.

Long formula version:

=IFERROR(VALUE(SUBSTITUTE(A1,"B","")),VALUE(LEFT(SUBSTITUTE(A1,"B",""),LEN(SUBSTITUTE(A1,"B",""))-1)))*1000^(IFERROR(SEARCH(RIGHT(SUBSTITUTE(A1,"B","")),"KMGTPEZY"),0)

Helper column version:

=IFERROR(VALUE(B1),VALUE(LEFT(B1,LEN(B1)-1)))*1000^(IFERROR(SEARCH(RIGHT(B1),"KMGTPEZY"),0)

Screenshot

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .