
This is an example of the CSV file:

timestamp,datetime,price,volume,is_buy,cycle,realized_pnl,unrealized_pnl,fees_paid,margin_use,volume_traded,average_price,position_price,order_long_price,order_short_price
1596112043043,2020-07-30 12:27:23.043,319.5,5,false,0,0,0.0,0,0,0,319.5,0,271.58,-367.42
1596112045045,2020-07-30 12:27:25.045,319.5,0.269,false,0,0,0.0,0,0,0,319.5,0,271.58,-367.42
1596112045045,2020-07-30 12:27:25.045,319.49,3.645,false,0,0,0.00,0,0,0,319.50,0,271.58,-367.42

The timestamp column is a Unix timestamp (in milliseconds) and is what the code actually uses. The datetime column is a 'human-readable' date string.

But when Excel loads that CSV, it looks like this:

[Screenshot: Excel displaying the datetime column incorrectly after loading the CSV]

I tried using '/' and '-' as date separators, but got the same result.

Interestingly, once I edit a cell, even without making any change, Excel then displays it properly.

So, my question is: what string format can I use for a date and time so that Excel displays it properly when it loads the CSV?

  • If you look at the edit line, Excel has correctly parsed the yyyy-mm-dd part too; perhaps you just need to tell it (on import, where you define the column types) that it's a date-time field Commented Oct 19, 2020 at 22:29
  • since we're dealing with tons of files, I'm looking for a format where I don't have to tell excel anything when loading the file and it would just display it as a string; I don't need it to parse the date, I just need to be able to read it as it is in the CSV
    – Thomas
    Commented Oct 19, 2020 at 22:31
  • If you want to read it literally as it is in the CSV, either Excel is the wrong tool or you need to tell it to treat the column as type "Text" (no interpretation) Commented Oct 19, 2020 at 22:34
  • if I write a field in plain English, Excel will display it literally; the issue here is that it tries, and fails, to parse that field as a date. There has to be a syntax where either it parses it and doesn't fail, or one where it doesn't see it as a date and displays it literally.
    – Thomas
    Commented Oct 19, 2020 at 22:37
  • Only when you tell it so. That's the problem with CSV - there is no way to tell Excel the data type except at the point of import; otherwise it has to infer it from the shape of the data Commented Oct 19, 2020 at 22:49

1 Answer


First, the formatting thing. Excel looks at the first 8 records (yes, just 8, even for a 500,000-row import) to decide the formatting to apply. The trivial half of that is that you are almost certainly getting 12:27:23 displayed as 27:23.0 because the first records in the CSV have times pretty close to midnight, say 00:01:25.043. Excel sees eight records like that and figures you need to display minutes, seconds, and a single decimal. It doesn't matter that you have 400,000 records with nonzero hours versus 28 or 435 or whatever with "00" for the hours.

This points to the bigger issue: why is the date not included in that formatting? If you look at those first records, you might find the dates are not present, or that in some way the value is 0. So Excel sees no date, formats only for time, and cripples even that.

You could do a lot of things. Power Query is your salvation here and I will expand on that shortly. But in the tradition of Atlantis-finding shows, let's spend 58 minutes pooh-poohing the other solutions, then a second or two on PQ.

First, VBA. No pooh-poohing this one, actually. Just open the file and run a macro that adjusts the formatting for the columns you need formatted. Write Once, Use Many Times. WOUMT. Nope, won't catch on. One bit of effort, used forever. Put it in some drone file you open before opening the tons of files and have it operate on each opened CSV.
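In the same write-once spirit, if pre-processing the files with a script is an option, one trick sidesteps formatting entirely: wrap the datetime field as `="..."`, which Excel treats as a text formula and displays verbatim. A minimal stdlib-Python sketch (the `datetime` column name comes from the sample CSV above; the function names are mine, not from any library):

```python
import csv
import io

def excel_literal(value: str) -> str:
    # Wrap a field as ="..." so Excel treats it as a text formula
    # and displays it verbatim instead of guessing a date format.
    return '="{}"'.format(value)

def protect_datetime_column(src_text: str, col: str = "datetime") -> str:
    # col is an assumption: the header name from the question's sample CSV.
    reader = csv.DictReader(io.StringIO(src_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[col] = excel_literal(row[col])
        writer.writerow(row)
    return out.getvalue()
```

The `csv` writer quotes the wrapped field (doubling the inner quotes), which Excel reads back correctly as a formula. The cost is that the cell holds text, not a real date, so sorting and date math won't work on it; that seems to fit the "I just need to read it as it is" use case, though.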

Second, create a template file with the formatting you need. Open the CSV and immediately copy and paste the data into the template file. All done. Save it with the right name and move on to the next one.

Third, format by hand. I bet there aren't a lot of columns here, so it's not unbearable. Formatting by hand could be done via a template file that you copy the first row from to the clipboard, then Paste | Special | Formatting (or Alt-E, S, T) to all the rows of the newly opened CSV. So, not actually hard or time consuming.

Well, that's enough. Notice the common thread as you object to each. You might think "don't like macros" or "don't know VBA" or "I just ain't doin' nothin' by hand, dude" or "open a template, copy and paste or paste once to get the formatting, ugh, that's so much work", but I wager you keep coming back to the "tons of files" thing.

Tons of files, tons of work, only so many years in your life. This is what VBA was created for. VBA truly is a wonder for what you describe. Geez, even (horrors) paying someone a few hundred dollars to write the macro you need for opening file after file after file, formatting each, saving it with an appropriate name, placing it in your directory structure where needed, even emailing it or what-have-you... all this for a few hundred dollars. An organization with tons of files to process has the cash for something like that, and bosses respect someone who sees that the best solution isn't himself and HANDLES the situation cost-effectively and time-effectively.

But truthfully, PQ can do VBA one better in many ways. It can be set up for the import, and the massaging of the data formatting and such, without too much trouble. Then it keeps all of that in itself (the good old "Data Model") and can do it to a million files without complaint. It can handle as many rows as your memory allows, not just a million like Excel. And its results can be accessed via Power Pivot (which comes with Excel), Excel itself, and, well, a lot of other programs. One weakness here is the hard-coded filename it looks at, but its query (written in the M language under the hood) can be edited by anyone who knows M. Need an outsider? Back to that few-hundred-dollars thing, though a Power Query specialist might charge somewhat more than the VBA guy. Anyway, it ought to be modifiable to handle a directory full of files if you like.

And it has one monstrous advantage over VBA, if needed: it can combine those files naturally and with ease. You thought you were going to use Excel itself to analyze data but had 15 million rows? Had to have it broken into a ton of files to import into spreadsheets, then to cobble together formulas to read it all and bring it into focus? Didn't ask for that, but the provider, who wrote their export routines 25 years ago simply provided it that way because any other way takes work while this is just a mouseclick? PQ can put it all back together, seamlessly, like it was never broken up. It was similar material from 25 different sources and your project actually is to put it together? Same thing to PQ, same ease of doing.

So... if YOUR task is only to import and save it as Excel files with appropriate formatting for use by someone else, VBA. If you need to use it yourself, or there is a next step of putting it together for a bigger analysis, wow the boss who thought he had to budget for two projects by using PQ to do it all in one fell swoop.

A teensy detail that I skipped earlier since you made it clear you do not want to put a lot of effort into each import: if you open each file in an editor and move a dozen or two rows with more "fully detailed" data to the top rows of the file, the more fully detailed rows will be what Excel looks to when gauging how to format the columns. If it had seen the first row in your sample material first, you'd've had a full date and time display like: 2020-07-30 12:27:23.0 and none of this would have ever vexed you.
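That row-shuffling step can be scripted rather than done by hand in an editor. A rough stdlib-Python sketch (it assumes the datetime sits in the second column, as in the sample above, and that every data row has one):

```python
import csv
import io

def front_load_detailed_rows(src_text: str, n: int = 12) -> str:
    # Move up to n rows whose time is not in the midnight hour to the top,
    # so Excel's first-rows sniff sees fully detailed datetime values.
    # Assumes the datetime is column index 1, as in the question's sample.
    rows = list(csv.reader(io.StringIO(src_text)))
    header, body = rows[0], rows[1:]
    picked, rest = [], []
    for r in body:
        time_part = r[1].split(" ")[1]          # e.g. "12:27:23.043"
        if len(picked) < n and not time_part.startswith("00:"):
            picked.append(r)
        else:
            rest.append(r)
    out = io.StringIO()
    csv.writer(out).writerows([header] + picked + rest)
    return out.getvalue()
```

Note this reorders the data, which may matter if anyone relies on the file being chronologically sorted; the seeding approach in point 3 below avoids that.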

But... that's maybe more work than formatting things so I didn't figure you would be interested.

It is also possible to force Excel to look at as many rows as you'd care to have looked at instead of 8. However, bear in mind three things:

  1. You would have to research that because I don't remember how. Been a long time. I mentioned it in a different question and the "usual (knowledgeable) suspects" were there but no one filled the poster in on it so it seems it's not casually found.
  2. Your data might routinely have a LOT of unhelpful rows coming first. Thank the fellow who exported it sorted like that. But if you are importing half a million rows, the first 10-20,000 might be between midnight and 1:00 am, and reaching rows with full detail might go a long way past that. That could mean a LONG lag in the import; it could even mean Excel choking and failing.
  3. Bearing in mind that lag, it might be easier to open the files in an editor and either move fully representative rows to the start, or simply copy a set of, say, 10-20 rows of fully representative data to a file, then copy those rows to the beginning of EVERY file and delete them immediately upon import. The quickness of the import vs. bogging down over 10-20,000 rows or more before even starting should easily outweigh the work just described.
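The seed-rows idea in point 3 might be sketched like this in Python (the seed rows and function name are placeholders shaped like the question's sample, not anything standard):

```python
import csv
import io

# Hypothetical fully detailed seed rows, shaped like the question's sample;
# replace with real representative rows from your own data.
SEED_ROWS = [
    ["1596112043043", "2020-07-30 12:27:23.043", "319.5"],
    ["1596112045045", "2020-07-30 12:27:25.045", "319.5"],
]

def seed_csv(src_text: str) -> str:
    # Insert the seed rows right after the header so Excel's format sniffing
    # sees fully detailed values first; delete the first len(SEED_ROWS)
    # data rows again after import.
    rows = list(csv.reader(io.StringIO(src_text)))
    out = io.StringIO()
    csv.writer(out).writerows([rows[0]] + SEED_ROWS + rows[1:])
    return out.getvalue()
```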
  • thanks for the very detailed answer! The context is that we're outputting and throwing away many of these files every day. When there is an issue somewhere, it allows anyone on the team to have a quick look at what happened. Almost all files get deleted without ever being opened. But the key is that they just need to be able to find a precise datetime quickly in the file. I will look at Power Query; I know nothing about it.
    – Thomas
    Commented Oct 21, 2020 at 14:16
