28

Hive documentation lacking again:

I'd like to write the results of a query to a local file as well as the names of the columns.

Does Hive support this?

Insert overwrite local directory 'tmp/blah.blah' select * from table_name;

Also, a separate question: is Stack Overflow the best place to get Hive help? @Nija has been very helpful, but I don't want to keep bothering them...

7 Answers

64

Try

set hive.cli.print.header=true;
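
As the comments below note, that setting only adds the header to CLI/stdout output, not to files written by insert overwrite local directory. One common pattern that does get the header into a local file (a sketch; table_name and the output paths here are placeholders) is to run the query through hive -e and redirect stdout:

```shell
# Placeholder table and paths; assumes the hive CLI is on the PATH.
# With print.header=true, the first line of stdout is the tab-separated header.
hive -e 'set hive.cli.print.header=true; select * from table_name' > /tmp/out.tsv

# Optionally convert the tab-separated output to CSV (GNU sed understands \t):
sed 's/\t/,/g' /tmp/out.tsv > /tmp/out.csv
```

The header columns come out tab-separated just like the data rows, so the same delimiter conversion applies to both.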
7
  • 1
    Is there a way to permanently have this as the default instead of having to specify this setting upon each hive shell and/or command invocation?
    – J.D.
    Commented Oct 1, 2012 at 22:10
  • 21
    I tried this; it causes the header to output to the console, not to the local file . . .
    – maverick
    Commented Nov 9, 2012 at 21:42
  • 7
    @JD Yes, just put it into .hiverc file in your home directory
    – wlk
    Commented Sep 16, 2013 at 14:38
  • It appears to work in the CLI only; however, it has no effect when running a SQL file or from Oozie
    – Pasha
    Commented Aug 24, 2015 at 22:03
  • 3
    This does not answer the OP question
    Commented Apr 3, 2017 at 12:48
15

Yes, you can. Put set hive.cli.print.header=true; in a .hiverc file in your home directory, or in any of the other Hive user properties files.

Vague Warning: be careful, since this has crashed queries of mine in the past (but I can't remember the reason).
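
For reference, a .hiverc is just a file of HQL statements that the Hive CLI runs at startup; a minimal one enabling headers would contain only:

```sql
-- ~/.hiverc: executed by the Hive CLI at startup
set hive.cli.print.header=true;
```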

1
  • 6
    The property hive.cli.print.header=true won't work for the 'insert overwrite local directory' command. It does work if we run hive -e 'select ..' > Out.tsv
    – Munesh
    Commented Jul 30, 2016 at 0:52
9

Indeed, @nija's answer is correct - at least as far as I know. There isn't any way to write the column names when doing an insert overwrite [local] directory ... (whether you use local or not).

With regards to the crashes described by @user1735861, there is a known bug in hive 0.7.1 (fixed in 0.8.0) that, after doing set hive.cli.print.header=true;, causes a NullPointerException for any HQL command/query that produces no output. For example:

$ hive -S
hive> use default; 
hive> set hive.cli.print.header=true;
hive> use default;
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:222)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:287)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:517)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

Whereas this is fine:

$ hive -S
hive> set hive.cli.print.header=true;
hive> select * from dual;
c
c
hive> 

Non-HQL commands are fine though (set, dfs, !, etc.)

More info here: https://issues.apache.org/jira/browse/HIVE-2334

7

Hive does support writing to the local directory. Your syntax looks right for it as well.
Check out the docs on SELECTS and FILTERS for additional information.

I don't think Hive has a way to write the names of the columns to a file for the query you're running . . . I can't say for sure it doesn't, but I do not know of a way.

I think the only place better than SO for Hive questions would be the mailing list.

4

I ran into this problem today and was able to get what I needed by doing a UNION ALL between the original query and a new dummy query that generates the header row. I added a sort column to each section, set it to 0 for the header row and 1 for the data rows, so I could sort by that field and ensure the header row came out on top.

create table new_table as
select 
  field1,
  field2,
  field3
from
(
  select
    0 as sort_col,  --header row gets lowest number
    'field1_name' as field1,
    'field2_name' as field2,
    'field3_name' as field3
  from
    some_small_table  --table needs at least 1 row
  limit 1  --only need 1 header row
  union all
  select
    1 as sort_col,  --original query goes here
    field1,
    field2,
    field3
  from
    main_table
) a
order by 
  sort_col  --make sure header row is first

It's a little bulky, but at least you can get what you need with a single query.

Hope this helps!

2
  • This will fail if the col values are boolean, array etc.
    – amrk7
    Commented Sep 12, 2016 at 14:30
  • Basically a good solution, but 1) you no longer need 'from some_small_table' and limit 2) you have to include sort_col into main select 3) you need a semicolon at the end
    Commented Feb 26, 2019 at 12:55
3

Not a great solution, but here is what I do:

create table test_dat
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS 
INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" 
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat" 
LOCATION '/tmp/test_dat' as select * from YOUR_TABLE;

hive -e 'set hive.cli.print.header=true;select * from YOUR_TABLE limit 0' > /tmp/test_dat/header.txt

cd /tmp/test_dat && cat header.txt 000* > all.dat
1
  • 1
    This can be very slow.
    Commented Oct 1, 2014 at 21:33
2

Here's my take on it. Note: I'm not very well versed in bash, so suggestions for improvement are welcome :)

#!/usr/bin/env bash

# works like this:
# ./get_data.sh database.table > data.csv

INPUT=$1
TABLE=${INPUT##*.}
DB=${INPUT%.*}

HEADER=`hive -e "
  set hive.cli.print.header=true;
  use $DB;
  INSERT OVERWRITE LOCAL DIRECTORY '$TABLE'
  row format delimited
  fields terminated by ','
  SELECT * FROM $TABLE;"`

HEADER_WITHOUT_TABLE_NAME=${HEADER//$TABLE./}
echo ${HEADER_WITHOUT_TABLE_NAME//[[:space:]]/,}
cat $TABLE/*
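
The two parameter expansions at the top are what split the database.table argument into its parts; a quick sketch of that idiom (the values here are hypothetical):

```shell
INPUT="mydb.mytable"   # hypothetical database.table argument
TABLE=${INPUT##*.}     # strip everything up to and including the last '.' -> mytable
DB=${INPUT%.*}         # strip the last '.' and everything after it -> mydb
echo "$DB $TABLE"      # prints: mydb mytable
```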
