63

I'm new in cassandra, and I have to export the result of a specific query to a csv file.

I found the COPY command, but (from what I understand) it allows you only to copy an already existing table to a csv file, and what I want is to copy directly the stdout of my query to the csv file. is there any way to do it with COPY command or with another way ?

My command is style (select column1, column2 from table where condition = xy) and I'm using cqlsh.

16 Answers 16

75

If you don't mind your data using a pipe ('|') as a delimiter, you can try using the -e flag on cqlsh. The -e flag allows you to send a query to Cassandra from the command prompt, where you could redirect or even perform a grep/awk/whatever on your output.

$ bin/cqlsh -e'SELECT video_id,title FROM stackoverflow.videos' > output.txt
$ cat output.txt

 video_id                             | title
--------------------------------------+---------------------------
 2977b806-df76-4dd7-a57e-11d361e72ce1 |                 Star Wars
 ab696e1f-78c0-45e6-893f-430e88db7f46 | The Witches of Whitewater
 15e6bc0d-6195-4d8b-ad25-771966c780c8 |              Pulp Fiction

(3 rows)

Older versions of cqlsh don't have the -e flag. For older versions of cqlsh, you can put your command into a file, and use the -f flag.

$ echo "SELECT video_id,title FROM stackoverflow.videos;" > select.cql
$ bin/cqlsh -f select.cql > output.txt

From here, doing a cat on output.txt should yield the same rows as above.

8
  • 1
    Yeh it should resolve my problem, and that what i want to do. exept that it doesn't recognize the -e option. Usage: cqlsh [options] [host [port]] ----- i missed anything ? cqlsh: error: no such option: -e
    – Dady09
    Commented Nov 13, 2014 at 15:21
  • 3
    @user3196353 You must be running on an older version. Edit made...try -f with a command file instead.
    – Aaron
    Commented Nov 13, 2014 at 15:45
  • 1
    @arseny In that case, you can also use cqlsh -e with double-quotes: cqlsh -e"SELECT * from dev.photos where uid='score_test4';" > output.txt
    – Aaron
    Commented Mar 14, 2016 at 15:58
  • 10
    This gives 100 results into the CSV, if we have more results than that which are retrieved by clicking 'MORE' in the interactive mode, how do we go about getting them the CSV? Commented Mar 29, 2018 at 7:27
  • 12
    @JagratiGogia prepend your query with PAGING OFF;. E.g. cqlsh -e "PAGING OFF; SELECT * FROM stackoverflow.videos;"
    – Zoltán
    Commented Dec 19, 2018 at 9:25
52
  1. Use CAPTURE command to export the query result to a file.
cqlsh> CAPTURE
cqlsh> CAPTURE '/home/Desktop/user.csv';
cqlsh> select *from user;
Now capturing query output to '/home/Desktop/user.csv'.

Now, view the output of the query in /home/Desktop/user.csv

  1. Use DevCenter and execute a query. Right click on the output and select "Copy All as CSV" to paste the output in CSV.

enter image description here

4
  • Option 2 is simple, except that DevCenter limits the output rows to 1000, unless you are using 1.6: stackoverflow.com/questions/37862375/… Commented Mar 27, 2017 at 12:47
  • 2
    This (option 1) seems to be a much better option than the accepted answer. Commented Jun 8, 2018 at 15:39
  • 2
    Actually, I think I misunderstand - option 1 doesn't output in CSV format, it just redirects the existing output to a file whose name ends in .csv, right? Commented Jun 8, 2018 at 16:24
  • 1
    yes. you can use "COPY TO" to export data from a table into a CSV file.
    – Arun
    Commented Jun 8, 2018 at 23:48
15

In 2020th you can use DSBulk to export or import data to/from CSV (by default), or JSON. It could be as simple as:

dsbulk unload -k keyspace -t table -u user -p password -url filename

DSBulk is heavily optimized for fast data export, without putting too much load onto the coordinator node that happens when you just run select * from table.

You can control what columns to export, and even provide your own query, etc. See following blog posts for examples:

11

I just wrote a tool to export CQL query to CSV and JSON format. Give it a try :)

https://github.com/tenmax/cqlkit

2
  • 1
    I would have given it a try...if a had java8 :(
    – Dady09
    Commented Aug 12, 2015 at 15:27
  • 1
    It looks like this is the only answer that lets you stream output in CSV format. +1. Commented Jun 8, 2018 at 16:24
7

Cannot comment... To deal with "MORE" issue when there are more than 100 rows, simply add "paging off" before the SQL.

Something like

$ bin/cqlsh -e'PAGING OFF;SELECT video_id,title FROM stackoverflow.videos' > output.txt

This will cause a little messy at the beginning of the output file but can easily be removed afterwards.

7

You can use the COPY command to create the CSV file. e.g. copy table with selected columns. Columns are optional, if you select them, every column will be picked.

COPY TABLE (COL1, COL2) TO 'filename.csv' HEADER=TRUE/FALSE

For more reference https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshCopy.html

1
  • 8
    But how do you add where-conditions?
    – Turion
    Commented Feb 27, 2019 at 13:29
7

With bash:

If you need to query the data (not possible with COPY TO) and if you need the final product to be importable (ie with COPY FROM):

cqlsh -e "SELECT * FROM bar WHERE column = 'baz' > raw_output.txt

Then you can reformat the output with sed

sed 's/\ //g; /^----.*/d; /^(/d; /^\s*$/d;' raw_output.txt | tee clean_output.csv

Which pretty much says

sed 'remove spaces; remove the column boarder; remove lines beginning with (COUNT X); and remove blank lines' | write output into clean_output.csv

The sed regexp's could be cleaned up to better suite your specific case, but thats the general idea.

4
  • 1
    That’s most probably will timeout when you have a lot of data
    – Alex Ott
    Commented Nov 4, 2020 at 21:30
  • @AlexOtt I was surprised, but that's exactly what happened when I tried this. Good catch.
    – dixon1e
    Commented Jan 5, 2021 at 1:50
  • 1
    @dixon1e that's why it's important to use something like DSBulk for big amounts of data...
    – Alex Ott
    Commented Jan 5, 2021 at 7:15
  • Nice, was unaware of DSBulk. Many thanks.
    – dixon1e
    Commented Apr 2, 2021 at 5:14
5

I believe DevCenter also allows you to copy to CSV. http://www.datastax.com/what-we-offer/products-services/devcenter

4

In windows, double quotes should be used to enclose the CQL.

cqlsh -e"SELECT video_id,title FROM stackoverflow.videos" > output.txt

3

If I am understanding correctly you want to redirect your output to stdout?

Put your cql command in a file. My files is called select.cql and contents are:

select id from wiki.solr limit 100;

Then issue the following and you get it to stdout:

cqlsh < select.cql

I hope this helps. From there on you can pipe it and add commas, remove headers etc.

3
  • 1
    Thank you for the response, but it's not what i really need, i'm searching for redirecting the output of my sql query to a csv file (stdout > file.csv) so i can use it in other jobs
    – Dady09
    Commented Nov 13, 2014 at 14:31
  • 1
    Yep, that's what that does. The result of the query stored in the cql file is outputted to stdout. Sometimes, for complex or long queries, you store them in a file rather than write them in the cmd line. Then you can 'send' the file with the query to cql and you get the output to stdout. For one-liners and simple queries there is the other method as well.
    – Bereng
    Commented Nov 14, 2014 at 9:31
  • 1
    in hindsight, you're right, i must have misunderstood you the first time, the other answer seemed a little more clear to me, sorry
    – Dady09
    Commented Nov 18, 2014 at 16:08
2

As the other guys have suggested, export the standard query output using ./cqlsh -e 'SELECT ...' > data.csv.

Once you have this you can easiliy replace the pipes ( | ) with commas using Excel (if you have it installed).

  1. First open your file in a text editor (vi/notepad++) and delete the separator that Cass puts in (-----+-------+---), as well as the info on the number of rows from the bottom.
  2. Open a new Excel workbook.
  3. Click on the Data tab.
  4. Click on "From Text/CSV" (top left).
  5. Select your file, specifiy the pipe symbol as a delimiter, click Load.
  6. That will create an .xlsx file so you'll have to Save As .csv manually. That will strip Excel's formatting and leave you with commas.
1
  • 1
    please don't do that... it's simply not scalable
    – Alex Ott
    Commented Jan 4, 2021 at 17:38
1

Here, when using Cassandra on a docker container with SSL and authentication:

docker exec -it <cassandra_container> cqlsh -u <user> -p <password> --ssl -e  "select column1, column2 from table where condition = xy" > output.txt
0

CQL COPY is good option for importing or exporting data. But if you want to analyze some small query output you can run below command and save the output in a file.

cqlsh -e "SELECT * FROM table WHERE column = 'xyz' > queryoutput.txt

However, you can use CAPTURE also for saving output of the query to analyze something

0

Follow the below steps to selectively export & import the Cassandra data.

Exporting:

  • Write all the select queries in a file named dump.cql like below

    paging off;

    select * from student where id=10;

    select * from student where id=15;

Note: Paging off is mandatory above the queries to avoid limiting the query results to default 100 records

  • Creating a dump

cqlsh -u user_name -p 'password' ip_address -k keyspace_name -f dump.cql > dump.csv;

(for remote machine)

or

cqlsh -k keyspace_name -f dump.cql > dump.csv;

(for local machine)

  • Removing whitespace characters from dump(It avoids removing whitespace withing json data)

sed -r 's/(\".*\")|\s*/\1/g' dump.csv > data_without_spaces.csv

Importing:

cqlsh -e "copy keyspace_name.table_name from 'data_without_spaces.csv' with delimiter = '|';"

3
  • it's bad idea to use cqlsh for that - there are specialized tools, like DSBulk ot do that
    – Alex Ott
    Commented Sep 25, 2020 at 9:13
  • The question has asked about taking dump of a specific query, I don't think DSBulk will support that. Commented Sep 28, 2020 at 13:27
  • 1
    it perfectly supports that... via -query option, although for that query it's not required - it's just need to export 2 specific columns. And given query is really scanning of the whole table that isn't very well supported in the cqlsh if you have a lot of data
    – Alex Ott
    Commented Sep 28, 2020 at 14:17
0
echo "SELECT x, y, z FROM key_space.tableName WHERE date='DATE';" | cqlsh -u userName -p password 172.x.y.z > out.csv

above command can be run to get the result data in csv file

-1

The person asking asked for CSV not text.

I did this hack get my results. It worked for me and I moved on with my day.

me:~/MOOSE2# echo "USE ████it; select * from samples_daily_buffer where dog_id=██48;" | cqlsh --cqlversion="3.4.4" cassandra0.stage.███████ | sed -e "s/ | */,/g" | sed -e "s/^ *//g" | tail -n +4 > ./myfile.csv

Not the answer you're looking for? Browse other questions tagged or ask your own question.