What is the best way to export data from BigQuery to Google Cloud Storage?
Note that I need to export the results of a query, not the whole table. Essentially, I need to run a custom query against BigQuery
(like select * from mytable where code=foo)
and write the results to a CSV file stored in Google Cloud Storage.
I believe the best way to do this is via Google Cloud Dataflow. Are there other options?
I am also looking for samples showing how to accomplish this. Is there somewhere I can find examples?
This is what I have so far:

    PipelineOptions pipelineOptions = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(pipelineOptions);
    Date date = new Date();
    p.getOptions().setTempLocation("gs://mybucket/tmp" + date.getTime());

    PCollection<TableRow> rowPCollection = p.apply(BigQueryIO.Read.named("promos")
        .fromQuery("SELECT * FROM [projectid:mydataset.mytable] where id = 256 LIMIT 1000"));

    PCollection<String> stringPCollection = rowPCollection.apply(
        ParDo.named("Extract").of(new DoFn<TableRow, String>() {
            @Override
            public void processElement(ProcessContext c) {
                TableRow tableRow = c.element();
                try {
                    String prettyString = tableRow.toPrettyString();
                    c.output(prettyString);
                } catch (IOException e) {
                    log.error("Exception occurred:" + e.getMessage());
                }
            }
        }));

    stringPCollection.apply(TextIO.Write.named("WriteOutput")
        .to("gs://mybucket/avexport").withSuffix(".csv"));
    p.run();
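A side note on the Extract step: as far as I can tell, toPrettyString() returns JSON-style text rather than a CSV line, so the output files would not really be CSV. Below is a minimal sketch of the formatting I think that step would need instead. It uses a plain Map as a stand-in for TableRow to stay self-contained, and the class name CsvLine, the helpers escape and toCsv, and the column names are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvLine {
    // Quote a field when it contains a comma, a double quote, or a line break,
    // doubling any embedded double quotes (the usual RFC 4180 convention).
    static String escape(Object value) {
        if (value == null) {
            return ""; // assumption: nulls are written as empty fields
        }
        String s = value.toString();
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Emit the requested columns in a fixed order so every line has the same layout.
    static String toCsv(Map<String, Object> row, List<String> columns) {
        List<String> fields = new ArrayList<>();
        for (String column : columns) {
            fields.add(escape(row.get(column)));
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>(); // stand-in for a TableRow
        row.put("id", 256);
        row.put("code", "foo, bar");
        row.put("note", "say \"hi\"");
        // prints: 256,"foo, bar","say ""hi"""
        System.out.println(toCsv(row, Arrays.asList("id", "code", "note")));
    }
}
```

Passing the column list explicitly is a deliberate choice in this sketch: with SELECT * the field order of a row is not something I would want to rely on.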
When I run this pipeline, an exception is thrown when the ParDo is created:
caused by: java.io.NotSerializableException: com.my.validation.CommonValidator
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at com.google.cloud.dataflow.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:50)
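
My reading of the trace, which may be wrong: the anonymous DoFn is an inner class, so it holds a reference to the object that created it, and serializing the DoFn drags in that object's CommonValidator field. The sketch below tries to reproduce that with plain Java serialization only. CaptureDemo, Job, Validator, Fn and StaticFn are hypothetical stand-ins, not Dataflow classes, and the label field plays the role that log plays in my code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CaptureDemo {
    // Stand-in for com.my.validation.CommonValidator: deliberately not Serializable.
    static class Validator {}

    // Stand-in for DoFn: a function object that must be serializable.
    interface Fn extends Serializable {
        String apply(String in);
    }

    // Static nested class: carries no reference to any enclosing instance.
    static class StaticFn implements Fn {
        public String apply(String in) {
            return in;
        }
    }

    // Stand-in for the class that builds the pipeline.
    static class Job implements Serializable {
        private final Validator validator = new Validator();
        private final String label = "job";

        Fn anonymousFn() {
            // Uses an instance field, so the anonymous class keeps a reference to Job.this.
            return new Fn() {
                public String apply(String in) {
                    return label + ":" + in;
                }
            };
        }

        Fn staticFn() {
            return new StaticFn();
        }
    }

    // Returns true if Java serialization accepts the object graph reachable from o.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Job job = new Job();
        System.out.println("anonymous: " + serializes(job.anonymousFn())); // anonymous: false
        System.out.println("static:    " + serializes(job.staticFn()));    // static:    true
    }
}
```

If this is the cause, I would guess that moving the DoFn into a static nested class (or a top-level class), or marking the validator field transient, avoids the exception, but I have not confirmed that against the Dataflow SDK.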