
A mobile back-end I am building needs to receive a fairly large amount of data (sensor recordings) from an Android phone. The phone itself has little use for the data, so the most efficient option was to store it in plain files (SQLite can get quite slow at these volumes). Now that we upload the data to a REST service backed by a distributed database, we have another issue: storing it takes a while, since there is really a lot of it. My idea is to upload the file to the web service as fast as possible, leave the phone alone, cache the file somewhere on the server, and then have long-running workers pick the data up and chunk it into the database. We have other mechanisms to verify that the data is properly stored and so on, but that is not important here.

I would like to know: is the REST -> file -> database approach valid? My main concern is where to store the file: on disk, in an in-memory database, in a cache? Web servers fail (we can request a re-upload, but I would rather mitigate that risk earlier), and a server can get cluttered. I am also afraid that local storage on the web servers does not scale well once we have multiple workers and servers.
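To make the intended flow concrete, here is a rough sketch of the upload endpoint (Flask; the route, spool directory, and chunk size are placeholders I made up): it streams the request body straight to disk and returns immediately, leaving the database work to the background workers.

    import os
    import uuid
    from flask import Flask, request

    app = Flask(__name__)
    SPOOL_DIR = "/var/spool/sensor-uploads"  # placeholder spool location

    @app.route("/recordings", methods=["POST"])
    def upload_recording():
        # Stream to a temp name first so workers never see half-written files.
        tmp_path = os.path.join(SPOOL_DIR, str(uuid.uuid4()) + ".part")
        with open(tmp_path, "wb") as out:
            while True:
                chunk = request.stream.read(64 * 1024)  # avoid buffering the whole file
                if not chunk:
                    break
                out.write(chunk)
        os.rename(tmp_path, tmp_path[:-len(".part")])  # atomic hand-off to the workers
        return "", 202  # accepted; the actual DB insert happens later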

Thank you in advance.

  • Sounds like you want a queue to receive the results of the REST service. There are quite a few message queues out there that would take care of that for you.
    – user53019
    Commented Jan 15, 2015 at 16:20

2 Answers


It's perfectly valid, and it is usually done in situations like yours. Others have suggested message queues, which are nice, but they will either take up memory or end up writing the data to a backing store anyway, so now you have another layer of software doing what you could do directly. That said, it may be easier and cleaner to use a message queue's API instead of rolling your own solution; it all depends on your environment and performance requirements. I would expect the message queue to give you a cleaner separation of concerns, but you will have to determine for yourself whether there is a performance hit.
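For a sense of scale, the producer side of the queue route really is only a few lines. Here is a sketch using kafka-python, since Kafka comes up in the comments; the topic name, broker address, and per-device chunking are assumptions on my part, not anything your setup prescribes.

    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    def enqueue_upload(stream, device_id):
        # Kafka caps message size (1 MB by default), so ship the file in
        # chunks keyed by device; a consumer can reassemble them in order.
        while True:
            chunk = stream.read(512 * 1024)
            if not chunk:
                break
            producer.send("sensor-uploads", key=device_id.encode(), value=chunk)
        producer.flush()  # block until the broker has acknowledged everything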

  • Thank you. I've worked with brokers for a decent amount of time. Personally I like Kafka, so if I choose a broker, I would probably use it to bridge the service <-> storage gap. It hadn't occurred to me that I could actually use it here, but it's nice that I asked and that Adam and you brought it up. I'll think about it, maybe do some benchmarks and calculations. I have actually heard of companies using Kafka as storage, albeit temporarily (for 15 days).
    Commented Jan 15, 2015 at 19:57

You may want to consider a message-queuing service like RabbitMQ. It would allow you to send data to a server that simply holds on to it until you have a consumer/worker available to process it. You would have to consider memory usage on the message-queue server if your datasets are very large (or find a way to send the data in smaller chunks), but it supports message persistence (saving the data to disk before it is processed, so data isn't lost if the server dies) and clustering if your usage requires it.
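As a rough sketch of that route, this is what publishing with persistence enabled looks like with pika, RabbitMQ's Python client (the queue name is an assumption):

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="sensor-uploads", durable=True)  # queue survives broker restarts

    def enqueue(payload: bytes):
        channel.basic_publish(
            exchange="",
            routing_key="sensor-uploads",
            body=payload,
            properties=pika.BasicProperties(delivery_mode=2),  # mark message persistent
        )

Declaring the queue durable and marking each message persistent is what makes the broker write the data to disk before a consumer picks it up.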

If that's not possible, though, your basic idea of REST upload > store on disk > process into DB is perfectly valid. If you're worried about data loss, you can apply standard redundancy measures to your storage (replication, etc.), and you should ensure the process-into-DB stage deletes the uploaded files once they are no longer needed.
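To illustrate that last point, here is a sketch of such a worker: it polls the spool directory, loads each completed file into the database, and deletes the file only after the insert has been committed. The file format, table schema, spool path, and SQLite stand-in are all assumptions made for the sake of a runnable example.

    import os
    import sqlite3  # stand-in for your distributed DB's driver
    import time

    SPOOL_DIR = "/var/spool/sensor-uploads"  # placeholder; same spool as the upload side

    def parse_recordings(f):
        # Assumes one "device_id,timestamp,value" line per reading; yields batches.
        batch = []
        for line in f:
            device_id, ts, value = line.decode().strip().split(",")
            batch.append((device_id, ts, float(value)))
            if len(batch) == 1000:
                yield batch
                batch = []
        if batch:
            yield batch

    def process_file(path, conn):
        with open(path, "rb") as f:
            for batch in parse_recordings(f):
                conn.executemany(
                    "INSERT INTO readings (device_id, ts, value) VALUES (?, ?, ?)",
                    batch,
                )
        conn.commit()
        os.remove(path)  # delete only after the commit succeeded

    while True:
        conn = sqlite3.connect("readings.db")
        conn.execute("CREATE TABLE IF NOT EXISTS readings (device_id TEXT, ts TEXT, value REAL)")
        for name in os.listdir(SPOOL_DIR):
            if not name.endswith(".part"):  # skip uploads still in flight
                process_file(os.path.join(SPOOL_DIR, name), conn)
        conn.close()
        time.sleep(5)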

