tl;dr
You're aiming for Gb/s speeds, at rates where you can't even spawn processes fast enough, let alone start shells. A shell script hence won't be your solution. Use `libcurl` with a programming language that allows access to the `libcurl` multi interface.
Full answer
> Which is the best approach?
>
> - Use GNU parallel on a text file, where each text file will have 10000 prepared HTTP URLs? (Here, in the text file, is curl better than http?)
> - While doing the above, is each curl request run in its own shell? How do I change this to a single shell that can send requests and receive responses for 10000 requests/sec?
You want to do 10,000 requests a second, each "in the order of KBs" (plural), so that's more than a single 1 Gb/s Ethernet link: 10,000 requests/s × 12.5 KB already equals 1 Gb/s. Be sure you understand the architectural implications of that!
First of all, your transmission medium (some Ethernet?) is serial: sending multiple packets in parallel is not possible, on the deepest technical level. What is possible is to fill the queue of packets to be sent from multiple cores, and then handle the incoming replies in a parallel fashion.
Then: spawning a whole `curl` process just to do a single request is super inefficient. But spawning a whole shell (I guess you were thinking about bash?) is even worse! (Spawning a process means creating/forking a new process, replacing that with the image from the executable, loading all the libraries the executable depends on, and parsing the configuration/startup scripts, before finally doing the real work. And that real work, in your case, is less effort than all the rest.) (My quick test loop in C says you can't do more than ca. 3500 iterations of `vfork()`, each followed by checking the PID and then `exec`ing an empty program, per second on an 8-thread 3.6 GHz CPU. And you want to exec something very heavyweight. You want to spawn 3 times as many processes per second as my machine can, and still do networking and processing in those spawned processes. Not gonna happen, by a large margin.)
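For illustration, a minimal sketch of such a timing loop (a reconstruction, not the exact test; `/bin/true` stands in for the "empty program"):

```cpp
// Minimal sketch of a spawn-rate benchmark: how many vfork() + exec() + wait()
// round trips per second does one core manage? /bin/true stands in for the
// empty program; the original test loop may have differed in its details.
#include <chrono>
#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    const int iterations = 2000;
    char *const argv[] = {const_cast<char *>("/bin/true"), nullptr};

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        pid_t pid = vfork();
        if (pid == 0) {            // child: exec immediately (vfork requires it)
            execv("/bin/true", argv);
            _exit(127);            // only reached if execv failed
        }
        waitpid(pid, nullptr, 0);  // parent: check the PID / reap the child
    }
    double elapsed = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();

    std::printf("%.0f spawns per second\n", iterations / elapsed);
}
```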
Spawning a `jq` process for every request, on top of that, is outright disastrous. `jq` has to do two things:
- Parse the query you've passed it as an argument,
- Parse the JSON according to that query.
Now, the JSON is different every time, but it's relatively straightforward to parse once you know the query (and said query is not too complex); parsing the query, however, is per se basically compiling a program. You're doing the same compilation over and over again, when in reality the query stays the same.
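To see why a compiled program sidesteps this: there, the "query" is ordinary code, compiled exactly once; only the JSON document itself costs parsing time per response. A minimal sketch, assuming the nlohmann/json library and a made-up response field:

```cpp
// The "query" (.result.price here) is compiled code, not re-parsed per call;
// only the JSON body is parsed per response. Assumes the nlohmann/json
// library; the field names are made up for illustration.
#include <nlohmann/json.hpp>
#include <iostream>
#include <string>

int main() {
    std::string body = R"({"result": {"price": 42.17, "ok": true}})";

    nlohmann::json doc = nlohmann::json::parse(body);  // per-response work
    double price = doc["result"]["price"];             // the "query": plain code

    std::cout << price << '\n';
}
```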
So, for high-performance testing like this, going through a shell and `parallel`/`xargs` does not work - these systematic inefficiencies will simply not allow you to work at the speeds you need.
So, instead, write a program. The programming language doesn't matter too much, as long as it allows for proper multithreading (maybe avoid PHP, Delphi and, um, Visual Basic 6.0) and has access to a reasonably fast JSON parser. Python would work, although Python is not known for good multithreading; it might still be enough. Personally, I'd simply write this in C++.
Recommendation:
If you feel like you know your JSON parser well enough to know that you'd rather avoid dealing with `epoll` details: `libcurl` has a nice API for jobs like this, the multi interface. Run one of these per CPU core, and you'll probably be saturating your connection to the server.
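A minimal sketch of what driving the multi interface looks like (URL and transfer count are placeholders, error handling is omitted, and `curl_multi_poll` needs curl ≥ 7.66):

```cpp
// Minimal sketch of the libcurl multi interface: many concurrent transfers
// driven from a single thread. URL and transfer count are placeholders.
#include <curl/curl.h>

// Discard response bodies; a real program would hand them to the parser.
static size_t sink(char *, size_t size, size_t nmemb, void *) {
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURLM *multi = curl_multi_init();

    for (int i = 0; i < 100; ++i) {                // 100 concurrent requests
        CURL *easy = curl_easy_init();
        curl_easy_setopt(easy, CURLOPT_URL, "http://example.com/api");
        curl_easy_setopt(easy, CURLOPT_WRITEFUNCTION, sink);
        curl_multi_add_handle(multi, easy);
    }

    int running = 1;
    while (running) {
        curl_multi_perform(multi, &running);                // drive transfers
        curl_multi_poll(multi, nullptr, 0, 1000, nullptr);  // wait for activity

        int queued;                                 // reap finished transfers
        while (CURLMsg *msg = curl_multi_info_read(multi, &queued)) {
            if (msg->msg == CURLMSG_DONE) {
                curl_multi_remove_handle(multi, msg->easy_handle);
                curl_easy_cleanup(msg->easy_handle);
            }
        }
    }

    curl_multi_cleanup(multi);
    curl_global_cleanup();
}
```

In a real run you'd keep re-adding (or reusing) easy handles to sustain the request rate, instead of letting the loop drain.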
If you want to write an application that globally maximizes CPU utilization at the cost of complexity, you'd use separate transmit and receive workers, seeing that sending a request is probably a lot cheaper than handling the received data. In that design, you'd first
- initialize your multi-thread-safe logging system (I like spdlog, but seeing 10000 potential log entries per second, something that aggregates and writes binary data instead of human-readable text files might be much, much more appropriate),
- set up your parser,
- spawn a bunch (wild guess: CPU threads/3 - 1) of transmit workers (TX),
- spawn a bunch (wild guess: CPU threads·2/3 - 1) of receive workers (RX),
- spawn a thread that holds an operating-system notification token for TCP sockets that are ready to read; on Linux, that mechanism would be `epoll`, which is available in Python through the `select` module as `select.epoll`,
- spawn a thread that prepares the requests, establishes the TCP connections, and then assigns them to the workers' incoming queues.
In each TX worker,
- you make the prepared request (which might just be a `curl_*` function call - libcurl is actually a nice library, not just a command line tool).
In each RX worker,
- you take the data you've just received and parse it, calculate your result and, if it fits your criteria, tell your logging system to log it.
In the epoll thread,
- you handle the events by getting the data and handing it off fairly (e.g. in a round-robin way) to the RX workers.
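A rough sketch of that dispatch loop (the queue type and the worker count here are assumptions of the sketch, not fixed choices):

```cpp
// Rough sketch of the epoll dispatch thread: wait for readable sockets and
// hand their fds off round-robin to the RX workers' queues. The queue type
// and NUM_RX_WORKERS are assumptions of this sketch.
#include <sys/epoll.h>
#include <array>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Tiny locked queue; a real program might prefer a lock-free MPSC queue.
template <typename T>
class ThreadSafeQueue {
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> q_;
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push_back(std::move(v)); }
        cv_.notify_one();
    }
    T pop() {                       // blocking pop, called by an RX worker
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        T v = std::move(q_.front());
        q_.pop_front();
        return v;
    }
};

constexpr std::size_t NUM_RX_WORKERS = 5;  // wild guess, as above
std::array<ThreadSafeQueue<int>, NUM_RX_WORKERS> rx_queues;

void epoll_dispatch_loop(int epfd) {       // epfd: sockets already registered
    epoll_event events[64];
    std::size_t next = 0;                  // round-robin cursor
    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);    // block until activity
        for (int i = 0; i < n; ++i) {
            rx_queues[next].push(events[i].data.fd); // fd with pending data
            next = (next + 1) % NUM_RX_WORKERS;
        }
    }
}
```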
If you want an example of using `epoll` with libcurl, curl ships one (and it's very close to your use case, actually!).
If you want a discussion on how to deal with multi-threads, curl and network in C++, this Stack Overflow answer might be for you.