For those concerned with secure foundations for mathematics, the problem is basically this: In order to even state the axioms of ZF set theory, we need to have developed first-order logic; but in order to recognize conditions under which first-order logical formulas can be regarded as "true" or "valid", we need to give a model-theoretic semantics for first-order logic, and this in turn requires certain notions from set theory. At first glance, this seems troublingly circular.
Obviously, this is an extremely subtle area. But as someone who has occasionally been troubled by these kinds of questions, let me offer a few personal thoughts, for whatever they may be worth.
My own view goes something like this. Let's first distinguish carefully between logic and metalogic.
In logic, which we regard as a pre-mathematical discipline, all we can do is give grammatical rules for constructing well-formed statements and derivation rules for constructing "proofs" (or perhaps, so as not to beg the question, we should call them "persuasive demonstrations"); these should be based purely on the grammatical forms of the various statements involved. There is, at this point, no technical or mathematical theory of how to assign meanings to these formulas; instead, we regard them merely as symbolic abbreviations for the underlying natural language utterances, with whatever informal psychological interpretations these already have attached to them. We "justify" the derivation rules informally by persuading ourselves through illustrative examples and informal semantical arguments that they embody a reasonable facsimile of some naturally occurring reasoning patterns we find in common use. What we don't do, at this level, is attempt to specify any formal semantics. So we can't yet prove or even make sense of the classical results from metalogic such as the soundness and/or completeness of a particular system of derivation rules. For us, "soundness" means only that each derivation rule represents a commonly accepted and intuitively legitimate pattern of linguistic inference; "completeness" means basically nothing. We think of first-order logic as merely a transcription into symbolic notation of certain relatively uncontroversial aspects of our pre-critical reasoning habits. We insist on this symbolic representation for two reasons: (i) regimentation, precisely because it severely limits expressive creativity, is a boon to precision; and (ii) without such a symbolic representation, there would be no hope of eventually formalizing anything.
Now we take our first step into true formalization by setting down criteria (in a natural language) for what should constitute formal expression in the first place. This involves taking as primitives (i) the notion of a token (i.e., an abstract stand-in for a "symbol"), (ii) a binary relation of distinctness between tokens, (iii) the notion of a string, and (iv) a ternary relation of "comprisal" between strings, subject to certain axioms. The axioms governing these primitives would go something like this. First, each token is a string; these strings are not comprised of any other strings. By definition, two strings that are both tokens are distinct if the two tokens are distinct, and any string that is a token is distinct from any string that is not a token. Secondly, for any token t and any string S, there is a unique string U comprised of t and S (we can denote this U by 'tS'; this is merely a notation for the string U, just as 't' is merely a notation for the abstract token involved --- it should not be thought of as being the string itself). By definition, tS and t'S' are distinct strings if either t and t' are distinct tokens or S and S' are distinct strings (this is intuitively a legitimate recursive definition). Thirdly, nothing is a string except by virtue of the first two axioms. We can define the concatenation UV of strings U and V recursively: if U is a token t, define UV as the string tV; otherwise, U must be t for some token t and string S, and we define UV as the string tW, where W is the (already defined) concatenation SV. This operation can be proven associative by induction. All this is part of intuitive, rather than formal, mathematics. There is simply no way to get mathematics off the ground without understanding that some portion of it is just part of the linguistic activity we routinely take part in as communicative animals. This said, it is worth the effort to formulate things so that this part of it becomes as small as possible.
Toward that end, we can insist that any further language used in mathematics be formal (or at least capable of formal representation). This means requiring that some distinguishable symbol be set aside and agreed upon for each distinct token we propose to take for granted, and that everything asserted of the tokens can ultimately be formulated in terms of the notions of distinctness, strings, comprisal, and concatenation discussed above. Notice that there is no requirement to have a finite set of tokens, nor do we have recourse within a formal language to any notion of a set or a number at all.
Coming back to our symbolic logic, we can now prove as a theorem of intuitive (informal) mathematics that, with the symbolic abbreviations (the connectives, quantifiers, etc.) mapped onto distinct tokens, we can formulate the grammatical and derivational rules of logic strictly in terms of strings and the associated machinery of formal language theory. This brings one foot of symbolic logic within the realm of formal mathematics, giving us a language in which to express further developments; but one foot must remain outside the formal world, connecting our symbolic language to the intuitions that gave birth to it. The point is that we have an excellent, though completely extra-mathematical, reason for using this particular formal language and its associated deduction system: namely, we believe it captures as accurately as possible the intuitive notions of correct reasoning that were originally only written in shorthand. This is the best we can hope to do; there must be a leap of faith at some point.
With our technically uninterpreted symbolic/notational view of logic agreed upon (and not everyone comes along even this far, obviously -- the intuitionists, for example), we can proceed to formulate the axioms of ZF set theory in the usual way. This adds sets and membership to the primitive notions already taken for granted (tokens, strings, etc.). The motivating considerations for each new axiom, for instance the desire to avoid Russell's paradox, make sense within the intuitive understanding of first-order logic, and these considerations are at any rate never considered part of the formal development of set theory, on any approach to foundational questions.
Once we have basic set theory, we can go back and complete our picture of formal reasoning by defining model-theoretic semantics for first-order logic (or indeed, higher-order logic) in terms of sets and set-theoretic ideas like inclusion and n-tuples. The purpose of this is twofold: first, to deepen our intuitive understanding of correct reasoning by embedding all of our isolated instincts about valid patterns of deduction within a single rigid structure, as well as to provide an independent check on those instincts; and secondly, to enable the development of the mathematical theory of logic, i.e., metalogic, where beautiful results about reasoning systems in their own right can be formulated and proven.
Once we have a precise understanding of the semantics (and metatheory) of first-order logic, we can of course go back to the development of set theory and double-check that any logical deductions we used in proving theorems there were indeed genuinely valid in the formal sense (and they all turn out to be, of course). This doesn't count as a technical justification of one by means of the other, but only shows that the two together have a comforting sort of coherence. I think of their relation as somehow analogous to the way in which the two concepts of "term" and "formula" in logical grammar cannot be defined in isolation from one another, but must instead be defined by a simultaneous recursion. That's obviously just a vague analogy, but it's one I find helpful and comforting.
I hope this made sense and was somewhat helpful. My apologies for being so wordy. Cheers. - Joseph