I've been working in analytics / data science, and leading data science teams, for over a decade. Regarding the claim:
I find it a little counterintuitive that python would be so popular amongst data scientists, as I presume that people who are interested in the quality of data would also be interested in the quality of code, and a heavily dynamic language like Python doesn't lend itself to checking correctness.
There are a number of possible reasons, but the key ones concern the ecosystem, the learning curve and popularity (i.e. the "snowball factor").
Learning Curve
The challenge is that many data scientists come from a statistical/mathematical background with limited knowledge of programming, and learn the software development aspects of their roles on the job. Compared to languages like Java or C++, Python doesn't have a steep learning curve, which makes the language particularly appealing from the perspective of professional development and training requirements.
Ecosystem
Valid points have been made in this discussion about the rich ecosystem that Python offers. On a project, the ability to leverage existing libraries to solve common data/analytical challenges is particularly appealing. In a commercial analytical setting, the biggest drain on time is development time, not computational effort. Arguably, Python's rich ecosystem shortens development time, making the language more appealing.
Exogenous factors
Data science managers have to consider external factors when building analytical teams. Python is popular, which makes recruiting Python developers easier than recruiting for other languages. That is particularly appealing from a business continuity perspective. Building solutions on technology that is exceptionally good but unpopular can prove counterproductive.
Personal perspective
As a data science manager, I've led teams that worked in SAS, R, Python, Scala and a variety of less popular solutions (say Stata, Matlab and so forth). I find it highly debatable whether one can argue that there is a best language as such. Haskell, with some beautiful concepts like monads, offers amazing possibilities, and should be particularly appealing to companies working with large amounts of data; nevertheless, the number of jobs for Haskell developers is significantly smaller than the number of vacancies for Python developers. Julia offers a number of very sensible and promising paradigms, but it's not widely used in business. It is important to draw a distinction between the "technical quality" of a language and its business utility. Business utility is to a great degree determined by popularity, versatility, learning curve and wider cost overheads.
Perhaps your question could be rephrased: why did Python become popular in the first place? The rough answer is probably its appealing combination of simplicity and flexibility.
R vs. Python
In the UK, graduates with a statistical background are more frequently trained in R, whereas graduates in computer-related subjects tend to come with knowledge of Python. That divide carries over into industry: consultancies focusing more on the statistical side of things tend to use R more often, and businesses more concerned with the programming aspect of data science tend to gravitate towards Python. This is very much a soft distinction, as solutions like Matlab and SAS have relatively large deployment bases and the market still has plenty of outfits that rely on those technologies.
Education
A postgraduate degree in data science lasts under two years. If you can teach people that to run a linear regression they only have to type lm(y ~ x, data=df)¹, this is much more appealing than a typical C++ implementation. The outcome of that approach is that data scientists graduate with only a rudimentary knowledge of programming techniques. In practice, if a data science programme were intended to deliver traditional computer science training, covering the basics of memory management, algorithms and programming plus advanced statistical concepts, it would have to last around five years or more. That simply wouldn't work due to financial/market constraints. Universities therefore choose not to focus too strongly on programming details and use languages only as vehicles for running statistical models.
This approach comes with a number of challenges. For instance, many graduates trained in R are unaware of the object-oriented programming capabilities available in R, and struggle with computationally intensive tasks that experienced developers with a good understanding of common programming concepts can solve.
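To make the contrast between the R one-liner and C++ a little more concrete, here is a minimal sketch of what a hand-rolled C++ version of only the simplest case might look like: ordinary least squares with a single predictor, using the closed-form slope and intercept estimates. The names `LinearFit` and `fit_simple_ols` are hypothetical, chosen for illustration, and unlike `lm` this produces no standard errors, diagnostics, formula parsing or data frame handling.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical result type: y is modelled as intercept + slope * x.
struct LinearFit {
    double intercept;
    double slope;
};

// Hypothetical helper (not a library API): closed-form OLS for one predictor,
//   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
//   intercept = mean_y - slope * mean_x
LinearFit fit_simple_ols(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size() || x.size() < 2) {
        throw std::invalid_argument("x and y must have the same length (at least 2)");
    }

    // First pass: sample means of x and y.
    const double n = static_cast<double>(x.size());
    double mean_x = 0.0;
    double mean_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= n;
    mean_y /= n;

    // Second pass: centred cross-products and sum of squares of x.
    double sxy = 0.0;
    double sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mean_x;
        sxy += dx * (y[i] - mean_y);
        sxx += dx * dx;
    }
    if (sxx == 0.0) {
        throw std::invalid_argument("x has zero variance, so the slope is undefined");
    }

    const double slope = sxy / sxx;
    return LinearFit{mean_y - slope * mean_x, slope};
}
```

For example, fitting x = {1, 2, 3, 4} against y = {3, 5, 7, 9} gives an intercept of 1 and a slope of 2. Even this stripped-down version needs input validation and two passes over the data, which gives some sense of why a curriculum would rather teach the one-line `lm` call.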
1 In R.