Error in plotting of frequency histogram from csv data

Question

I am working with a CSV file using the pandas module in Python 3. The file has 5 columns: job, company name, job description, number of reviews, and job location. I want to plot a frequency histogram that considers only the jobs containing the words "mechanical engineer" and shows the frequencies of the 5 most frequent locations for those jobs.

So, I defined a variable engloc that stores the locations of all the "mechanical engineer" jobs:

import re
engloc = df[df.position.str.contains('mechanical engineer|mechanical engineering', flags=re.IGNORECASE, regex=True)].location

and then plotted a histogram with matplotlib, using code I found online:

import matplotlib.pyplot as plt
# engloc holds location strings, so each distinct location becomes a category on the x-axis
plt.hist(engloc, bins=50)
plt.gca().set(title='Frequency Histogram', ylabel='Frequency')

but the resulting plot included every location.

How can I plot a proper frequency histogram that shows only the 5 most frequent locations for jobs containing "mechanical engineer", instead of putting all of the locations in the graph?
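The filtering step can be tried on a small DataFrame before running it on the real file. Below is a minimal sketch: the `position` and `location` column names follow the engloc snippet above, while the rows and the helper name `mechanical_locations` are hypothetical. It uses `case=False` instead of `flags=re.IGNORECASE`; both make `str.contains` case-insensitive.

```python
import pandas as pd


def mechanical_locations(df: pd.DataFrame) -> pd.Series:
    """Return the `location` of every row whose `position` mentions a mechanical engineer.

    'mechanical engineer' is a prefix of 'mechanical engineering', so a single
    pattern matches both; case=False ignores case, and na=False treats missing
    positions as non-matches instead of putting NaN into the boolean mask.
    """
    mask = df["position"].str.contains("mechanical engineer", case=False, na=False)
    return df.loc[mask, "location"]


# Hypothetical rows, for illustration only
df = pd.DataFrame({
    "position": ["Mechanical Engineer", "Software Engineer",
                 "Senior MECHANICAL ENGINEERING Lead", None],
    "location": ["Boston, MA", "Austin, TX", "Detroit, MI", "Denver, CO"],
})
engloc = mechanical_locations(df)
```

With these rows, engloc keeps only 'Boston, MA' and 'Detroit, MI'.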

This is a sample from the CSV file.

If you share your data, somebody may be willing to help you. See minimal reproducible example for a complete explanation. – Sergey Bushmanov, Commented Feb 5, 2020 at 18:52
Should I include a screenshot of part of it, or write a few parameters of the data in the question? – Ayano, Commented Feb 5, 2020 at 18:59
A "representative" sample of data, allowing to plot a meaningful histogram, entered as text should do. Link to the full data even better. – Sergey Bushmanov, Commented Feb 5, 2020 at 19:01

Sergey Bushmanov · Accepted Answer · 2020-02-06 06:26:27Z


Something along the following lines should help you with numerical data:

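import numpy as np
import matplotlib.pyplot as plt
# Numeric data only: np.histogram cannot bin strings such as locations
counts, edges = np.histogram(engloc.values)
keep = counts >= 5  # keep only bins holding at least 5 values
# Each kept bin's left edge falls inside that bin, so weighting it by the bin's count redraws the bin
plt.hist(edges[:-1][keep], bins=edges, weights=counts[keep])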
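The bin-filtering idea only applies to numeric columns, such as the number of reviews. Here is a minimal sketch on made-up review counts; the helper name `frequent_bins`, the threshold of 5, and the sample values are hypothetical. It draws the kept bins with `plt.bar` on their original edges rather than re-running `plt.hist`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def frequent_bins(values, bins=5, min_count=5):
    """Histogram numeric `values`, keeping only bins with at least `min_count` items.

    Returns the kept counts plus the left and right edge of each kept bin.
    """
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    keep = counts >= min_count
    return counts[keep], edges[:-1][keep], edges[1:][keep]


# Hypothetical review counts, for illustration only
reviews = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 10, 50, 90, 100]
counts, left, right = frequent_bins(reviews, bins=5, min_count=5)

# Draw each kept bin as a bar spanning its original edges
plt.bar(left, counts, width=right - left, align="edge")
plt.gca().set(title="Frequent review-count bins", xlabel="Reviews", ylabel="Frequency")
```

With these values the 5 bins are about 19.8 wide, and only the first one (from 1 to 20.8, holding 11 values) passes the filter.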

For string data, such as the locations here, try:

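from collections import Counter
import matplotlib.pyplot as plt
# Count each location and keep the 5 most common (location, count) pairs
locations, counts = zip(*Counter(engloc.values).most_common(5))
plt.bar(locations, counts)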
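Staying within pandas, `Series.value_counts` does the same counting as `Counter` and already sorts from most to least frequent. A minimal sketch, assuming engloc is a Series of location strings as in the question; the helper name `plot_top_locations` and the sample values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_locations(locations: pd.Series, n: int = 5) -> pd.Series:
    """Bar-plot the n most frequent values in `locations` and return their counts."""
    top = locations.value_counts().head(n)  # value_counts sorts by count, descending
    fig, ax = plt.subplots()
    ax.bar([str(loc) for loc in top.index], top.values)
    ax.set(title="Most frequent locations", xlabel="Location", ylabel="Frequency")
    ax.tick_params(axis="x", labelrotation=45)  # keeps long location names readable
    fig.tight_layout()
    return top


# Hypothetical locations, for illustration only
engloc = pd.Series(["Boston"] * 6 + ["Detroit"] * 5 + ["Austin"] * 4
                   + ["Denver"] * 3 + ["Seattle"] * 2 + ["Houston"])
top = plot_top_locations(engloc)
```

With these values the plot shows Boston through Seattle, and Houston, which appears only once, is left out.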

edited Feb 6, 2020 at 6:26

answered Feb 5, 2020 at 19:56

It gave an error like this: "TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''".
– Ayano
Commented Feb 5, 2020 at 20:02
Sadly, the second one gave an error too: TypeError: list() takes at most 1 argument (5 given)
– Ayano
Commented Feb 5, 2020 at 21:12
@Ayano Does this solve your problem? If so you may think about accepting the answer
– Sergey Bushmanov
Commented Feb 7, 2020 at 9:19

Tags: python, python-3.x, pandas, csv, matplotlib