
I have a data frame with one column of country names, and I want to add a second column giving each country's continent. For example:

my.df <- data.frame(country = c("Afghanistan","Algeria"))

Is there a package that can append the continent column when my data contains only the country names?


3 Answers


You can use the countrycode package for this task.

library(countrycode)
df <- data.frame(country = c("Afghanistan",
                             "Algeria",
                             "USA",
                             "France",
                             "New Zealand",
                             "Fantasyland"))

df$continent <- countrycode(sourcevar = df[, "country"],
                            origin = "country.name",
                            destination = "continent")
#warning
#In countrycode(sourcevar = df[, "country"], origin = "country.name",  :
#  Some values were not matched unambiguously: Fantasyland

Result

df
#      country continent
#1 Afghanistan      Asia
#2     Algeria    Africa
#3         USA  Americas
#4      France    Europe
#5 New Zealand   Oceania
#6 Fantasyland      <NA>
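
If the unmatched names need a value rather than NA, one option is the custom_match argument of countrycode(), which takes a named vector mapping origin values to destination values. The following is only a sketch: the label "Imaginary" is a made-up placeholder for this example, not something the package provides.

```r
library(countrycode)

df <- data.frame(country = c("Afghanistan", "Algeria", "USA",
                             "France", "New Zealand", "Fantasyland"),
                 stringsAsFactors = FALSE)

# custom_match overrides the default lookup for the listed origin values.
# "Imaginary" is a hypothetical label chosen for this example.
df$continent <- countrycode(sourcevar = df$country,
                            origin = "country.name",
                            destination = "continent",
                            custom_match = c("Fantasyland" = "Imaginary"))

df
```

Names that are not listed in custom_match still go through the normal country.name lookup, so the other rows should be unaffected.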

Expanding on Markus' answer: countrycode takes its continent values from the continent column of the package's codelist dataset. See its help page:

?codelist

Definition of continent:

continent: Continent as defined in the World Bank Development Indicators

The question asked for continents, but sometimes continents are too coarse a grouping for your data. For example, the continent destination groups North and South America together as Americas.

What you might want is region:

region: Regions as defined in the World Bank Development Indicators

It is unclear how the World Bank defines its regions, but the code below shows that this destination is more granular.

library(countrycode)

egnations <- c("Afghanistan", "Algeria", "USA", "France", "New Zealand", "Fantasyland")

countrycode(sourcevar = egnations, origin = "country.name", destination = "region")

Output:

[1] "Southern Asia"            
[2] "Northern Africa"          
[3] "Northern America"         
[4] "Western Europe"           
[5] "Australia and New Zealand"
[6] NA      

You can try

my.df <- data.frame(country = c("Afghanistan", "Algeria"),
                    continent = as.factor(c("Asia", "Africa")))
# Left join on the country name; all.x = TRUE keeps rows without a match
merge(my.df, raster::ccodes()[, c("NAME", "CONTINENT")],
      by.x = "country", by.y = "NAME", all.x = TRUE)
#       country continent CONTINENT
# 1 Afghanistan      Asia      Asia
# 2     Algeria    Africa    Africa

Some country values might need adjusting to match the names in the lookup table; I can't tell, since you did not provide all values.
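
To see which names would need adjusting before the merge, something like the following might help. It is a minimal sketch in base R: the lookup data frame is a small hand-made stand-in for raster::ccodes()[, c("NAME", "CONTINENT")], its values are invented for the example, and fixes and lookup_name are hypothetical names.

```r
# Stand-in lookup table (assumption: made-up values, same column names
# as the NAME / CONTINENT columns used in the merge above)
lookup <- data.frame(NAME = c("Afghanistan", "Algeria", "United States"),
                     CONTINENT = c("Asia", "Africa", "North America"),
                     stringsAsFactors = FALSE)

my.df <- data.frame(country = c("Afghanistan", "Algeria", "USA"),
                    stringsAsFactors = FALSE)

# Country names in the data that have no exact match in the lookup table
unmatched <- setdiff(my.df$country, lookup$NAME)
print(unmatched)

# Hypothetical recoding for the spellings that differ
fixes <- c("USA" = "United States")
my.df$lookup_name <- ifelse(my.df$country %in% names(fixes),
                            fixes[my.df$country],
                            my.df$country)

# Merge on the adjusted name; the original country column is kept as-is
merged <- merge(my.df, lookup,
                by.x = "lookup_name", by.y = "NAME", all.x = TRUE)
merged
```

Only the mismatched names have to be listed in fixes, so the approach does not require typing out every country even when the column holds many distinct values.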

  • Is there a way to do it without specifying the values for the countries, in case we have more than 100 values?
    – Zombraz
    Commented May 10, 2019 at 21:12
