Individuals using the Internet (% of population) from 1990 to 2017

Posted on Mon. 04 May 2020 in Economics • 3 min read


The digital and information revolution has dramatically changed the way the world communicates, learns, does business and treats disease. Indeed, the new information and communications technologies (ICTs) offer vast possibilities for advancement in all fields in all countries, from the most to the least developed.

Comparable statistics on access, use, quality and affordability of ICTs are essential for formulating policies that favor the growth of the sector, and for monitoring and evaluating its impact on each country's development. Although basic access data are available for many countries, in most developing countries little is known about who uses ICTs, what they use them for, and how ICTs affect people and businesses. The Global Partnership on Measuring ICT for Development helps set standards, harmonize information and communications technology statistics, and build the statistical capacity of developing countries. However, despite significant improvements in developing countries, the gap with developed countries remains.

Below, we use the Plotly library to map how the share of individuals using the Internet has evolved over time across the world.

Import required libraries

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.io as pio
from IPython.display import Javascript

Javascript(
"""require.config({
 paths: { 
     plotly: 'https://cdn.plot.ly/plotly-latest.min'
 }
});"""
)

pio.renderers.default = 'notebook_connected'

Data pre-processing

In [2]:
# Replace the file's header row with shorter column names and skip 'time_code';
# '..' marks a missing observation in the source file
df = pd.read_csv('Data/Individuals_using_the_Internet.csv', 
                 header=0,
                 names=['year', 'time_code', 'country_name', 'country_code', 'percentage_internet_users'],
                 usecols=['year', 'country_name', 'country_code', 'percentage_internet_users'],
                 dtype={'percentage_internet_users': float}, 
                 na_values='..')

df.head()
Out[2]:
year country_name country_code percentage_internet_users
0 1960 Argentina ARG NaN
1 1960 Australia AUS NaN
2 1960 Brazil BRA NaN
3 1960 China CHN NaN
4 1960 France FRA NaN
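
To see what the `read_csv` options above do, here is a minimal sketch on an in-memory sample. The header row and the values in the sample are assumptions made for illustration, not the contents of the actual file; only the option names come from the cell above.

```python
import io
import pandas as pd

# Hypothetical three-row sample; the real file's header row and values may differ.
sample_csv = io.StringIO(
    "Time,Time Code,Country Name,Country Code,Value\n"
    "1960,YR1960,Aland,AAA,..\n"
    "1990,YR1990,Borduria,BBB,0.5\n"
    "2017,YR2017,Borduria,BBB,27.5\n"
)

columns = ['year', 'time_code', 'country_name', 'country_code', 'percentage_internet_users']

sample = pd.read_csv(
    sample_csv,
    header=0,        # treat the first row as a header and discard it...
    names=columns,   # ...in favour of these shorter names
    usecols=[c for c in columns if c != 'time_code'],
    dtype={'percentage_internet_users': float},
    na_values='..',  # '..' is read as a missing value (NaN)
)

print(sample)
```

The row whose value is `..` comes back as `NaN`, which is why the first rows of the real dataset (year 1960) show `NaN`.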

Cleaning

Our dataset extends from 1960 to 2018. Development of the Internet began in the 1960s, but it only became popular in the 1990s, so there were few or no users between the 1960s and the 1990s.

To prepare our mapping, we begin by dropping all "not a number" (NaN) values. Then, because there is little point in mapping years in which every value is 0 (typically 1960 to 1989), we exclude each year in which the sum of the percentage of Internet users across all countries is 0. This leaves 1990 as the first year with meaningful values. We also exclude 2018 from the dataset because a non-negligible number of values are still missing for that year.

In [3]:
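# Drop rows with no reported value
df = df.dropna()

# Keep only the years in which at least one country reports a non-zero share
df = df.groupby('year').filter(lambda x: x['percentage_internet_users'].sum() > 0)

# Exclude 2018, which is still incomplete (comparing as strings works whether 'year' holds text or integers)
df = df[df['year'].astype(str) != '2018']
df = df.reset_index(drop=True)
df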
Out[3]:
year country_name country_code percentage_internet_users
0 1990 Argentina ARG 0.000000
1 1990 Australia AUS 0.585095
2 1990 Brazil BRA 0.000000
3 1990 China CHN 0.000000
4 1990 France FRA 0.052778
... ... ... ... ...
4935 2017 Virgin Islands (U.S.) VIR 64.377494
4936 2017 West Bank and Gaza PSE 65.200000
4937 2017 Yemen, Rep. YEM 26.718355
4938 2017 Zambia ZMB 27.852579
4939 2017 Zimbabwe ZWE 27.055488

4940 rows × 4 columns
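
The effect of each cleaning step is easier to check on a small frame. The sketch below applies the same three steps to made-up data; the country codes, the values and the `clean` helper are hypothetical and only serve to illustrate the logic.

```python
import pandas as pd

# Hypothetical toy data: three years, two made-up countries.
toy = pd.DataFrame({
    'year': ['1989', '1989', '1990', '1990', '2018', '2018'],
    'country_code': ['AAA', 'BBB', 'AAA', 'BBB', 'AAA', 'BBB'],
    'percentage_internet_users': [0.0, 0.0, 0.0, 0.5, None, 80.0],
})

def clean(frame, excluded_years=('2018',)):
    """Drop missing values, all-zero years, and explicitly excluded years."""
    frame = frame.dropna(subset=['percentage_internet_users'])
    # Keep only years in which at least one country reports a non-zero share.
    frame = frame.groupby('year').filter(
        lambda g: g['percentage_internet_users'].sum() > 0
    )
    # Compare as strings so this works whether 'year' holds text or integers.
    frame = frame[~frame['year'].astype(str).isin(excluded_years)]
    return frame.reset_index(drop=True)

cleaned = clean(toy)
print(cleaned)
```

Here 1989 is removed because every value is 0, and 2018 is removed because it is listed in `excluded_years`, so only the two 1990 rows remain. Note that a zero value in a retained year (country `AAA` in 1990) is kept, as it is in the real output above.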

Mapping

In [4]:
fig = px.choropleth(df, 
                    locations='country_code',
                    color='percentage_internet_users',
                    hover_name='country_name',
                    animation_frame='year', 
                    range_color=[0,100],
                    scope='world',
                    labels={'percentage_internet_users':'% of population<br>using Internet'},
                    title="<b>Individuals using the Internet from 1990 to 2017</b><br>" + 
                    "<i>Source : International Telecommunication Union</i>",
                    color_continuous_scale=px.colors.sequential.deep)

# Style
fig.update_layout(
    font_family='Helvetica',
    font_color='grey',
    font_size=12,
    title_font_size=20
)

fig.show()

In [5]:
# Keep only the 2017 observations so the static map matches its title
df_2017 = df[df['year'].astype(str) == '2017']

fig = px.choropleth(df_2017, 
                    locations='country_code',
                    color='percentage_internet_users',
                    hover_name='country_name',
                    scope='world',
                    labels={'percentage_internet_users':'% of population<br>using Internet'},
                    color_continuous_scale=px.colors.sequential.deep,
                    title="<b>Individuals using the Internet in 2017</b><br>" + 
                    "<i>Source : International Telecommunication Union</i>"
                   )

# Style
fig.update_layout(
    font_family='Helvetica',
    font_color='grey',
    font_size=12,
    title_font_size=20,
)

fig.show()

Limitations of the dataset

Operators have traditionally been the main source of telecommunications data, since subscription information is readily available for most countries. This provides a general idea of access, but a more precise measure is the penetration rate, that is, the share of households with access to telecommunications.

Data on actual use of telecommunications services is also important. In recent years, household and business surveys have made more information available on the use of ICTs. Ideally, statistics on telecommunications and other ICTs should be compiled for the three measures: subscriptions, access and use.

Finally, the quality of the data collected varies according to the reporting country due to differences in regulations concerning the supply and availability of data.
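
One way to get a rough sense of how reporting varies by country is to measure, for each country, the share of years that have a value. This is a minimal sketch on made-up data, not a result from the dataset; on the real data it would have to run on the frame as loaded, before missing values are dropped.

```python
import pandas as pd

# Hypothetical toy data: two made-up countries observed over three years.
toy = pd.DataFrame({
    'year': [2015, 2016, 2017] * 2,
    'country_code': ['AAA'] * 3 + ['BBB'] * 3,
    'percentage_internet_users': [10.0, 12.0, 15.0, None, None, 40.0],
})

# Share of years with a reported value, per country (1.0 = complete series),
# sorted so that the least complete series come first.
completeness = (
    toy.groupby('country_code')['percentage_internet_users']
       .apply(lambda s: s.notna().mean())
       .sort_values()
)

print(completeness)
```

A low share only indicates missing observations; it says nothing about the accuracy of the values that are reported.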

Sources

International Telecommunication Union, World Telecommunication/ICT Development Report and database, under the CC BY 4.0 license.

For additional or more recent information on sources and country notes, please also refer to the International Telecommunication Union website.