American Lawful Immigration 2018: New LPRs by State of Residence
Posted on Tue 13 October 2020 in Politics • 7 min read
Lawful Permanent Residents (LPRs), also known as green card holders, are non-citizens who are lawfully authorized to live permanently within the United States. Let's explore data from the U.S. Department of Homeland Security to see in which states people who received a green card in 2018 settled.
Import required libraries¶
import pandas as pd
import plotly.express as px
import plotly.io as pio
from IPython.display import Javascript
Javascript(
"""require.config({
paths: {
plotly: 'https://cdn.plot.ly/plotly-latest.min'
}
});"""
)
pio.renderers.default = 'notebook_connected'
Data pre-processing¶
To map the number of new Lawful Permanent Residents by state of residence in 2018, we need to merge three datasets:
- The Persons Obtaining Lawful Permanent Resident Status by State or Territory of Residence dataset, our main dataset, which contains, among other things, the number of new LPRs in 2018 for each state of residence;
- The US State Abbreviations dataset, which gives the abbreviation of each state of residence;
- The US Population 2018 dataset, which gives the number of inhabitants of each US state in 2018.
LPRs 2018 by state of residence dataset¶
df = pd.read_csv(
    'Data/Persons_Obtaining_Lawful_Permanent_Resident_Status_by_State_or_Territory_of_Residence.csv',
    sep=';',
    skiprows=4,
)
# Cleaning: drop columns 1 to 9 so that only the state name and the 2018 count remain
df.drop(columns=df.columns[1:10], inplace=True)
df.columns = ['State or territory of residence', 'LPRs 2018']
df.dropna(inplace=True)
# Remove the spaces used as thousands separators, then convert to integers
df['LPRs 2018'] = df['LPRs 2018'].str.replace(" ", "")
df['LPRs 2018'] = df['LPRs 2018'].astype('int')
df.head()
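The cleaning step above assumes that thousands are separated by ordinary spaces. Some CSV exports use non-breaking spaces instead, which `str.replace(" ", "")` would leave in place. As a minimal sketch on made-up values (not the DHS figures), the conversion could be written to strip any whitespace; the helper name `to_int` is hypothetical:

```python
import pandas as pd


def to_int(series: pd.Series) -> pd.Series:
    """Strip every whitespace character (non-breaking spaces included) and cast to int."""
    return series.astype(str).str.replace(r"\s+", "", regex=True).astype("int64")


# Made-up values for illustration only, not DHS data
raw = pd.Series(["1 234", "56 789", "12\u00a0345", "7"])
clean = to_int(raw)
print(clean.tolist())  # [1234, 56789, 12345, 7]
```

Whether this is needed depends on how the source file was exported; if the original replacement already works, the column only contains ordinary spaces.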
US State Abbreviations dataset¶
df_state = pd.read_csv('Data/US_State_Abbreviations.csv', sep=';')
# Rename the key column so that it matches the LPR dataset for the merge
df_state.rename(columns={'State': 'State or territory of residence'}, inplace=True)
df_state.head()
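The maps further down pass the `Code` column to Plotly with `locationmode="USA-states"`, which expects two-letter state abbreviations. A small sanity check can catch stray whitespace or lowercase codes before plotting. The rows below are hypothetical stand-ins for the abbreviations file, not its actual content:

```python
import pandas as pd

# Hypothetical rows standing in for the abbreviations file
codes = pd.DataFrame({
    "State or territory of residence": ["California", "Texas", "New York"],
    "Code": ["CA", "TX", "ny "],
})

# Normalise, then flag anything that is not exactly two uppercase letters
codes["Code"] = codes["Code"].str.strip().str.upper()
invalid = codes[~codes["Code"].str.fullmatch(r"[A-Z]{2}")]
print(invalid.empty)  # True: every code is valid after normalisation
```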
US Population 2018 dataset¶
df_pop = pd.read_csv(
    'Data/US_Population_2018.csv',
    sep=';',
    skiprows=8,
)
# Cleaning: drop columns 1 to 10 and the trailing column, keeping the state name and the 2018 population
df_pop.drop(columns=df_pop.columns[1:11], inplace=True)
df_pop = df_pop.iloc[:, :-1].copy()
df_pop.columns = ['State or territory of residence', 'US Population 2018']
df_pop.dropna(inplace=True)
# Remove the spaces used as thousands separators, then convert to integers
df_pop['US Population 2018'] = df_pop['US Population 2018'].str.replace(" ", "")
df_pop['US Population 2018'] = df_pop['US Population 2018'].astype('int')
df_pop.head()
Merging¶
Now that our three datasets are clean, they can be merged. We then add two columns that make the data more meaningful:
- Each state's percentage of all new LPRs in 2018;
- The number of new LPRs in 2018 per 1,000 inhabitants.
# Merge the LPRs 2018 by state of residence and US State Abbreviations datasets
df_merge = pd.merge(df, df_state, on='State or territory of residence')
# Compute and add the Percentage of LPRs 2018 column
df_merge['Percentage of LPRs 2018'] = df_merge['LPRs 2018'] / df_merge['LPRs 2018'].sum() * 100
# Merge the US Population 2018 dataset into the previous merge, giving us our final dataset
df_LPRs_2018 = pd.merge(df_merge, df_pop, on='State or territory of residence')
# Compute and add the LPRs 2018 per 1,000 population column
df_LPRs_2018['LPRs 2018 per 1,000 population'] = df_LPRs_2018['LPRs 2018'] / df_LPRs_2018['US Population 2018'] * 1000
# Sort states from the most to the fewest new LPRs
df_LPRs_2018 = df_LPRs_2018.sort_values('LPRs 2018', ascending=False)
df_LPRs_2018
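Note that `pd.merge` performs an inner join by default. Any row of the LPR table whose name has no exact match in the abbreviations table (for example a territory or a total row, if the file contains one) is dropped without warning, and the percentage column is then a share of the matched rows only. A minimal sketch, on made-up figures rather than DHS data, of how one might check this with `how='left'` and `indicator=True`:

```python
import pandas as pd

# Made-up figures for illustration only
lprs = pd.DataFrame({
    "State or territory of residence": ["California", "Texas", "Guam", "Unknown"],
    "LPRs 2018": [300, 100, 75, 25],
})
states = pd.DataFrame({
    "State or territory of residence": ["California", "Texas"],
    "Code": ["CA", "TX"],
})

# A left merge with indicator=True keeps every row and records where it matched
check = pd.merge(lprs, states, on="State or territory of residence",
                 how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only",
                      "State or territory of residence"].tolist()
print(unmatched)  # ['Guam', 'Unknown']

# The two possible denominators give different percentages
matched = check[check["_merge"] == "both"]
share_of_matched = matched["LPRs 2018"] / matched["LPRs 2018"].sum() * 100
share_of_all = matched["LPRs 2018"] / lprs["LPRs 2018"].sum() * 100
print(share_of_matched.round(1).tolist(), share_of_all.round(1).tolist())
```

If the unmatched list is empty on the real data, both denominators coincide and the percentages above are shares of the national total.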
Mapping¶
New LPRs by state of residence in 2018¶
First, we map the number of new LPRs by state of residence in 2018:
fig = px.choropleth(df_LPRs_2018,
locations='Code',
color='LPRs 2018',
hover_name='State or territory of residence',
locationmode="USA-states",
scope='usa',
labels={'LPRs 2018':'New LPRs'},
color_continuous_scale=px.colors.sequential.dense,
title="<b>U.S. New LPRs by State of Residence in 2018</b><br>" +
"<i>Source : U.S. Department of Homeland Security</i>"
)
# Style
fig.update_layout(
font_family='Helvetica',
font_color='grey',
font_size=12,
title_font_size=18
)
fig.show()
fig = px.treemap(df_LPRs_2018,
title="<b>U.S. New LPRs by State of Residence in 2018</b><br>" +
"<i>Source : U.S. Department of Homeland Security</i>",
labels={'LPRs 2018':'New LPRs', 'State or territory of residence':''},
path=['State or territory of residence'],
values='LPRs 2018',
color = 'LPRs 2018',
color_continuous_scale=px.colors.sequential.dense
)
# Style
fig.update_layout(
font_family='Helvetica',
font_color='grey',
font_size=12,
title_font_size=18
)
fig.show()
New LPRs by state of residence per 1,000 population in 2018¶
Next, we map the number of new LPRs per 1,000 inhabitants by state of residence in 2018:
df_LPRs_1000_population = df_LPRs_2018.sort_values('LPRs 2018 per 1,000 population', ascending=False)
df_LPRs_1000_population
fig = px.choropleth(df_LPRs_1000_population,
locations='Code',
color='LPRs 2018 per 1,000 population',
hover_name='State or territory of residence',
locationmode="USA-states",
scope='usa',
labels={'LPRs 2018 per 1,000 population':'New LPRs per<br>1,000 inhabitants'},
color_continuous_scale=px.colors.sequential.matter,
title="<b>U.S. New LPRs by State of Residence per 1,000 inhabitants in 2018</b><br>" +
"<i>Source : U.S. Department of Homeland Security</i>"
)
# Style
fig.update_layout(
font_family='Helvetica',
font_color='grey',
font_size=12,
title_font_size=18
)
fig.show()
fig = px.treemap(df_LPRs_1000_population,
title="<b>U.S. New LPRs by State of Residence per 1,000 inhabitants in 2018</b><br>" +
"<i>Source : U.S. Department of Homeland Security</i>",
labels={'LPRs 2018 per 1,000 population':'New LPRs per<br>1,000 inhabitants', 'State or territory of residence':''},
path=['State or territory of residence'],
values='LPRs 2018 per 1,000 population',
color = 'LPRs 2018 per 1,000 population',
color_continuous_scale=px.colors.sequential.matter
)
# Style
fig.update_layout(
font_family='Helvetica',
font_color='grey',
font_size=12,
title_font_size=18
)
fig.show()
Sources¶