Question:

Filter Pandas Dataframe for Rows with Certain Value for Column

pandas dataframe filter

105 views

1 answer

Author reputation: 24

I'd like to filter a dataframe for rows with the value "United-States" in the column "nativecountry." This seems like a straightforward thing to do, but the things I've tried have failed. Here's my code for creating the dataframe:

import pandas as pd

url = ('https://archive.ics.uci.edu/ml/machine-learning-databases/'
       'adult/adult.data')
col_names = ['age', 'workclass', 'fnlwgt', 'education', 'educationnum',
             'maritalstatus', 'occupation', 'relationship',
             'race', 'sex', 'capitalgain', 'capitalloss',
             'hoursperweek', 'nativecountry', 'income']
# The file has no header row, so the column names are supplied explicitly.
df_adult = pd.read_csv(url, header=None, names=col_names)

I've tried the following things for filtering 'nativecountry' for 'United-States':

#This returns an empty dataframe
df_US = df_adult[df_adult["nativecountry"] == 'United-States']
#Code from this source: https://chrisalbon.com/python/pandas_index_select_and_filter.html

#This returns the error: name 'United' is not defined
df_US = df_adult.query("nativecountry == United-States")
#Code from this source: https://pythonspot.com/en/pandas-filter/

#And this doesn't work either, for some reason
df_adult.useSQLInstead(SELECT * FROM df_adult WHERE nativecountry=United-States)
...just kidding.

Any thoughts? Thanks.

Author: Chris Woodruff Posted: 08.11.2017 10:39

Answers (1)


1 upvote

Author reputation: 69775

Accepted answer

Because the values in nativecountry have a leading space (' United-States'), an exact comparison with 'United-States' matches nothing. A substring match works instead:

df_adult[df_adult['nativecountry'].str.contains('United-States')]
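
The exact comparison from the question can also be made to work by removing the leading space, either while parsing or afterwards. The following is a minimal sketch, assuming the space comes from the ", " separator in the CSV. The three-row sample and its reduced column list are hypothetical stand-ins for adult.data so that the snippet runs without a download.

```python
import io
import pandas as pd

# Hypothetical sample mimicking the ", " separator of adult.data;
# only three of the original columns are kept for brevity.
csv_text = (
    "39, Bachelors, United-States\n"
    "28, Bachelors, Cuba\n"
    "52, HS-grad, United-States\n"
)
col_names = ['age', 'education', 'nativecountry']

# Option 1: skip the whitespace after each delimiter while parsing.
df_clean = pd.read_csv(io.StringIO(csv_text), header=None,
                       names=col_names, skipinitialspace=True)
df_US = df_clean[df_clean['nativecountry'] == 'United-States']

# Option 2: strip an already-loaded column, then compare exactly.
df_raw = pd.read_csv(io.StringIO(csv_text), header=None, names=col_names)
df_US_stripped = df_raw[df_raw['nativecountry'].str.strip() == 'United-States']
```

Unlike str.contains, both options match the whole value, so a country whose name merely contains 'United-States' as a substring would not be selected.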
Author: Scott Boston Posted: 08.11.2017 10:50
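
The query attempt in the question fails for a separate reason: without quotes, United-States is parsed as the expression United minus States, hence "name 'United' is not defined". A minimal sketch of a quoted version follows, using a hypothetical miniature frame in place of the downloaded data and stripping the leading space first.

```python
import pandas as pd

# Hypothetical miniature frame; the values carry the leading space
# found in the real nativecountry column.
df_adult = pd.DataFrame({
    'age': [39, 28, 52],
    'nativecountry': [' United-States', ' Cuba', ' United-States'],
})

# Remove the surrounding whitespace so exact comparisons can match.
df_adult['nativecountry'] = df_adult['nativecountry'].str.strip()

# Quote the literal inside the query string.
df_US = df_adult.query("nativecountry == 'United-States'")

# Alternatively, reference a local variable with the @ prefix.
target = 'United-States'
df_US_var = df_adult.query("nativecountry == @target")
```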