Selecting columns of a pandas dataframe based on criteria

pandas

I have a DataFrame containing the UK election results, with one column per party. Its columns look something like this:

In[107]: Results.columns
Out[107]: 
Index(['Press Association ID Number', 'Constituency Name', 'Region', 'Country',
       'Constituency ID', 'Constituency Type', 'Election Year', 'Electorate',
       ' Total number of valid votes counted ', 'Unnamed: 9',
       ...
       'Wessex Reg', 'Whig', 'Wigan', 'Worth', 'WP', 'WRP', 'WVPTFP', 'Yorks',
       'Young', 'Zeb'],
      dtype='object', length=147)

For example:

Results.head(2)
Out[108]: 
   Press Association ID Number Constituency Name Region Country  \
0                            1          Aberavon  Wales   Wales   
1                            2         Aberconwy  Wales   Wales   

  Constituency ID Constituency Type  Election Year Electorate  \
0       W07000049            County           2015     49,821   
1       W07000058            County           2015     45,525   

   Total number of valid votes counted   Unnamed: 9 ...   Wessex Reg  Whig  \
0                                31,523         NaN ...          NaN   NaN   
1                                30,148         NaN ...          NaN   NaN   

   Wigan  Worth  WP  WRP  WVPTFP  Yorks  Young  Zeb  
0    NaN    NaN NaN  NaN     NaN    NaN    NaN  NaN  
1    NaN    NaN NaN  NaN     NaN    NaN    NaN  NaN  

[2 rows x 147 columns]

The columns containing the votes for the different parties are Results.ix[:, 'Unnamed: 9':]

Most of these parties poll very few votes in any constituency, and so I would like to exclude them. Is there a way (short of iterating through each row and column myself) of returning only those columns which meet a particular condition, for example having at least one value > 1000? I would ideally like to be able to specify something like

    Results.ix[:, 'Unnamed: 9': > 1000]

Author: TimGJ. Posted: 08.11.2019 11:30

Answers (1)


Solution

You can do it this way:

In [94]: df
Out[94]:
          a         b         c         d           e         f         g           h
0 -1.450976 -1.361099 -0.411566  0.955718   99.882051 -1.166773 -0.468792  100.333169
1  0.049437 -0.169827  0.692466 -1.441196    0.446337 -2.134966 -0.407058   -0.251068
2 -0.084493 -2.145212 -0.634506  0.697951  101.279115 -0.442328 -0.470583   99.392245
3 -1.604788 -1.136284 -0.680803 -0.196149    2.224444 -0.117834 -0.299730   -0.098353
4 -0.751079 -0.732554  1.235118 -0.427149   99.899120  1.742388 -1.636730   99.822745
5  0.955484 -0.261814 -0.272451  1.039296    0.778508 -2.591915 -0.116368   -0.122376
6  0.395136 -1.155138 -0.065242 -0.519787  100.446026  1.584397  0.448349   99.831206
7 -0.691550  0.052180  0.827145  1.531527   -0.240848  1.832925 -0.801922   -0.298888
8 -0.673087 -0.791235 -1.475404  2.232781  101.521333 -0.424294  0.088186   99.553973
9  1.648968 -1.129342 -1.373288 -2.683352    0.598885  0.306705 -1.742007   -0.161067

In [95]: df[df.loc[:, 'e':].columns[(df.loc[:, 'e':] > 50).any()]]
Out[95]:
            e           h
0   99.882051  100.333169
1    0.446337   -0.251068
2  101.279115   99.392245
3    2.224444   -0.098353
4   99.899120   99.822745
5    0.778508   -0.122376
6  100.446026   99.831206
7   -0.240848   -0.298888
8  101.521333   99.553973
9    0.598885   -0.161067

Explanation:

In [96]: (df.loc[:, 'e':] > 50).any()
Out[96]:
e     True
f    False
g    False
h     True
dtype: bool

In [97]: df.loc[:, 'e':].columns
Out[97]: Index(['e', 'f', 'g', 'h'], dtype='object')

In [98]: df.loc[:, 'e':].columns[(df.loc[:, 'e':] > 50).any()]
Out[98]: Index(['e', 'h'], dtype='object')
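
The same steps can also be written as a standalone script. This is only a sketch of the approach above: the seeded generator and the names tail, mask and selected are my own choices for illustration, not part of the original session. It passes the boolean Series straight to .loc instead of going through .columns, which should select the same columns.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the Setup section, but seeded so it is reproducible
# (the seed is an arbitrary choice for this sketch).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 8)), columns=list("abcdefgh"))
df.loc[::2, ["e", "h"]] += 100

# Restrict the test to the columns from 'e' onwards.
tail = df.loc[:, "e":]

# One boolean per column: True if any value in that column exceeds 50.
mask = (tail > 50).any()

# A boolean Series indexed by column label can be used directly in .loc.
selected = tail.loc[:, mask]
print(selected.columns.tolist())
```

Swapping .any() for .all() would keep only the columns where every value passes the test.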

Setup (with numpy imported as np and pandas as pd):

In [99]: df = pd.DataFrame(np.random.randn(10, 8), columns=list('abcdefgh'))

In [100]: df.loc[::2, list('eh')] += 100

UPDATE:

Starting from pandas 0.20.1, the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers; it was removed entirely in pandas 1.0.
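
Applied to the frame in the question, the .loc version might look like the sketch below. The data here is made up: PartyA, PartyB and PartyC are hypothetical column names, and I am assuming (from the head() output, where the counts print as '49,821') that the vote columns were read in as strings with thousands separators, so they are converted to numbers before comparing.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the Results frame; party names and vote counts are invented.
results = pd.DataFrame({
    "Constituency Name": ["Aberavon", "Aberconwy"],
    "Electorate": ["49,821", "45,525"],
    "Unnamed: 9": [np.nan, np.nan],
    "PartyA": ["15,416", "8,514"],
    "PartyB": ["352", np.nan],
    "PartyC": [np.nan, "1,204"],
})

# Label-based slice with .loc replaces the old .ix[:, 'Unnamed: 9':].
votes = results.loc[:, "Unnamed: 9":]

# Strip the thousands separators and convert to numbers (assumption: the
# columns hold strings like '31,523'; missing values stay NaN).
votes = votes.apply(
    lambda col: pd.to_numeric(
        col.astype(str).str.replace(",", "", regex=False), errors="coerce"
    )
)

# Keep only the columns with at least one value above 1000.
keep = (votes > 1000).any()
major = votes.loc[:, keep]
print(major.columns.tolist())
```

If the file is loaded with pd.read_csv, passing thousands=',' there would probably make the conversion step unnecessary.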

Author: MaxU. Posted: 20.08.2016 04:11