pyspark.pandas.DataFrame.rank
DataFrame.rank(method: str = 'average', ascending: bool = True) → pyspark.pandas.frame.DataFrame
Compute numerical data ranks (1 through n) within each column. With the default method ('average'), equal values are assigned a rank that is the average of the ranks of those values.
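A short derivation of the default 'average' rule. Assume a tie group of m equal values that would occupy positions k+1 through k+m in sorted order; each member then receives the mean of those positions. The worked line uses column A of the examples (values [1, 2, 2, 3]).

```latex
% A tie group of m equal values occupying sorted positions k+1, ..., k+m
\bar{r} = \frac{1}{m} \sum_{j=1}^{m} (k + j) = k + \frac{m + 1}{2} = \frac{(k + 1) + (k + m)}{2}

% Column A = [1, 2, 2, 3]: the two 2s occupy positions 2 and 3, so k = 1, m = 2
\bar{r} = 1 + \frac{2 + 1}{2} = 2.5
```

The last form shows that the average rank is the midpoint of the 'min' rank (k + 1) and the 'max' rank (k + m) of the group.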
Note
The current implementation of rank uses Spark's Window without specifying a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method on very large datasets.
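To make the note concrete, here is a pure-Python simulation of computing average ranks with window-style steps. It is a hypothetical sketch, not the actual pyspark.pandas code, and the function name `window_style_average_rank` is illustrative. All rows are put into one global order and numbered (like a row number over an ordering with no partition key), and each tie group gets the mean of its smallest and largest row number. Step 1, the single global ordering, is the part that needs every row in one place.

```python
def window_style_average_rank(values):
    """Average ranks computed via one global ordering plus row numbers.

    Hypothetical illustration only; this is not pyspark.pandas's code.
    """
    # Step 1: one total order over *all* rows. With an ORDER BY but no
    # partition key, there is no way to split this work into independent groups.
    ordered = sorted(range(len(values)), key=lambda i: values[i])

    # Step 2: a 1-based row number for each row in that global order.
    row_number = {idx: pos + 1 for pos, idx in enumerate(ordered)}

    # Step 3: smallest and largest row number for each distinct value.
    lo, hi = {}, {}
    for idx in ordered:
        v, rn = values[idx], row_number[idx]
        lo[v] = min(lo.get(v, rn), rn)
        hi[v] = max(hi.get(v, rn), rn)

    # Tied rows have consecutive row numbers, so their mean is (min + max) / 2.
    return [(lo[v] + hi[v]) / 2 for v in values]


print(window_style_average_rank([1, 2, 2, 3]))
```

For column A of the examples this prints `[1.0, 2.5, 2.5, 4.0]`, the same values as the default `df.rank()` output.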
- Parameters
- method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
average: average rank of group
min: lowest rank in group
max: highest rank in group
first: ranks assigned in order they appear in the array
dense: like 'min', but rank always increases by 1 between groups
- ascending : boolean, default True
If False, rank from high (1) to low (N).
- Returns
- ranks : same type as caller
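The five tie-handling methods can be sketched on a plain Python list. This is an illustrative re-implementation of the rules listed under method, assuming ascending order and no missing values; it is not the pyspark.pandas implementation, and the function name `rank` here is a stand-alone helper.

```python
def rank(values, method="average"):
    """Rank a list of numbers 1..n in ascending order with the given tie method.

    Illustrative sketch only; assumes no missing values.
    """
    if method not in {"average", "min", "max", "first", "dense"}:
        raise ValueError(f"unknown method: {method!r}")

    # Python's sort is stable, so equal values keep their original order;
    # that is exactly the tie-breaking rule 'first' needs.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)

    pos = 0     # 0-based position in the sorted order
    dense = 0   # number of distinct values seen so far
    while pos < len(order):
        # Extend 'end' over the run of values equal to values[order[pos]].
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        dense += 1
        lowest, highest = pos + 1, end + 1  # 1-based ranks spanned by the group
        for offset, idx in enumerate(order[pos:end + 1]):
            if method == "average":
                ranks[idx] = (lowest + highest) / 2
            elif method == "min":
                ranks[idx] = float(lowest)
            elif method == "max":
                ranks[idx] = float(highest)
            elif method == "first":
                ranks[idx] = float(lowest + offset)
            else:  # 'dense'
                ranks[idx] = float(dense)
        pos = end + 1
    return ranks


for m in ("average", "min", "max", "first", "dense"):
    print(m, rank([1, 2, 2, 3], m))
```

For example, `rank([1, 2, 2, 3], "min")` returns `[1.0, 2.0, 2.0, 4.0]`, matching column A of the `method='min'` example, and `rank([1, 2, 2, 3], "first")` returns `[1.0, 2.0, 3.0, 4.0]` because the two 2s are numbered in the order they appear.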
Examples
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 3, 2, 1]}, columns=['A', 'B'])
>>> df
   A  B
0  1  4
1  2  3
2  2  2
3  3  1
>>> df.rank().sort_index()
     A    B
0  1.0  4.0
1  2.5  3.0
2  2.5  2.0
3  4.0  1.0
If method is set to 'min', it uses the lowest rank in the group.
>>> df.rank(method='min').sort_index()
     A    B
0  1.0  4.0
1  2.0  3.0
2  2.0  2.0
3  4.0  1.0
If method is set to 'max', it uses the highest rank in the group.
>>> df.rank(method='max').sort_index()
     A    B
0  1.0  4.0
1  3.0  3.0
2  3.0  2.0
3  4.0  1.0
If method is set to 'dense', ranks increase by 1 between groups, leaving no gaps.
>>> df.rank(method='dense').sort_index()
     A    B
0  1.0  4.0
1  2.0  3.0
2  2.0  2.0
3  3.0  1.0
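None of the examples set ascending=False. The following pure-Python sketch (illustrative only, not pyspark.pandas code, assuming no missing values; the name `average_rank` is hypothetical) applies the parameter description with the default 'average' method: sorting in descending order gives the largest value rank 1. For 'average', descending ranks equal n + 1 minus ascending ranks, which the final line checks on column A.

```python
def average_rank(values, ascending=True):
    """Average ranks; ascending=False gives the largest value rank 1.

    Illustrative sketch only; assumes no missing values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over the run of values tied with values[order[start]].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # 0-based positions start..end are ranks start+1..end+1; use their mean.
        for idx in order[start:end + 1]:
            ranks[idx] = ((start + 1) + (end + 1)) / 2
        start = end + 1
    return ranks


col_a = [1, 2, 2, 3]
desc = average_rank(col_a, ascending=False)
asc = average_rank(col_a)
print(desc)
print(desc == [len(col_a) + 1 - r for r in asc])
```

Here `desc` is `[4.0, 2.5, 2.5, 1.0]` and the identity check prints `True`.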