pyspark.pandas.DataFrame.update

DataFrame.update(other: pyspark.pandas.frame.DataFrame, join: str = 'left', overwrite: bool = True) → None

Modify in place using non-NA values from another DataFrame. Aligns on indices. There is no return value.

Parameters
other : DataFrame, or Series
join : 'left', default 'left'

Only a left join is implemented, keeping the index and columns of the original object.

overwrite : bool, default True

How to handle non-NA values for overlapping keys:

  • True: overwrite the original DataFrame's values with values from other.

  • False: only update values that are NA in the original DataFrame.
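To make the left-join and overwrite rules concrete, here is a minimal pure-Python sketch that models a DataFrame as a dict of columns, each mapping index labels to values. The helpers `update_sketch` and `is_na` are hypothetical and only illustrate the documented semantics; this is not how pyspark.pandas implements update, which operates on Spark data.

```python
import math
from typing import Any, Dict, Hashable

# A "frame" in this sketch is a plain dict: column label -> {index label -> value}.
Frame = Dict[Hashable, Dict[Hashable, Any]]


def is_na(value: Any) -> bool:
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def update_sketch(target: Frame, other: Frame, overwrite: bool = True) -> None:
    """Hypothetical helper: modify ``target`` in place with non-NA values from ``other``.

    Only a left join is modelled: column and index labels that exist only in
    ``other`` are ignored, so ``target`` never gains rows or columns.
    """
    for col, new_values in other.items():
        if col not in target:          # column absent from the original: skip it
            continue
        column = target[col]
        for idx, new in new_values.items():
            if idx not in column:      # index label absent from the original: skip it
                continue
            if is_na(new):             # NA in other never replaces anything
                continue
            if overwrite or is_na(column[idx]):
                column[idx] = new      # with overwrite=False, only NA slots are filled


df = {"A": {0: 1, 1: 2, 2: 3}, "B": {0: 400, 1: None, 2: 600}}
new_df = {"B": {0: 4, 1: 5, 2: 6, 3: 7}, "C": {0: 7, 1: 8, 2: 9}}

update_sketch(df, new_df, overwrite=False)
print(df["B"])  # {0: 400, 1: 5, 2: 600}
```

With overwrite=False only the missing value at label 1 is filled; label 3 and column C from new_df are dropped because they do not exist in the original.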

Returns
None : method directly changes calling object

See also

DataFrame.merge

For column(s)-on-column(s) operations.

DataFrame.join

Join columns of another DataFrame.

DataFrame.hint

Specifies some hint on the current DataFrame.

broadcast

Marks a DataFrame as small enough for use in broadcast joins.

Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'A': [1, 2, 3], 'B': [400, 500, 600]}, columns=['A', 'B'])
>>> new_df = ps.DataFrame({'B': [4, 5, 6], 'C': [7, 8, 9]}, columns=['B', 'C'])
>>> df.update(new_df)
>>> df.sort_index()
   A  B
0  1  4
1  2  5
2  3  6

The DataFrame's length does not increase because of the update; only values at matching index/column labels are updated.

>>> df = ps.DataFrame({'A': ['a', 'b', 'c'], 'B': ['x', 'y', 'z']}, columns=['A', 'B'])
>>> new_df = ps.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']}, columns=['B'])
>>> df.update(new_df)
>>> df.sort_index()
   A  B
0  a  d
1  b  e
2  c  f
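The reason the longer new_df adds no rows is that alignment iterates over the original index only. A minimal pure-Python sketch of that left-aligned update for a single column, using a hypothetical helper `left_align_update` (an illustration, not the library's code):

```python
from typing import Any, Dict, Hashable


def left_align_update(column: Dict[Hashable, Any], other: Dict[Hashable, Any]) -> None:
    """Hypothetical helper: overwrite entries of ``column`` whose index label also
    appears in ``other`` with a non-None value. Labels that exist only in ``other``
    are never added, which is what a left join means here."""
    for label in column:  # iterate over the original labels only
        if label in other and other[label] is not None:
            column[label] = other[label]  # replaces a value; the dict never grows


b = {0: "x", 1: "y", 2: "z"}
new_b = {0: "d", 1: "e", 2: "f", 3: "g", 4: "h", 5: "i"}
left_align_update(b, new_b)
print(b)       # {0: 'd', 1: 'e', 2: 'f'}
print(len(b))  # 3
```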

If other is a Series, its name attribute must be set; the name determines which column is updated.

>>> df = ps.DataFrame({'A': ['a', 'b', 'c'], 'B': ['x', 'y', 'z']}, columns=['A', 'B'])
>>> new_column = ps.Series(['d', 'e'], name='B', index=[0, 2])
>>> df.update(new_column)
>>> df.sort_index()
   A  B
0  a  d
1  b  y
2  c  e
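One way to picture the Series case is as a one-column DataFrame whose column label is the Series name. The sketch below assumes that model; `series_as_frame` and `update_in_place` are hypothetical helpers, and rejecting an unnamed Series with ValueError is a choice made for this illustration, not documented library behavior.

```python
from typing import Any, Dict, Hashable, Optional

# A "frame" in this sketch is a plain dict: column label -> {index label -> value}.
Frame = Dict[Hashable, Dict[Hashable, Any]]


def series_as_frame(name: Optional[Hashable], values: Dict[Hashable, Any]) -> Frame:
    """Hypothetical helper: wrap a named Series as a one-column frame.

    The name becomes the column label used for alignment; in this sketch an
    unnamed Series is rejected because it cannot select a column.
    """
    if name is None:
        raise ValueError("the Series needs a name to select the column to update")
    return {name: dict(values)}


def update_in_place(target: Frame, other: Frame) -> None:
    """Overwrite non-None values at index/column labels present in both frames."""
    for col, values in other.items():
        if col in target:
            for idx, new in values.items():
                if idx in target[col] and new is not None:
                    target[col][idx] = new


df = {"A": {0: "a", 1: "b", 2: "c"}, "B": {0: "x", 1: "y", 2: "z"}}
update_in_place(df, series_as_frame("B", {0: "d", 2: "e"}))
print(df["B"])  # {0: 'd', 1: 'y', 2: 'e'}
```

Index label 1 is absent from the Series, so its value 'y' is left untouched, matching the doctest output.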

If other contains None, the corresponding values in the original DataFrame are not updated.

>>> df = ps.DataFrame({'A': [1, 2, 3], 'B': [400, 500, 600]}, columns=['A', 'B'])
>>> new_df = ps.DataFrame({'B': [4, None, 6]}, columns=['B'])
>>> df.update(new_df)
>>> df.sort_index()
   A      B
0  1    4.0
1  2  500.0
2  3    6.0
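The float output above appears consistent with a "keep the original value where other is null" rule combined with the column taking the wider type: new_df's B column holds None, so it is floating point, and the integers 400, 500 and 600 are widened to match. A minimal sketch under that assumption, using a hypothetical helper `coalesce_update`:

```python
import math
from typing import List, Optional


def coalesce_update(original: List[int], other: List[Optional[float]]) -> List[float]:
    """Hypothetical helper: keep the original value wherever ``other`` is None/NaN.

    Every result is converted to float, assuming (as the output above suggests)
    that mixing an integer column with a nullable float column yields floats.
    """
    result = []
    for old, new in zip(original, other):
        keep_old = new is None or (isinstance(new, float) and math.isnan(new))
        result.append(float(old) if keep_old else float(new))
    return result


print(coalesce_update([400, 500, 600], [4.0, None, 6.0]))  # [4.0, 500.0, 6.0]
```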