pyspark.pandas.DataFrame.spark.to_spark_io
spark.to_spark_io(path: Optional[str] = None, format: Optional[str] = None, mode: str = 'overwrite', partition_cols: Union[str, List[str], None] = None, index_col: Union[str, List[str], None] = None, **options: OptionalPrimitiveType) → None

Write the DataFrame out to a Spark data source.
DataFrame.spark.to_spark_io() is an alias of DataFrame.to_spark_io().

- Parameters
- path : string, optional
Path to the data source.
- format : string, optional
Specifies the output data source format. Some common ones are:
'delta'
'parquet'
'orc'
'json'
'csv'
- mode : str {'append', 'overwrite', 'ignore', 'error', 'errorifexists'}, default 'overwrite'. Specifies the behavior of the save operation when data already exists (a short sketch of these modes follows this parameter list).
'append': Append the new data to existing data.
'overwrite': Overwrite existing data.
'ignore': Silently ignore this operation if data already exists.
'error' or 'errorifexists': Throw an exception if data already exists.
- partition_cols : str or list of str, optional
Names of partitioning columns.
- index_col : str or list of str, optional, default: None
Column names to be used in Spark to represent pandas-on-Spark's index. The index name in pandas-on-Spark is ignored. By default, the index is always lost (see the index_col example under Examples below).
- options : dict
All other options passed directly into Spark's data source.
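
To make the mode behaviors concrete, here is a minimal sketch; the path '/tmp/example_parquet' and the small frame are hypothetical, chosen only for illustration:

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'id': [1, 2, 3]})
>>> # First write creates the data at the hypothetical path.
>>> df.spark.to_spark_io(path='/tmp/example_parquet', format='parquet',
...                      mode='overwrite')
>>> # 'append' writes the same three rows again, doubling the row count.
>>> df.spark.to_spark_io(path='/tmp/example_parquet', format='parquet',
...                      mode='append')
>>> # 'ignore' does nothing here, because data already exists at the path.
>>> df.spark.to_spark_io(path='/tmp/example_parquet', format='parquet',
...                      mode='ignore')
>>> # mode='error' or 'errorifexists' would raise an exception instead.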
- Returns
- None
See also
read_spark_io
DataFrame.to_delta
DataFrame.to_parquet
DataFrame.to_table
DataFrame.to_spark_io
DataFrame.spark.to_spark_io
Examples
>>> df = ps.DataFrame(dict(
...    date=list(pd.date_range('2012-1-1 12:00:00', periods=3, freq='M')),
...    country=['KR', 'US', 'JP'],
...    code=[1, 2, 3]), columns=['date', 'country', 'code'])
>>> df
                 date country  code
0 2012-01-31 12:00:00      KR     1
1 2012-02-29 12:00:00      US     2
2 2012-03-31 12:00:00      JP     3
>>> df.to_spark_io(path='%s/to_spark_io/foo.json' % path, format='json')
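
By default the write drops the pandas-on-Spark index, as noted under index_col. A hedged sketch, reusing the path variable from the example above, that writes the index out as a column named 'index' and restores it on read:

>>> df.spark.to_spark_io(path='%s/to_spark_io/bar.json' % path,
...                      format='json', index_col='index')
>>> restored = ps.read_spark_io(path='%s/to_spark_io/bar.json' % path,
...                             format='json', index_col='index')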
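
Extra keyword arguments are forwarded to the underlying Spark data source as options. A minimal sketch under the same path assumption, passing Spark's csv header option so column names are written in the first row:

>>> df.spark.to_spark_io(path='%s/to_spark_io/baz.csv' % path,
...                      format='csv', header=True)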