PWC PySpark Interview Question | How to handle multiple delimiters in a CSV file

GeekCoders
Input:
data="""
Id|Name|Marks
1|Sagar|20,30,40
2|Alex|34,32,12
3|David|45,67,54
4|John|10,34,60
"""
# Write the sample data to DBFS (Databricks only); True overwrites an existing file.
dbutils.fs.put('/FileStore/tables/mutliple_delimiter.csv', data, True)

Solution:
from pyspark.sql.functions import col, split
# Read with '|' as the column separator; Marks stays one comma-separated string.
df = spark.read.format('csv').option('header', True).option('sep', '|').load('/FileStore/tables/mutliple_delimiter.csv')
# Split Marks on ',' once, then promote each element to its own column.
marks = split(col("Marks"), ',')
df_output = (df.withColumn("Physics", marks[0]).withColumn("Chemistry", marks[1])
             .withColumn("Maths", marks[2]).drop("Marks"))
display(df_output)
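
The two-step split used in the solution (first on '|', then on ',') can be sanity-checked without a Spark cluster. Below is a minimal sketch in plain Python, assuming the same sample data as the Input section. `split_marks` is a hypothetical helper name used only for illustration and is not part of the original solution.

```python
import csv
import io

# Same sample data as in the Input section.
data = """Id|Name|Marks
1|Sagar|20,30,40
2|Alex|34,32,12
3|David|45,67,54
4|John|10,34,60
"""


def split_marks(text):
    """Parse '|'-separated rows, then split Marks on ',' into three subject columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    rows = []
    for row in reader:
        # Assumes every Marks value holds exactly three comma-separated entries.
        physics, chemistry, maths = row["Marks"].split(",")
        rows.append({
            "Id": row["Id"],
            "Name": row["Name"],
            # Values stay strings, as they do after split() in the Spark version.
            "Physics": physics,
            "Chemistry": chemistry,
            "Maths": maths,
        })
    return rows


rows = split_marks(data)
for r in rows:
    print(r)
```

As in the Spark version, the resulting marks are still strings. If numeric columns are needed, a cast would be required in either approach.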

I have prepared many courses on Azure Data Engineering

1. Build Azure End-to-End Project
https://www.geekcoders.co.in/courses/...

2. Build Delta Lake project
https://www.geekcoders.co.in/courses/...

3. Master in Azure Data Factory with ETL Project and PowerBi
https://www.geekcoders.co.in/courses/...

4. Master in Python
https://www.geekcoders.co.in/courses/...

Check out my courses on Azure Data Engineering
https://www.geekcoders.co.in/s/store/...

#dataengineer #interviewquestions #spark
