pyspark sql - in SQL, why is this JOIN returning the key column twice?
I'm sorry if this is a stupid question; I can't seem to get my head around it. I'm new to SQL, and this behavior would be strange in R or pandas or the other tools I'm used to using.
Basically, I have two tables in two different databases, with a common key user_id. I want to join all the columns with:
select * from db1.first_table t1 join db2.second_table t2 on t1.user_id = t2.user_id
Great, it works. Except there are two (identical) columns called user_id. That wouldn't matter, except I'm doing this in PySpark, and when I try to export the joined table to a flat file I get an error because two of the columns have the same name. There are work-arounds for this, but I'm wondering if someone can explain why the join returns both user_id columns. It seems like, by the definition of an inner join, the two columns are identical. Why return both?

As a side question, is there an easy way to avoid this behavior?

Thanks in advance!
select *
returns all columns from all tables in the query. That includes both user_id columns: one from table A, one from table B. The join condition only determines which rows are matched; it never merges or removes columns, so each table's copy of the key comes back in the result.
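Since the error you hit is a PySpark one, here is a minimal sketch of the side question (table names taken from your query; the output path is a placeholder): passing the join key as a column name, rather than as an equality expression, makes Spark keep a single user_id column in the result.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

t1 = spark.table("db1.first_table")
t2 = spark.table("db2.second_table")

# With an explicit equality condition, both key columns survive,
# so joined.columns lists "user_id" twice and the export fails:
dup = t1.join(t2, t1["user_id"] == t2["user_id"], "inner")

# Passing the key as a name (or a list of names) dedupes it:
joined = t1.join(t2, on="user_id", how="inner")
joined.write.csv("/tmp/joined_out", header=True)  # placeholder path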
The best practice is to list the column names you want returned explicitly, though one option to shorten the list would be:
select tablea.*, tableb.col1, tableb.col2, ... -- the rest of tableb's columns except user_id
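The same idea in the DataFrame API, again as a sketch (col1 and col2 are hypothetical stand-ins for second_table's actual columns, and spark is the session from the earlier snippet):

# a/b mirror tablea/tableb from the SQL above
a = spark.table("db1.first_table").alias("a")
b = spark.table("db2.second_table").alias("b")

joined = a.join(b, a["user_id"] == b["user_id"])

# Equivalent of "select tablea.*, tableb.col1, tableb.col2, ...":
trimmed = joined.select("a.*", b["col1"], b["col2"])

# Or keep everything and drop the duplicate key via its source table:
deduped = joined.drop(b["user_id"])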