pyspark sql - in SQL, why is this JOIN returning the key column twice?
I'm sorry if this is a stupid question; I can't seem to get my head around it. I'm new to SQL, and this behavior would be strange in R or pandas or the other tools I'm used to using.
Basically, I have two tables in two different databases, with a common key user_id. I want to join all the columns with:
select * from db1.first_table t1 join db2.second_table t2 on t1.user_id = t2.user_id
Great, it works. Except there are two (identical) columns called user_id. That wouldn't matter, except I'm doing this in PySpark, and when I try to export the joined table to a flat file I get an error because two of the columns have the same name. There are work-arounds for this, but I'm wondering if someone can explain why the join returns both user_id columns. It seems like, by the definition of an inner join, the two columns are identical. Why return both?

As a side question, is there an easy way to avoid this behavior?

Thanks in advance!
select *
returns all columns from all tables in the query. That includes both user_id columns: one from table A, one from table B. The join condition only determines which rows are matched; it never merges or removes columns, so each table's copy of the key comes back in the result.
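Since the error you hit is a PySpark one, here is a minimal sketch of the side question (table names taken from your query; the output path is a placeholder): passing the join key as a column name, rather than as an equality expression, makes Spark keep a single user_id column in the result.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

t1 = spark.table("db1.first_table")
t2 = spark.table("db2.second_table")

# With an explicit equality condition, both key columns survive,
# so joined.columns lists "user_id" twice and the export fails:
dup = t1.join(t2, t1["user_id"] == t2["user_id"], "inner")

# Passing the key as a name (or a list of names) dedupes it:
joined = t1.join(t2, on="user_id", how="inner")
joined.write.csv("/tmp/joined_out", header=True)  # placeholder path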
The best practice is to list the column names you want returned explicitly, though one option to shorten the list would be:
select tablea.*, tableb.col1, tableb.col2, ... -- the rest of tableb's columns except user_id
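The same idea in the DataFrame API, again as a sketch (col1 and col2 are hypothetical stand-ins for second_table's actual columns, and spark is the session from the earlier snippet):

# a/b mirror tablea/tableb from the SQL above
a = spark.table("db1.first_table").alias("a")
b = spark.table("db2.second_table").alias("b")

joined = a.join(b, a["user_id"] == b["user_id"])

# Equivalent of "select tablea.*, tableb.col1, tableb.col2, ...":
trimmed = joined.select("a.*", b["col1"], b["col2"])

# Or keep everything and drop the duplicate key via its source table:
deduped = joined.drop(b["user_id"])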