Julia DataFrames - How to do one-hot encoding?

Asked: 2020-10-28T09:37:16  Author: Davi Barreira

I'm using Julia's DataFrames.jl package. I have a dataframe with a column in which each entry is a list of strings (e.g. ["Type A", "Type B", "Type D"]). How does one perform one-hot encoding on such a column? I wasn't able to find a pre-built function in the DataFrames.jl package.

Here is an example of what I want to do:

Original Dataframe

col1 | col2 |
102  |[a]   |
103  |[a,b] | 
102  |[c,b] |

After One-hot encoding

col1 | a | b | c |
102  | 1 | 0 | 0 |
103  | 1 | 1 | 0 | 
102  | 0 | 1 | 1 |

Author: Davi Barreira. Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/64565276/julia-dataframes-how-to-do-one-hot-encoding

Bogumił Kamiński :

It is easy enough to do it with basic functions we provide though:

    julia> df = DataFrame(x=rand([1:3;missing], 20))
    20×1 DataFrame
    │ Row │ x       │
    │     │ Int64?  │
    ├─────┼─────────┤
    │ 1   │ 1       │
    │ 2   │ 2       │
    │ 3   │ missing │
    │ 4   │ 1       │
    │ 5   │ 3       │
    │ 6   │ missing │
    │ 7   │ 3       │
    │ 8   │ 3       │
    │ 9   │ 3       │
    │ 10  │ 3       │
    │ 11  │ missing │
    │ 12  │ 1       │
    │ 13  │ 3       │
    │ 14  │ 3       │
    │ 15  │ 3       │
    │ 16  │ 1       │
    │ 17  │ missing │
    │ 18  │ 1       │
    │ 19  │ 1       │
    │ 20  │ missing │

    julia> ux = unique(df.x); transform(df, @. :x => ByRow(isequal(ux)) .=> Symbol(:x_, ux))
    20×5 DataFrame
    │ Row │ x       │ x_1  │ x_2  │ x_missing │ x_3  │
    │     │ Int64?  │ Bool │ Bool │ Bool      │ Bool │
    ├─────┼─────────┼──────┼──────┼───────────┼──────┤
    │ 1   │ 1       │ 1    │ 0    │ 0         │ 0    │
    │ 2   │ 2       │ 0    │ 1    │ 0         │ 0    │
    │ 3   │ missing │ 0    │ 0    │ 1         │ 0    │
    │ 4   │ 1       │ 1    │ 0    │ 0         │ 0    │
    │ 5   │ 3       │ 0    │ 0    │ 0         │ 1    │
    │ 6   │ missing │ 0    │ 0    │ 1         │ 0    │
    │ 7   │ 3       │ 0    │ 0    │ 0         │ 1    │
    │ 8   │ 3       │ 0    │ 0    │ 0         │ 1    │
    │ 9   │ 3       │ 0    │ 0    │ 0         │ 1    │
    │ 10  │ 3       │ 0    │ 0    │ 0         │ 1    │
    │ 11  │ missing │ 0    │ 0    │ 1         │ 0    │
    │ 12  │ 1       │ 1    │ 0    │ 0         │ 0    │
    │ 13  │ 3       │ 0    │ 0    │ 0         │ 1    │
    │ 14  │ 3       │ 0    │ 0    │ 0         │ 1    │
    │ 15  │ 3       │ 0    │ 0    │ 0         │ 1    │
    │ 16  │ 1       │ 1    │ 0    │ 0         │ 0    │
    │ 17  │ missing │ 0    │ 0    │ 1         │ 0    │
    │ 18  │ 1       │ 1    │ 0    │ 0         │ 0    │
    │ 19  │ 1       │ 1    │ 0    │ 0         │ 0    │
    │ 20  │ missing │ 0    │ 0    │ 1         │ 0    │

EDIT:
Another example:

    julia> df = DataFrame(col1=102:104, col2=[["a"], ["a","b"], ["c","b"]])
    3×2 DataFrame
    │ Row │ col1  │ col2       │
    │     │ Int64 │ Array…     │
    ├─────┼───────┼────────────┤
    │ 1   │ 102   │ ["a"]      │
    │ 2   │ 103   │ ["a", "b"] │
    │ 3   │ 104   │ ["c", "b"] │

    julia> ux = unique(reduce(vcat, df.col2))
    3-element Array{String,1}:
     "a"
     "b"
     "c"

    julia> transform(df, :col2 .=> [ByRow(v -> x in v) for x in ux] .=> Symbol.(:col2_, ux))
    3×5 DataFrame
    │ Row │ col1  │ col2       │ col2_a │ col2_b │ col2_c │
    │     │ Int64 │ Array…     │ Bool   │ Bool   │ Bool   │
    ├─────┼───────┼────────────┼────────┼────────┼────────┤
    │ 1   │ 102   │ ["a"]      │ 1      │ 0      │ 0      │
    │ 2   │ 103   │ ["a", "b"] │ 1      │ 1      │ 0      │
    │ 3   │ 104   │ ["c", "b"] │ 0      │ 1      │ 1      │
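The second example in this answer keeps col2 and names the new columns col2_a, col2_b and col2_c. As a minimal sketch of getting the exact layout asked for in the question (col1 | a | b | c, with 0/1 integers), the following uses only Base Julia and builds a NamedTuple of columns. The variable names (labels, indicator_cols, table) are illustrative and not part of the answer. Assuming DataFrames.jl is loaded, DataFrame(table) should then wrap the result, since a NamedTuple of vectors is accepted as a column table.

```julia
# Data from the question: each row of col2 holds a list of labels.
col1 = [102, 103, 102]
col2 = [["a"], ["a", "b"], ["c", "b"]]

# Distinct labels across all rows, sorted for a stable column order.
labels = sort(unique(reduce(vcat, col2)))

# One indicator column per label: 1 if the label occurs in the row's list, else 0.
indicator_cols = [Symbol(l) => [Int(l in row) for row in col2] for l in labels]

# Assemble a NamedTuple of columns: col1 first, then one column per label.
table = (; col1 = col1, indicator_cols...)
```

Building the columns first and assembling the table at the end keeps the encoding step independent of any particular table package.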

2020-10-28T07:20:38

Nils Gudat :

There is indeed no one-hot encoding function in DataFrames.jl - I would argue that this is sensible, as this is a particular machine learning transformation that should live in an ML package rather than in a basic DataFrames package.

You've got two options, I think:

1. Use an ML package that does this for you, e.g. MLJ.jl. In MLJ, the OneHotEncoder is a model that transforms any table with Finite features in it into a one-hot encoded version of itself; see the MLJ docs.

2. Use a regression package that automatically generates dummy columns for categorical variables using the StatsModels @formula API. If you fit a regression with e.g. GLM.jl and your formula is @formula(y ~ x), where x is a categorical variable, the model matrix will automatically be constructed by contrast coding x, i.e. having binary dummy columns for all but one level of x.

For the second option, you ideally want your data to be categorical (although strings will work as well), and for this DataFrames.jl includes the categorical! function.

EDIT 17/11/2021: There has since been a definitive thread on this on the Julia Discourse which contains an extensive list of suggestions for doing one-hot encoding: https://discourse.julialang.org/t/all-the-ways-to-do-one-hot-encoding/

Sharing my favourite from there:

    julia> x = [1, 2, 1, 3, 2];

    julia> unique(x) .== permutedims(x)
    3×5 BitMatrix:
     1  0  1  0  0
     0  1  0  0  1
     0  0  0  1  0
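To make the difference between full one-hot encoding and the "all but one level" dummy columns concrete, here is a small sketch in plain Julia. It is not how StatsModels builds its model matrix; it only illustrates the idea with the same broadcasting trick. Picking the first sorted level as the reference level is an assumption made for this example, and the names lvls, onehot and dummy are illustrative.

```julia
x = ["a", "b", "a", "c"]
lvls = sort(unique(x))                   # distinct levels in sorted order

# Full one-hot encoding: rows are observations, one column per level.
onehot = x .== permutedims(lvls)         # 4×3 BitMatrix

# Dummy coding: drop the reference level (here the first one, "a"),
# leaving one column for each remaining level.
dummy = x .== permutedims(lvls[2:end])   # 4×2 BitMatrix
```

Observations equal to the reference level show up as all-zero rows in dummy, which is why a regression with an intercept only needs the remaining columns.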

2020-10-28T06:06:14
