Ruby algorithm to determine valid HTML structure -
i have input data array hashes, each hash description of html tag (open , end position in text , type of tag). need generate array tags put in order.
for example:
input = [ {start_p: 0, end_p: 100, start_t: '<p>', end_t: '</p>'}, {start_p: 10, end_p: 50, start_t: '<p>', end_t: '</p>'}, {start_p: 0, end_p: 100, start_t: '<span>', end_t: '</span>'}, {start_p: 20, end_p: 30, start_t: '<em>', end_t: '</em>'}, {start_p: 40, end_p: 50, start_t: '<em>', end_t: '</em>'}, {start_p: 50, end_p: 60, start_t: '<em>', end_t: '</em>'}, {start_p: 70, end_p: 80, start_t: '<em>', end_t: '</em>'}, {start_p: 8, end_p: 99, start_t: '<strong>', end_t: '</strong>'} ] expected_output: [<p><span><strong><p><em></em><em></em></p><em></em><em></em></strong></span></p>]
instead of tags in output, each tag should hash position , tag, like:
{position: 0, tag: '<p>'}
the important thing ordered in right order, respecting html rules of not having intersecting tags (if more 1 tags ending on same position, 1 opened last should go first, if 1 ends , other opens on same position, end first, etc).
this part of legacy system , input , output can't changed @ moment. also, input can big (hundreds of thousands of elements)
any better solution brute force recursive ?
input.group_by { |h| h[:start_p] }. values. flat_map |a| x = 1.0 a.flat_map |h| x /= 2.0 [[h[:start_p] += x, h[:start_t]], [h[:end_p] -= x, h[:end_t]]] end end.sort_by(&:first).map(&:last).join #=> "<span><p><strong><p><em></em><em></p></em><em></em><em></em></strong></p></span>"
the steps follows.
b = input.group_by { |h| h[:start_p] } #=> { 0=>[{:start_p=>0, :end_p=>100, :start_t=>"<p>", :end_t=>"</p>"}, # {:start_p=>0, :end_p=>100, :start_t=>"<span>", :end_t=>"</span>"}], # 10=>[{:start_p=>10, :end_p=>50, :start_t=>"<p>", :end_t=>"</p>"}], # 20=>[{:start_p=>20, :end_p=>30, :start_t=>"<em>", :end_t=>"</em>"}], # 40=>[{:start_p=>40, :end_p=>50, :start_t=>"<em>", :end_t=>"</em>"}], # 50=>[{:start_p=>50, :end_p=>60, :start_t=>"<em>", :end_t=>"</em>"}], # 70=>[{:start_p=>70, :end_p=>80, :start_t=>"<em>", :end_t=>"</em>"}], # 8=>[{:start_p=> 8, :end_p=>99, :start_t=>"<strong>", :end_t=>"</strong>"}]} c = b.values #=> [[{:start_p=>0, :end_p=>100, :start_t=>"<p>", :end_t=>"</p>"}, # {:start_p=>0, :end_p=>100, :start_t=>"<span>", :end_t=>"</span>"}], # [{:start_p=>10, :end_p=>50, :start_t=>"<p>", :end_t=>"</p>"}], # ... # [{:start_p=>8, :end_p=>99, :start_t=>"<strong>", :end_t=>"</strong>"}]] d = c.flat_map |a| x = 1.0 a.flat_map |h| x /= 2.0 [[h[:start_p] += x, h[:start_t]], [h[:end_p] -= x, h[:end_t]]] end end #=> [[0.5, "<p>"], [99.5, "</p>"], [0.25, "<span>"], [99.75, "</span>"], # [10.5, "<p>"], [49.5, "</p>"], [20.5, "<em>"], [29.5, "</em>"], # [40.5, "<em>"], [49.5, "</em>"], [50.5, "<em>"], [59.5, "</em>"], # [70.5, "<em>"], [79.5, "</em>"], [8.5, "<strong>"], [98.5, "</strong>"]]
the first 4 element of d
(tuples) important understanding approach i've taken.
e = d.sort_by(&:first) #=> [[0.25, "<span>"], [0.5, "<p>"], [8.5, "<strong>"], [10.5, "<p>"], # [20.5, "<em>"], [29.5, "</em>"], [40.5, "<em>"], [49.5, "</p>"], # [49.5, "</em>"], [50.5, "<em>"], [59.5, "</em>"], [70.5, "<em>"], # [79.5, "</em>"], [98.5, "</strong>"], [99.5, "</p>"], [99.75, "</span>"]] f = e.map(&:last) #=> ["<span>", "<p>", "<strong>", "<p>", "<em>", "</em>", "<em>", "</p>", # "</em>", "<em>", "</em>", "<em>", "</em>", "</strong>", "</p>", "</span>"] f.join #=> "<span><p><strong><p><em></em><em></p></em><em></em><em></em></strong></p></span>"
i elaborate calculation of d
above if requested so.
Comments
Post a Comment