regex - How to extract characters before a pattern
I need help on how to extract a specific string from each line.
I have a file with thousands of lines like this:
eukaryota; alveolata; ciliophora; intramacronucleata; paramecium#
eukaryota; viridiplantae; streptophyta; embryophyta#
bacteria; cyanobacteria; synechococcales; acaryochloridaceae; acaryochloris#
eukaryota; viridiplantae#
bacteria; proteobacteria; alphaproteobacteria#
and I want to obtain the first and last item of each line. The output should be:
eukaryota; paramecium#
eukaryota; embryophyta#
bacteria; acaryochloris#
eukaryota; viridiplantae#
bacteria; alphaproteobacteria#
I know how to get the 1st column:
awk '{print$1}' filein > fileout
but I don't know how to get the last item, since it sits in a different column on each line.
I tried adding a # and then keeping xx characters before the #:
grep -E -o '.{x,x}pattern' filein > fileout
where the output looks like:
les; sulfolobaceae; sulfolobus#
; thermoproteaceae; caldivirga#
les; haloferacaceae; haloferax#
haloferacaceae; haloquadratum#
ales; natrialbaceae; natrialba#
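but then I have to repeat the procedure and remove everything up to each ; until I'm left with only the final item.
I've searched to see if there is a grep or awk option for that, i.e. to extract the 1st and last column or to extract the characters attached to the #, but I did not find anything that works for me.
I would appreciate any suggestions on how to proceed.
Thanks.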
$ awk 'BEGIN{FS=OFS=";"}{print $1,$NF}' file
eukaryota; paramecium#
eukaryota; embryophyta#
bacteria; acaryochloris#
eukaryota; viridiplantae#
bacteria; alphaproteobacteria#
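For anyone who wants to try the awk approach from the answer in isolation, here is a minimal sketch. It relies on three awk built-ins that must be written in uppercase: FS (input field separator), OFS (output field separator), and NF (number of fields, so $NF is the last field). The temporary file and the `sample` and `result` variable names are just illustrative assumptions; the three input lines are taken from the question.

```shell
# Hypothetical sample file holding three of the lines from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
eukaryota; alveolata; ciliophora; intramacronucleata; paramecium#
eukaryota; viridiplantae#
bacteria; proteobacteria; alphaproteobacteria#
EOF

# FS=";" splits each line on semicolons.
# OFS=";" puts a semicolon back between the two printed fields.
# $1 is the first field and $NF is the last one; the last field keeps
# its leading space, which is why the result reads "first; last#".
result=$(awk 'BEGIN{FS=OFS=";"}{print $1,$NF}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Note that a line containing a single item (no semicolon) has $1 and $NF pointing at the same field, so that item would be printed twice. If such lines can occur in your file, a condition such as NF>1 in front of the print block is one way to handle them separately.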