sed introduction

sed

sed was originally written in 1973 or 1974 by Lee E. McMahon as a stream editor, and that is exactly what it does: it modifies streams of text on the fly. For each input line, sed runs the following work cycle:

read an entire line from stdin into its pattern buffer
modify the pattern buffer according to the supplied commands
print the pattern buffer to stdout

sed uses Basic Regular Expressions (as opposed to the Extended Regular Expressions or Perl Compatible Regular Expressions used in many other programs). Basic Regular Expressions are very similar to the other flavours; the most visible difference is that grouping and interval operators are written with a backslash, as in \( \) and \{ \}.

sed Synopsis

sed [options] program [inputfile]

The following simple program consists of one command only: 'd'. The command 'd' tells sed to delete the pattern buffer.

bash$ sed -e 'd' /etc/hosts

When you launch this script, (apparently) nothing happens. Remember the general workflow of sed: read a line into the pattern buffer, process the line according to the script, and then print the pattern buffer to stdout. That is exactly what happened: for every line the script deleted the pattern buffer, so no output was generated.

Another command is 'p'. It tells sed to print the pattern buffer.

bash$ sed -e 'p' /etc/hosts

The effect of this script is to print every line twice. Remember the operation mode of sed:

read a line from stdin
print the line (because of the 'p' command)
print the pattern buffer to stdout
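
Both experiments can be reproduced without touching /etc/hosts. The following is a minimal sketch on a made-up two-line input; the variable names are only illustrative. It also uses the -n option, which suppresses the automatic print at the end of each cycle and appears in later examples.

```shell
# 'p' prints the pattern buffer, and the end-of-cycle print repeats it.
doubled=$(printf 'alpha\nbeta\n' | sed -e 'p')

# With -n the end-of-cycle print is suppressed: every line appears once.
single=$(printf 'alpha\nbeta\n' | sed -n -e 'p')

# 'd' deletes the pattern buffer and starts the next cycle: no output.
deleted=$(printf 'alpha\nbeta\n' | sed -e 'd')

printf 'doubled:\n%s\n' "$doubled"
printf 'single:\n%s\n' "$single"
printf 'deleted: [%s]\n' "$deleted"
```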

Addresses

We do not always want to apply a command to every single line. Sometimes a command should apply to a single line only, or to a block of lines. The mechanism sed provides to select specific lines is called an address.

An address is one of the following:

n	selects line number n.
$	selects the last line
/re/	selects the lines matching the Regular Expression re
\crec	selects the lines matching the Regular Expression re. The character c can be freely chosen
first~step	(GNU extension!) Selects every step-th line starting with line first
addr1,addr2	Address range: selects all input lines which match the inclusive range of lines starting from the first address and continuing to the second address
addr!	select lines that do not match addr
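
Each address form can be tried on a small input. This sketch assumes a five-line sample produced by a helper function named sample (a name chosen here for illustration); the first~step form only works where the GNU extension is available.

```shell
sample() { printf 'one\ntwo\nthree\nfour\nfive\n'; }

by_number=$(sample | sed -n -e '2p')        # line number 2
last=$(sample | sed -n -e '$p')             # last line
by_regex=$(sample | sed -n -e '/^t/p')      # lines starting with t
alt_delim=$(sample | sed -n -e '\,^f,p')    # same idea, ',' as delimiter
range=$(sample | sed -n -e '2,/four/p')     # from line 2 to the first "four"
negated=$(sample | sed -n -e '2,4!p')       # everything outside lines 2-4
stepped=$(sample | sed -n -e '1~2p')        # GNU only: lines 1, 3, 5

printf '%s\n' "$by_number" "$last" "$by_regex" "$alt_delim"
printf '%s\n' "$range" "$negated" "$stepped"
```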

Examples

The command '=' prints the current line number. A substitute program for wc -l (count the number of lines) might be:

bash$ sed -n -e '$='

Both examples that follow emulate the UNIX program head:

bash$ sed -n -e '1,10p'
bash$ sed -e '10q'

The first example uses the address pair '1,10' to select the lines to print. The second example relies on the implicit print at the end of each cycle to produce the output; when the address '10' matches, the q command prints the pattern buffer and terminates sed.
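
As a quick check, the three one-liners can be compared with the real tools on generated input. This is a sketch only; the helper numbers, which prints the integers 1 to 12, is an assumption made for the example.

```shell
# Hypothetical helper: prints 1..12, one number per line.
numbers() {
    i=1
    while [ "$i" -le 12 ]; do
        echo "$i"
        i=$((i + 1))
    done
}

count=$(numbers | sed -n -e '$=')       # like wc -l
head_p=$(numbers | sed -n -e '1,10p')   # like head, reads all input
head_q=$(numbers | sed -e '10q')        # like head, stops after line 10
reference=$(numbers | head -n 10)

echo "lines counted: $count"
[ "$head_p" = "$reference" ] && echo "1,10p agrees with head"
[ "$head_q" = "$reference" ] && echo "10q agrees with head"
```

The 10q variant stops reading as soon as line 10 has been printed, which matters on large inputs.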

Substitution Command

Eliminate comments

bash$ sed -e 's/#.*//' /etc/inetd

Eliminate comments and empty lines

bash$ sed -e 's/#.*//;/^$/d' /etc/inetd

Have a 133t prompt

bash$ ls -l | sed -e 's/o/0/;s/l/1/;s/e/3/'

bash$ ls -l | sed -e 's/o/0/g;s/l/1/g;s/e/3/g'

bash$ ls -l | sed -e 'y/ole/013/'
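
The difference between the three variants is easier to see on a fixed string than on a directory listing. A small sketch, assuming an arbitrary sample sentence: without the g flag only the first match per line is replaced, with g every match is replaced, and y transliterates characters one for one and takes no flags.

```shell
sample='hello little old elephant'

first_only=$(printf '%s\n' "$sample" | sed -e 's/o/0/;s/l/1/;s/e/3/')
every=$(printf '%s\n' "$sample" | sed -e 's/o/0/g;s/l/1/g;s/e/3/g')
translit=$(printf '%s\n' "$sample" | sed -e 'y/ole/013/')

echo "$first_only"   # h31l0 little old elephant
echo "$every"        # h3110 1itt13 01d 313phant
echo "$translit"     # h3110 1itt13 01d 313phant
```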

Convert a file from DOS to UNIX and vice versa

# Under UNIX: convert DOS newlines (CR/LF) to Unix format
bash$ sed 's/.$//' file    # assumes that all lines end with CR/LF
bash$ sed 's/^M$//' file   # in bash/tcsh, press Ctrl-V then Ctrl-M
# Under DOS: convert Unix newlines (LF) to DOS format
C:\> sed 's/$//' file    # method 1
C:\> sed -n p file       # method 2

Alternatively use the utilities dos2unix and unix2dos, or the command

tr -d '\r' < inputfile > outputfile

for a conversion from DOS to UNIX, or

:set fileformat=dos
:set fileformat=unix

from within vim, or...
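
For scripts it can be awkward to type a literal Ctrl-M. One possible workaround, sketched here under the assumption that the shell's printf understands \r, is to store the carriage return in a variable (called cr in this example) and splice it into the sed program.

```shell
cr=$(printf '\r')   # a single carriage return character

# DOS to UNIX: remove the CR in front of each newline.
to_unix=$(printf 'first\r\nsecond\r\n' | sed -e "s/$cr\$//")
with_tr=$(printf 'first\r\nsecond\r\n' | tr -d '\r')

# UNIX to DOS: append a CR to each line.
to_dos=$(printf 'first\nsecond\n' | sed -e "s/\$/$cr/")

printf '%s\n' "$to_unix"
printf '%s\n' "$to_dos" | od -c | sed -e '2q'   # shows the \r \n pairs
```

Unlike 's/.$//', the anchored CR pattern leaves lines alone that do not end in CR/LF.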

Comments

The character # starts a comment that runs to the end of the line. It is treated as a command (which cannot have any address). Comments are most useful if the sed program is stored in a file. The whole program can then be executed with

bash$ sed -f programfile < inputdata

The { and } commands group several commands under one address. } is itself a command and must therefore be separated from the preceding command by a semicolon.

bash$ sed -ne '/gimme this line number/{=;q;}'
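
On a made-up four-line input the grouped commands behave as follows: = prints the number of the first matching line and q stops sed, so later lines are never read. The variable name found is only illustrative.

```shell
found=$(printf 'a\nb\ngimme this line number\nd\n' |
    sed -n -e '/gimme this line number/{=;q;}')

echo "first match is on line $found"
```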

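The command n prints the pattern buffer (unless -n is given), then replaces it with the next line from stdin and continues with the following command. It must not be placed after d, because d ends the current cycle immediately and anything following it is never executed.

/skip this line/n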
 # do some nasty stuff
 ...
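
The interplay of n and d can be checked on small inputs. This is a sketch with invented sample data; it uses even line counts, because the behaviour of n when no input is left differs between sed implementations.

```shell
# n prints the current line and loads the next one, which d then deletes:
# only the odd-numbered lines survive.
odd=$(printf '1\n2\n3\n4\n' | sed -e 'n;d')

# Delete the line that follows each "header" line.
after_header=$(printf 'header\nx\nbody\nheader\ny\nbody\n' |
    sed -e '/header/{n;d;}')

# With d first, n is never reached: this is the same as '/header/d'.
d_first=$(printf 'header\nx\nbody\nheader\ny\nbody\n' |
    sed -e '/header/{d;n;}')

printf '%s\n' "$odd" "---" "$after_header" "---" "$d_first"
```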

REs are greedy

Example: eliminating HTML-tags from a file

bash$ sed -e 's/<.*>//g' text.html

If the file contains a line like:

This <b> is </b> a <i>example</i>.

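then the result will be:

This .

because .* matches the longest possible string, which runs from the first < to the last > on the line.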

Solution:

bash$ sed -e 's/<[^>]*>//g' text.html
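
Both expressions can be compared directly on the sample line, without a text.html file. A minimal sketch; the variable names are chosen for this example.

```shell
line='This <b> is </b> a <i>example</i>.'

# Greedy: .* swallows everything between the first < and the last >.
greedy=$(printf '%s\n' "$line" | sed -e 's/<.*>//g')

# [^>]* cannot run past the closing > of the tag it started in.
per_tag=$(printf '%s\n' "$line" | sed -e 's/<[^>]*>//g')

printf '[%s]\n' "$greedy"    # [This .]
printf '[%s]\n' "$per_tag"   # [This  is  a example.]
```

The doubled spaces in the second result are the blanks that surrounded the removed tags.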

References

The elleff-Language:

Every vowel c in a word is replaced by clcfc. The ampersand (&) in the replacement stands for the matched string:

bash$ sed -e 's/[aeiou]\+/&l&f&/g'
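
Applied to two sample words, the command produces the output below. Note that \+ is a GNU extension in Basic Regular Expressions, so this sketch assumes GNU sed; a run of several vowels is treated as one unit because of the \+.

```shell
elleff=$(printf 'hello world\n' | sed -e 's/[aeiou]\+/&l&f&/g')
double=$(printf 'book\n' | sed -e 's/[aeiou]\+/&l&f&/g')

echo "$elleff"   # helefellolofo woloforld
echo "$double"   # booloofook
```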

Referencing a sub-string

Sub-strings enclosed in \( and \) can be referenced with \n (n is a digit from 1 to 9)

bash$ sed -e 's/\([^ ]\+\)  *\([^ ]\+\)  *\([^ ]\+\)/\3 \2 \1/'

swaps the first and the third word of a line (the second stays in the middle)
does nothing if the line contains fewer than 3 words.
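
A short check of both cases, assuming GNU sed for the \+ operator and sample lines invented for the example:

```shell
swap='s/\([^ ]\+\)  *\([^ ]\+\)  *\([^ ]\+\)/\3 \2 \1/'

swapped=$(printf 'one two three four\n' | sed -e "$swap")
too_short=$(printf 'one two\n' | sed -e "$swap")

echo "$swapped"     # three two one four
echo "$too_short"   # one two
```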

The elleff inverse transform

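The following RE is too permissive: it also matches strings which are not elleff vowels (for example alefi), because nothing forces the three vowels to be identical.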

[aeiou]l[aeiou]f[aeiou]

Basic REs can use the back-reference in the RE itself!

bash$ sed -e 's/\([aeiou]\+\)l\1f\1/\1/g'
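
The effect of the back-reference can be seen by comparing both patterns on a string that merely looks like an elleff vowel. This sketch assumes GNU sed (for \+) and uses made-up test strings.

```shell
restored=$(printf 'helefellolofo woloforld\n' |
    sed -e 's/\([aeiou]\+\)l\1f\1/\1/g')

# "alefi" has three different vowels, so it is not an elleff vowel.
loose=$(printf 'alefi\n' | sed -n -e '/[aeiou]l[aeiou]f[aeiou]/p')
strict=$(printf 'alefi\n' | sed -n -e '/\([aeiou]\)l\1f\1/p')

echo "$restored"              # hello world
echo "loose match:  [$loose]"
echo "strict match: [$strict]"
```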

Space Balls

The patterns are manipulated in the pattern space
The hold space can store multiple lines, separated by newline.
There are commands to fill/empty the hold space
There aren't any commands to work directly on the hold space

D	Delete text in the pattern space up to the first newline
N	Add a newline to the pattern space, then append the next line of input to the pattern space
P	Print out the portion of the pattern space up to the first newline
h	Replace the contents of the hold space with the contents of the pattern space
H	Append a newline to the contents of the hold space, and then append the contents of the pattern space to that of the hold space
g	Replace the contents of the pattern space with the contents of the hold space
G	Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space
x	Exchange the contents of the hold and pattern spaces
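
Two of these commands are already useful on their own. The following sketch, on invented input, joins pairs of lines with N and double-spaces a file with G (the hold space is empty, so G only adds a newline).

```shell
# N pulls the next line into the pattern space; the embedded newline
# is then replaced by a blank.
joined=$(printf 'a\nb\nc\nd\n' | sed -e 'N;s/\n/ /')

# G appends a newline plus the (empty) hold space to every line.
spaced=$(printf 'a\nb\n' | sed -e 'G')

printf '%s\n' "$joined"
printf '%s\n' "$spaced"
```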

Space Balls: Example

Print the first line as last

bash$ sed -n -e '1h;1!p;${g;p;}'

h: hold space <- pattern space

g: pattern space <- hold space
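
On a three-line sample (invented for this sketch) the command moves the first line to the end:

```shell
rotated=$(printf 'one\ntwo\nthree\n' | sed -n -e '1h;1!p;${g;p;}')

printf '%s\n' "$rotated"
# two
# three
# one
```

Line 1 is only stored (h), every other line is printed (1!p), and on the last line the stored line is fetched back (g) and printed.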

Emulation of tac

bash$ sed -n -e 'G;h;$p'

G: pattern space <<- '\n' hold space

Problem: the output ends with an extra empty line. G appends a newline followed by the content of the hold buffer to the pattern buffer even on the first line (which is printed last), when the hold buffer is still empty.

tac improved

bash$ sed -n -e 'G;h;$s/.$//p'
bash$ sed -n -e '1!G;h;$p'
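
The extra line is easy to miss on a terminal, so this sketch counts the output lines with the '$=' idiom. The three-line input is made up for the example.

```shell
sample() { printf 'one\ntwo\nthree\n'; }

naive_lines=$(sample | sed -n -e 'G;h;$p' | sed -n -e '$=')
fixed=$(sample | sed -n -e '1!G;h;$p')
fixed_lines=$(printf '%s\n' "$fixed" | sed -n -e '$=')

echo "G;h;\$p   prints $naive_lines lines"    # 4: one empty line too many
echo "1!G;h;\$p prints $fixed_lines lines"    # 3
printf '%s\n' "$fixed"
```

The 1!G variant avoids the problem at its source, while the s/.$// variant repairs the result afterwards and depends on . matching a newline, as it does in GNU sed.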

Example: a counter in sed

/^[[:digit:]][[:digit:]]*$/!b;         # lines that are not all digits pass through unchanged
x;s/.*//;x;                            # clear the hold space
: add
/9$/{s/9$//;x;s/.*/0&/;x;b add;};  # eliminate the last 9 from the p.s.
                                       # and add a 0 in front of the h.s.
s/8$/9/
s/7$/8/
s/6$/7/
s/5$/6/
s/4$/5/
s/3$/4/
s/2$/3/
s/1$/2/
s/0$/1/
s/^$/1/
G;s/\n//g;            # add the content of the h.s to the p.s
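
To watch the carry handling at work, the increment logic can be run on a few numbers. The sketch below is a condensed rewrite, not the listing itself: it leaves out the digits-only guard (so numeric input is assumed), writes one command per line, and stores the program in a shell variable named counter, which is a name chosen here.

```shell
counter='x
s/.*//
x
:add
/9$/{
s/9$//
x
s/.*/0&/
x
b add
}
s/8$/9/
s/7$/8/
s/6$/7/
s/5$/6/
s/4$/5/
s/3$/4/
s/2$/3/
s/1$/2/
s/0$/1/
s/^$/1/
G
s/\n//g'

incremented=$(printf '7\n199\n999\n' | sed -e "$counter")
printf '%s\n' "$incremented"
# 8
# 200
# 1000
```

Each trailing 9 is moved to the hold space as a 0, the remaining last digit is incremented, and G glues the zeros back on.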

Branches

: label	Definition of label (up to 8 characters)
b label	unconditionally branch to label
t label	branch to label only if there has been a successful 's'ubstitution since the last input line was read or 't' branch was taken

If label is omitted in the b or t command, the branch goes to the end of the script: the pattern space is printed (unless -n is in effect) and the next cycle is started.
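
A typical use of t is a loop that repeats a substitution until it no longer matches. The sketch below, with input invented for the example, strips nested parentheses from the inside out; it also shows b without a label skipping the rest of the script for comment lines.

```shell
# Remove the innermost (...) group, then loop while something was removed.
stripped=$(printf 'a(b(c)d)e\n' |
    sed -e ':a' -e 's/([^()]*)//' -e 'ta')

# Lines starting with # branch to the end of the script untouched.
skipped=$(printf '# x stays\nx changes\n' |
    sed -e '/^#/b' -e 's/x/y/')

echo "$stripped"           # ae
printf '%s\n' "$skipped"
```

Giving each command its own -e keeps the labels portable, since some sed implementations read a label up to the end of the line.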

Eliminate K/K++ comments

one-line comments (K++): kk...
multi-line comments (K): ko...ok

#!/bin/sed -f

# delete K++ comments
/^[[:blank:]]*kk.*/d
s/kk.*//

# If no comment is found, then start a new cycle
: test
/ko/!b

# Append new lines to the pattern space until an entire K-comment is in the
# pattern space
: append
/ok/!{N;b append;}

# delete every K-comment (but don't be greedy!)
s/ko\([^o]\|o[^k]\)*o\?ok//g

t test
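
The script can be tried on a small K source. This is only a sketch: the sample program is invented, the script is written to a temporary file created with mktemp, its comments are left out for brevity, and GNU sed is assumed because of \|, \? and the semicolon after a label.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
/^[[:blank:]]*kk.*/d
s/kk.*//
: test
/ko/!b
: append
/ok/!{N;b append;}
s/ko\([^o]\|o[^k]\)*o\?ok//g
t test
EOF

cleaned=$(printf '%s\n' \
    'a=1;kk one-line comment' \
    'kk whole-line comment' \
    'b=2;ko inline ok' \
    'c=3;ko spans' \
    'two lines ok' \
    'd=4' | sed -f "$script")
rm -f "$script"

printf '%s\n' "$cleaned"
# a=1;
# b=2;
# c=3;
# d=4
```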