sed introduction
sed
sed was originally written in 1973 or 1974 by Lee E. McMahon as a stream editor. And this is exactly what sed does: it modifies streams of text on the fly. The work cycle of sed is, for each line:
- read an entire line from stdin into its pattern buffer
- modify the pattern buffer according to the supplied commands
- print the pattern buffer to stdout
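The cycle can be observed with an empty program. This is a minimal sketch, assuming a POSIX shell and a sed that accepts an empty script (GNU sed does); the variable names are only illustrative.

```shell
# An empty program modifies nothing, so each line is read and printed as-is.
unchanged=$(printf 'one\ntwo\n' | sed -e '')
# -n switches off the automatic print at the end of each cycle.
silent=$(printf 'one\ntwo\n' | sed -n -e '')
printf 'without -n: [%s]\nwith -n: [%s]\n' "$unchanged" "$silent"
```

With -n in effect, output appears only when a command such as 'p' asks for it.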
sed uses Basic Regular Expressions (as opposed to the Extended Regular Expressions or Perl Compatible Regular Expressions used in most other programs). Basic Regular Expressions are very similar to the other types, and many users will hardly notice a difference. The most visible one is that grouping parentheses and interval braces must be written with a backslash, as \( \) and \{ \}.
sed Synopsis
sed [options] program [inputfile]
The following simple program consists of one command only: 'd'. The command 'd' tells sed to delete the pattern buffer.
bash$ sed -e 'd' /etc/hosts
When you launch this script, apparently nothing happens. Remember the general workflow of sed: read a line into the pattern buffer, process it according to the script, and then print the pattern buffer to stdout. This is exactly what happened: the script deleted the pattern buffer, so no output was generated.
Another command is 'p'. It tells sed to print the pattern buffer.
bash$ sed -e 'p' /etc/hosts
The effect of this script is to print the line twice. Remember the operation mode of sed:
- read a line from stdin
- print the line (because of the 'p' command)
- print the pattern buffer to stdout
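The doubling can be checked on a two-line input. This is a small sketch with made-up sample data; the option -n suppresses the automatic print, so only the explicit 'p' remains.

```shell
# 'p' prints the pattern buffer, then the automatic print repeats it.
twice=$(printf 'a\nb\n' | sed -e 'p')
# With -n only the explicit 'p' produces output.
once=$(printf 'a\nb\n' | sed -n -e 'p')
printf '%s\n---\n%s\n' "$twice" "$once"
```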
Addresses
We do not always want to apply a command to every single line. Sometimes a command should apply to a single line or to a block of lines. The mechanism sed provides to select specific lines is called an address.
An address is one of the following:
n | selects line number n. |
$ | selects the last line |
/re/ | selects the lines matching the Regular Expression re |
\crec | selects the lines matching the Regular Expression re, using c instead of / as the delimiter. The character c can be freely chosen |
first~step | (GNU extension!) Selects every step-th line starting with line first |
addr1,addr2 | Address range: selects all input lines which match the inclusive range of lines starting from the first address and continuing to the second address |
addr! | select lines that do not match addr |
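Each address form from the table can be tried on six numbered lines. This is a sketch only: lines and pick are hypothetical helper functions, and the first~step form assumes GNU sed.

```shell
# Six numbered lines as sample input.
lines() { printf '%s\n' 1 2 3 4 5 6; }
# Print the lines selected by the sed program in $1, joined on one line.
pick() { lines | sed -n -e "$1" | paste -s -d ' ' -; }

pick '2p'         # line number 2
pick '$p'         # the last line
pick '/[35]/p'    # lines matching the RE [35]
pick '\%[35]%p'   # the same RE with % as delimiter
pick '1~2p'       # every 2nd line starting with line 1 (GNU extension)
pick '2,4p'       # the range from line 2 to line 4
pick '2,4!p'      # everything outside that range
```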
Examples
The command '=' prints the current line number. A substitute for wc -l (count the number of lines) might be:
bash$ sed -n -e '$='
Both examples that follow emulate the UNIX program head:
bash$ sed -n -e '1,10p'
bash$ sed -e '10q'
The first example uses the address range '1,10' to select the lines to print. The second example relies on the implicit print at the end of each cycle to produce the output; when the address '10' matches, the 'q' command terminates sed.
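The three one-liners read from stdin, so they can be tried on generated input. In this sketch sample is a hypothetical function that emits twelve lines.

```shell
# Twelve lines of sample input.
sample() { printf '%s\n' a b c d e f g h i j k l; }

count=$(sample | sed -n -e '$=')       # like wc -l
head_a=$(sample | sed -n -e '1,10p')   # like head, by address range
head_b=$(sample | sed -e '10q')        # like head, by quitting at line 10
printf 'lines: %s\n' "$count"
printf '%s\n' "$head_b"
```

The 'q' variant stops reading after line 10, which matters on large inputs.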
Substitution Command
Eliminate comments
bash$ sed -e 's/#.*//' /etc/inetd
Eliminate comments and empty lines
bash$ sed -e 's/#.*//;/^$/d' /etc/inetd
Have a 133t prompt
bash$ ls -l | sed -e 's/o/0/;s/l/1/;s/e/3/'
bash$ ls -l | sed -e 's/o/0/g;s/l/1/g;s/e/3/g'
bash$ ls -l | sed -e 'y/ole/013/'
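The difference between the three variants shows on a fixed string. This is a sketch with a made-up sample text: 's' without the g flag changes only the first match per line, and 'y' transliterates every character and takes no flags.

```shell
text='hello world'
first_only=$(printf '%s\n' "$text" | sed -e 's/o/0/;s/l/1/;s/e/3/')
all_s=$(printf '%s\n' "$text" | sed -e 's/o/0/g;s/l/1/g;s/e/3/g')
all_y=$(printf '%s\n' "$text" | sed -e 'y/ole/013/')
printf '%s\n%s\n%s\n' "$first_only" "$all_s" "$all_y"
# h31l0 world
# h3110 w0r1d
# h3110 w0r1d
```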
Convert a file from DOS to UNIX and vice versa
# Under UNIX: convert DOS newlines (CR/LF) to Unix format
bash$ sed 's/.$//' file # assumes that all lines end with CR/LF
bash$ sed 's/^M$//' file # in bash/tcsh, press Ctrl-V then Ctrl-M
# Under DOS: convert Unix newlines (LF) to DOS format
C:\> sed 's/$//' file # method 1
C:\> sed -n p file # method 2
Alternatively use the utilities dos2unix and unix2dos, or the command
tr -d '\r' < inputfile > outputfile
for a conversion from DOS to UNIX, or
:set fileformat=dos (or :set fileformat=unix)
from within vim, or...
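The conversions can be verified by counting bytes. This sketch assumes GNU sed, which understands \r in both the RE and the replacement; dos_text is a hypothetical helper.

```shell
# Sample DOS text: two lines, each ending in CR LF (10 bytes in total).
dos_text() { printf 'one\r\ntwo\r\n'; }

# DOS -> UNIX: strip the trailing CR from every line.
unix_bytes=$(dos_text | sed -e 's/\r$//' | wc -c)
# The same conversion with tr, which deletes every CR in the stream.
tr_bytes=$(dos_text | tr -d '\r' | wc -c)
# UNIX -> DOS: append a CR to every line.
dos_bytes=$(dos_text | tr -d '\r' | sed -e 's/$/\r/' | wc -c)
echo $unix_bytes $tr_bytes $dos_bytes
```

Unlike 's/.$//', the pattern 's/\r$//' leaves lines alone that do not end in CR.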
The character # starts a comment that extends to the end of the line. It is treated as a command (which cannot have any address). Comments are useful if the sed program is stored in a file. The whole program can be executed with
bash$ sed -f programfile < inputdata
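A commented program file might look like the following sketch. The temporary file created with mktemp and the sample input are assumptions made for illustration.

```shell
prog=$(mktemp)   # hypothetical temporary program file
cat > "$prog" <<'EOF'
# remove comments
s/#.*//
# drop lines that are now empty
/^$/d
EOF
cleaned=$(printf 'a# note\n# only a comment\nb\n' | sed -f "$prog")
rm -f "$prog"
printf '%s\n' "$cleaned"
```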
The { and } commands group several commands into a block. } is itself a command and must therefore be separated from the preceding command by a semicolon.
bash$ sed -ne '/gimme this line number/{=;q;}'
The command n prints the pattern buffer (unless -n is given) and then replaces it with the next line read from stdin
/skip this line/{n;}
# do some nasty stuff
...
REs are greedy
Example: eliminating HTML-tags from a file
bash$ sed -e 's/<.*>//g' text.html
If the file contains a line like:
This <b> is </b> a <i>example</i>.
then the result will be:
This .
Solution:
bash$ sed -e 's/<[^>]*>//g' text.html
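Both substitutions can be compared on the sample line. In this sketch the variable names are illustrative.

```shell
line='This <b> is </b> a <i>example</i>.'
# Greedy: .* runs from the first < to the last > on the line.
greedy=$(printf '%s\n' "$line" | sed -e 's/<.*>//g')
# [^>]* cannot cross a >, so every tag is matched on its own.
per_tag=$(printf '%s\n' "$line" | sed -e 's/<[^>]*>//g')
printf '[%s]\n[%s]\n' "$greedy" "$per_tag"
```

The blanks around the removed tags stay in place, so the second result contains double spaces.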
References
The elleff language:
Every vowel (or run of vowels) c in a word is replaced by clcfc. The ampersand (&) holds the matched string:
bash$ sed -e 's/[aeiou]\+/&l&f&/g'
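Applied to a single word, the command behaves as sketched here. The sample word is arbitrary, and \+ assumes GNU sed.

```shell
# Each run of vowels is replaced by itself, l, itself, f, itself.
elleff=$(printf 'hello\n' | sed -e 's/[aeiou]\+/&l&f&/g')
printf '%s\n' "$elleff"   # helefellolofo
```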
Referencing a sub-string
Sub-strings enclosed with \( and \) can be referenced with \n (n is a digit from 1 to 9)
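bash$ sed -e 's/\([^ ]\+\) \+\([^ ]\+\) \+\([^ ]\+\)/\3 \2 \1/'
- swaps the first three words in a line (the words must be separated by ' \+', one or more spaces; with ' *' the RE could also split a single word into three parts)
- does nothing if the line contains fewer than three words.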
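A quick check of a word swap is sketched below. It uses a variant that requires at least one space between the words and assumes GNU sed for \+; the sample sentences are made up.

```shell
swap='s/\([^ ]\+\) \+\([^ ]\+\) \+\([^ ]\+\)/\3 \2 \1/'
# Four words: the first three are reversed, the rest is untouched.
swapped=$(printf 'one two three four\n' | sed -e "$swap")
# Two words: the RE needs three words, so the line passes through unchanged.
short=$(printf 'one two\n' | sed -e "$swap")
printf '%s\n%s\n' "$swapped" "$short"
```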
The following RE also matches strings which are not elleff vowels (for example alefi), because nothing forces the three vowels to be equal:
[aeiou]l[aeiou]f[aeiou]
Basic REs can use the back-reference in the RE itself!
bash$ sed -e 's/\([aeiou]\+\)l\1f\1/\1/g'
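The effect of the back-reference inside the RE is sketched below on two made-up strings, assuming GNU sed for \+.

```shell
decode='s/\([aeiou]\+\)l\1f\1/\1/g'
# A genuine elleff word is translated back.
decoded=$(printf 'helefellolofo\n' | sed -e "$decode")
# alefi is not elleff (the vowels differ), so \1 keeps it from matching.
strict=$(printf 'alefi\n' | sed -e "$decode")
# Without the back-reference the same string does match.
loose=$(printf 'alefi\n' | sed -e 's/[aeiou]l[aeiou]f[aeiou]/X/')
printf '%s %s %s\n' "$decoded" "$strict" "$loose"   # hello alefi X
```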
Space Balls
- The patterns are manipulated in the pattern space
- The hold space can store multiple lines, separated by newline.
- There are commands to fill/empty the hold space
- There aren't any commands to work directly on the hold space
D | Delete text in the pattern space up to the first newline |
N | Add a newline to the pattern space, then append the next line of input to the pattern space |
P | Print out the portion of the pattern space up to the first newline |
h | Replace the contents of the hold space with the contents of the pattern space |
H | Append a newline to the contents of the hold space, and then append the contents of the pattern space to that of the hold space |
g | Replace the contents of the pattern space with the contents of the hold space |
G | Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space |
x | Exchange the contents of the hold and pattern spaces |
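Two of these commands already give useful one-liners, sketched here on made-up input. The line count uses grep -c '' because command substitution would strip the trailing blank line.

```shell
# G appends a newline plus the (still empty) hold space:
# every input line is followed by a blank line.
spaced_lines=$(printf 'a\nb\n' | sed -e 'G' | grep -c '')
# N pulls the next line into the pattern space; s then joins the pair.
pairs=$(printf 'a\nb\nc\nd\n' | sed -e 'N;s/\n/ /')
echo $spaced_lines
printf '%s\n' "$pairs"
```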
Space Balls: Example
Print the first line as last
bash$ sed -n -e '1h;1!p;${g;p;}'
h: hold space <- pattern space
g: pattern space <- hold space
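On a three-line input the rotation looks like this (a sketch with made-up data):

```shell
# Line 1 is parked in the hold space, lines 2..n are printed,
# and on the last line the hold space is fetched and printed too.
rotated=$(printf '1\n2\n3\n' | sed -n -e '1h;1!p;${g;p;}')
printf '%s\n' "$rotated"
```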
Emulation of tac
bash$ sed -n -e 'G;h;$p'
G: pattern space <<- '\n' hold space
Problem: the output ends with an extra newline. This is because G appends a newline followed by the content of the hold buffer to the pattern buffer even on the first line (which is printed last).
tac improved
bash$ sed -n -e 'G;h;$s/.$//p'
bash$ sed -n -e '1!G;h;$p'
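The extra line can be made visible by counting output lines. This sketch uses made-up three-line input and grep -c '' as the line counter.

```shell
# Naive version: three input lines produce four output lines.
naive_lines=$(printf 'a\nb\nc\n' | sed -n -e 'G;h;$p' | grep -c '')
# Skipping G on the first line avoids the stray newline.
reversed=$(printf 'a\nb\nc\n' | sed -n -e '1!G;h;$p')
echo $naive_lines
printf '%s\n' "$reversed"
```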
Example: a counter in sed
/^[[:digit:]][[:digit:]]*$/!n; # the line must contain only digits
x;s/.*//;x; # clear the hold space
: add
/9$/{s/9$//;x;s/.*/0&/;x;b add;}; # eliminate the last 9 from the p.s.
# and add a 0 in front of the h.s.
s/8$/9/
s/7$/8/
s/6$/7/
s/5$/6/
s/4$/5/
s/3$/4/
s/2$/3/
s/1$/2/
s/0$/1/
s/^$/1/
G;s/\n//g; # add the content of the h.s to the p.s
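The counter can be run as in the sketch below. It stores a reformatted copy of the program (one command per line, comments dropped) in a temporary file created with mktemp; the file and the input numbers are assumptions made for illustration.

```shell
counter=$(mktemp)   # hypothetical temporary program file
cat > "$counter" <<'EOF'
/^[[:digit:]][[:digit:]]*$/!n
x;s/.*//;x
:add
/9$/{
s/9$//
x
s/.*/0&/
x
b add
}
s/8$/9/
s/7$/8/
s/6$/7/
s/5$/6/
s/4$/5/
s/3$/4/
s/2$/3/
s/1$/2/
s/0$/1/
s/^$/1/
G
s/\n//g
EOF
incremented=$(printf '%s\n' 41 199 999 | sed -f "$counter")
rm -f "$counter"
printf '%s\n' "$incremented"
```

Trailing nines move to the hold space as zeros, the remaining last digit is incremented, and G glues the zeros back on.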
Branches
: label | Definition of label (up to 8 characters) |
b label | unconditionally branch to label |
t label | branch to label only if there has been a successful 's'ubstitution since the last input line was read or 't' branch was taken |
If label is omitted in the b or t command, the branch goes to the end of the script, i.e. the next cycle is started.
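Two common loop idioms illustrate b and t. This is a sketch with made-up input; the label name a is arbitrary.

```shell
# b: keep appending lines with N until the last line is reached,
# then replace all embedded newlines by blanks.
joined=$(printf 'a\nb\nc\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
# t: repeat the substitution as long as it still succeeds,
# inserting one thousands separator per round.
grouped=$(printf '1234567\n' | sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta')
printf '%s\n%s\n' "$joined" "$grouped"
```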
Example: a script that removes two kinds of comments:
- one-line comments (K++): kk...
- multi-line comments (K): ko...ok
#!/bin/sed -f
# delete K++ comments
/^[[:blank:]]*kk.*/d
s/kk.*//
# If no comment is found, then start a new cycle
: test
/ko/!b
# Append new lines to the pattern space until an entire K-comment is in the
# pattern space
: append
/ok/!{N;b append;}
# delete every K-comment (but don't be greedy!)
s/ko\([^o]\|o[^k]\)*o\?ok//g
t test
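A trial run of the script might look like the sketch below. It copies the commands into a temporary file created with mktemp and feeds them invented sample source; \| and \? assume GNU sed.

```shell
kstrip=$(mktemp)   # hypothetical temporary program file
cat > "$kstrip" <<'EOF'
/^[[:blank:]]*kk.*/d
s/kk.*//
: test
/ko/!b
: append
/ok/!{N;b append;}
s/ko\([^o]\|o[^k]\)*o\?ok//g
t test
EOF
stripped=$(printf '%s\n' \
  'x = 1;kk set x' \
  'kk a full-line comment' \
  'y = ko spans' \
  'two lines ok2;' | sed -f "$kstrip")
rm -f "$kstrip"
printf '%s\n' "$stripped"
```

The multi-line comment is removed only after N has pulled its closing line into the pattern space, which is why the two source lines come out joined.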