Matching Regex Of Multiple Lines In Awk. && Operator?
Solution 1:
[Update based on clarification.]
The key point is that Awk is a line-oriented language: with the default record separator, each pattern is tested against one line at a time, so an ordinary pattern match cannot span lines. The usual way to do something like this is to match each line separately and have a later rule figure out whether all the right pieces have been matched.
What I'm doing here is looking for an "a" in the second field on one line, a "b" in the second field on another line, and a "c" in the second field on a third line. In the first two cases, I stash away the contents of the line as well as the line number it occurred on. When the third line is matched and we haven't yet found the whole sequence, I go back and check whether the other two lines are present and have acceptable line numbers (that is, they are the two lines immediately before it). If all's good, I print out the buffered previous lines and set a flag indicating that everything else should print.
Here's the script:
$2 == "a" { a = $0; aLine = NR; }
$2 == "b" { b = $0; bLine = NR; }
$2 == "c" && !keepPrinting {
    if ((bLine == (NR - 1)) && (aLine == (NR - 2))) {
        print a;
        print b;
        keepPrinting = 1;
    }
}
keepPrinting { print; }
And here's a file I tested it with:
JUNK UP HERE NOT STARTING WITH NUMBER
1 a 0.110 0.069
2 a 0.062 0.088
3 a 0.062 0.121
4 b 0.062 0.121
5 c 0.032 0.100
6 d 0.032 0.100
7 e 0.032 0.100
8 a 0.099 0.121
9 b 0.098 0.121
10 c 0.097 0.100
11 x 0.000 0.200
Here's what I get when I run it:
$ awk -f blort.awk blort.txt
3 a 0.062 0.121
4 b 0.062 0.121
5 c 0.032 0.100
6 d 0.032 0.100
7 e 0.032 0.100
8 a 0.099 0.121
9 b 0.098 0.121
10 c 0.097 0.100
11 x 0.000 0.200
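The same line-by-line idea can be stretched to a sequence of any length. What follows is only a sketch and not part of the original answer: the shell function name print_from_sequence and the seq variable are made up for illustration, and it assumes the letters to match always sit in the second field. It keeps a sliding window of the last n lines and starts printing once the second fields in the window spell out the sequence.

```shell
#!/bin/sh
# Hypothetical helper (name and interface are assumptions for illustration).
# $1 is a space-separated list of second-field values, e.g. "a b c".
# Reads stdin; prints from the first run of consecutive lines whose second
# fields match the list, through the end of the input.
print_from_sequence() {
    awk -v seq="$1" '
    BEGIN { n = split(seq, want, " ") }

    # Once the sequence has been seen, every later line is printed as-is.
    printing { print; next }

    {
        # Slide the window: drop the oldest line, append the current one.
        for (i = 1; i < n; i++) { line[i] = line[i + 1]; key[i] = key[i + 1] }
        line[n] = $0
        key[n] = $2 ""          # "" forces a string comparison below

        # Not enough lines read yet to fill the window.
        if (NR < n) next

        # Give up on this line as soon as one position disagrees.
        for (i = 1; i <= n; i++)
            if (key[i] != want[i]) next

        # Whole window matches: flush it and switch to printing mode.
        for (i = 1; i <= n; i++) print line[i]
        printing = 1
    }'
}
```

Run as print_from_sequence "a b c" < blort.txt, it should print the same lines as the script above. Because the window always holds the most recent n lines, the "immediately before" check falls out of the data structure instead of needing line-number arithmetic.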
Solution 2:
No, it doesn't work. You could try something like this:
/(^[0-9]+.*a[^\n]*)\n([0-9]+.*b[^\n]*)\n([0-9]+.*c[^\n]*)/
And repeat that for as many letters as you need.
The [^\n]* will match as many non-linebreak characters in a row as possible (so up to the linebreak).
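Since Awk hands each pattern one line at a time by default, a regex containing \n only has something to match if the lines are first collected into a single string. Here is a rough sketch of that, not taken from the original answer: the function name print_from_regex_match is invented, the whole input is held in memory, and the pattern is tightened to look at the second field (digits, spaces, then the letter) instead of using .* as above.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption for illustration).
# Reads stdin; prints from the first a/b/c run of consecutive lines
# through the end of the input, using one multi-line regex.
print_from_regex_match() {
    awk '
    # Start with a newline so that every line, including the first,
    # is preceded by "\n" inside the buffer.
    BEGIN { buf = "\n" }

    # Collect the whole input into one string.
    { buf = buf $0 "\n" }

    END {
        # Three consecutive lines: number, spaces, then a, b and c.
        if (match(buf, /\n[0-9]+ +a[^\n]*\n[0-9]+ +b[^\n]*\n[0-9]+ +c[^\n]*\n/)) {
            # RSTART points at the newline before the "a" line; skip it.
            printf "%s", substr(buf, RSTART + 1)
        }
    }'
}
```

This trades memory for simplicity, so it is probably only reasonable for files that fit comfortably in RAM. For large inputs, the line-by-line approaches in the other solutions avoid buffering everything.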
Solution 3:
A friend wrote this awk program for me. It is a state machine. And it works.
#!/usr/bin/awk -f
BEGIN {
    # We start out in the "idle" state.
    state = "idle"
}
/^[0-9]+[[:space:]]+q/ {
    # Every time we encounter a "# q" we either print it or go to the
    # "q_found" state.
    if (state != "printing") {
        state = "q_found"
        line_q = $0
    }
}
/^[0-9]+[[:space:]]+r/ {
    # If we are in the q_found state and "# r" immediately follows,
    # advance to the r_found state. Else, return to "idle" and
    # wait for the "# q" to start us off.
    if (state == "q_found") {
        state = "r_found"
        line_r = $0
    } else if (state != "printing") {
        state = "idle"
    }
}
/^[0-9]+[[:space:]]+l/ {
    # If we are in the r_found state and "# l" immediately follows,
    # advance to the l_found state. Else, return to "idle" and
    # wait for the "# q" to start us off.
    if (state == "r_found") {
        state = "l_found"
        line_l = $0
    } else if (state != "printing") {
        state = "idle"
    }
}
/^[0-9]+[[:space:]]+i/ {
    # If we are in the l_found state and "# i" immediately follows,
    # we're ready to start printing. First, display the lines we
    # squirrelled away, then move to the "printing" state. Else,
    # go to "idle" and wait for the "# q" to start us off.
    if (state == "l_found") {
        state = "printing"
        print line_q
        print line_r
        print line_l
        line = 0
    } else if (state != "printing") {
        state = "idle"
    }
}
/^[0-9]+[[:space:]]+/ {
    # If in state "printing", print 50 lines in total (the 3 saved
    # lines plus 47 more, starting with the "# i" line), then stop
    # printing.
    if (state == "printing") {
        if (++line < 48) print
    }
}