This was a solid reference:
https://www.howtogeek.com/278599/how-to-combine-text-files-using-the-cat-command-in-linux/
cat file1.txt file2.txt file3.txt > file4.txt
cat file1.txt file2.txt file3.txt | sort > file4.txt
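Here is a minimal sketch of both commands run against throwaway data, in case you want to try them before touching real files. The temporary directory and the names combined.txt and sorted.txt are my own placeholders, not something from the article.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'pear\n'   > file1.txt
printf 'apple\n'  > file2.txt
printf 'cherry\n' > file3.txt

# Concatenate the files in the order they are named.
cat file1.txt file2.txt file3.txt > combined.txt

# Concatenate, then sort the combined lines before writing them out.
cat file1.txt file2.txt file3.txt | sort > sorted.txt

cat combined.txt
cat sorted.txt
```

One thing to watch: the output file should not also be one of the input files, because the shell truncates it before cat gets to read it.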
by blogadmin
I was getting errors saying that the copy command or the move command had been passed too many arguments, meaning too many files.
Here’s the solution buried in some of the answers on this webpage:
https://stackoverflow.com/questions/11942422/moving-large-number-of-files
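As a rough sketch of one common workaround (not necessarily the exact answer on that page): let find walk the directory and call mv itself, so the shell never has to expand one enormous wildcard. The src and dest directory names and the *.txt pattern below are placeholders of mine.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/dest"

# Create a few sample files (a real case would have many thousands).
for i in 1 2 3 4 5; do
    : > "$workdir/src/file$i.txt"
done

# find runs mv once per matching file, so no single command line
# ever has to hold the whole list of file names.
find "$workdir/src" -type f -name '*.txt' -exec mv {} "$workdir/dest/" \;

moved=$(find "$workdir/dest" -type f | wc -l)
left=$(find "$workdir/src" -type f | wc -l)
echo "moved: $moved, left behind: $left"
```

Running mv once per file is slower than a single mv, but it stays under the argument-length limit no matter how many files there are.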
by blogadmin
To do this, you can use phpMyAdmin on your server. It’s usually in the cPanel root under SQL Services, or in your cPanel user account under the same section.
You can execute SQL queries that in effect delete the Admin user, and then re-add it with a specific password.
The password is stored in hashed form rather than as plain text, which is why it looks the way it does.
351683ea4e19efe34874b501fdbf9792:9b is actually the password ‘admin’.
Once you do the below changes you should be able to login using:
Username: Admin
Password: admin
EXECUTE PHPMYADMIN SQL QUERIES:
Here’s what you can do in Zen Cart 1.5.1x.
DELETE FROM admin WHERE admin_name = 'Admin'; INSERT INTO admin (admin_name, admin_email, admin_pass, admin_profile) VALUES ('Admin', 'admin@localhost', '351683ea4e19efe34874b501fdbf9792:9b', 1);
For Zen Cart v1.3.9 and older, do this QUERY:
DELETE FROM admin WHERE admin_name = 'Admin'; INSERT INTO admin (admin_name, admin_email, admin_pass, admin_level)
RESET TEMP PASSWORD AFTER REGAINING ACCESS:
Once you are back in, go to the Administrator tab, set a new password, and save it somewhere you can retrieve it later, such as a Google Document.
Here’s more info straight from Zen Cart FAQ:
by blogadmin
grep, egrep, fgrep – print lines matching a pattern
grep [OPTIONS] PATTERN [FILE…]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE…]
grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.
In addition, two variant programs egrep and fgrep are available. egrep is the same as grep -E. fgrep is the same as grep -F. Direct invocation as either egrep or fgrep is deprecated, but is provided to allow historical applications that rely on them to run unmodified.
Generic Program Information
Matcher Selection
Matching Control
General Output Control
Output Line Prefix Control
Context Line Control
File and Directory Selection
Other Options
A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions.
grep understands three different versions of regular expression syntax: “basic,” “extended” and “perl.” In GNU grep, there is no difference in available functionality between basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards. Perl regular expressions give additional functionality, and are documented in pcresyntax(3) and pcrepattern(3), but may not be available on every system.
The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any meta-character with special meaning may be quoted by preceding it with a backslash.
The period . matches any single character.
Character Classes and Bracket Expressions
Anchoring
The Backslash Character and Special Expressions
Repetition
*
The preceding item will be matched zero or more times.
+
The preceding item will be matched one or more times.
{n}
The preceding item is matched exactly n times.
{n,}
The preceding item is matched n or more times.
{,m}
The preceding item is matched at most m times.
{n,m}
The preceding item is matched at least n times, but not more than m times.
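The repetition operators above are easier to see on a small input. This is a minimal sketch using grep -E (extended syntax) on four made-up strings; the variable names are only for illustration.

```shell
# Sample input: one candidate string per line.
input=$(printf 'ac\nabc\nabbc\nabbbc')

# b* : zero or more b characters between a and c
zero_or_more=$(printf '%s\n' "$input" | grep -E '^ab*c$')

# b+ : one or more
one_or_more=$(printf '%s\n' "$input" | grep -E '^ab+c$')

# b{2} : exactly two
exactly_two=$(printf '%s\n' "$input" | grep -E '^ab{2}c$')

# b{2,3} : at least two, but not more than three
two_or_three=$(printf '%s\n' "$input" | grep -E '^ab{2,3}c$')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' \
    "$zero_or_more" "$one_or_more" "$exactly_two" "$two_or_three"
```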
Concatenation
Alternation
Precedence
Back References and Subexpressions
Basic vs Extended Regular Expressions
The behavior of grep is affected by the following environment variables.
The locale for category LC_foo is specified by examining the three environment variables LC_ALL, LC_foo, LANG, in that order. The first of these variables that is set specifies the locale. For example, if LC_ALL is not set, but LC_MESSAGES is set to pt_BR, then the Brazilian Portuguese locale is used for the LC_MESSAGES category. The C locale is used if none of these environment variables are set, if the locale catalog is not installed, or if grep was not compiled with national language support (NLS).
cx=
SGR substring for whole context lines (i.e., non-matching lines when the -v command-line option is omitted, or matching lines when -v is specified). If however the boolean rv capability and the -v command-line option are both specified, it applies to selected non-matching lines instead. The default is empty (i.e., the terminal’s default color pair).
rv
Boolean value that reverses (swaps) the meanings of the sl= and cx= capabilities when the -v command-line option is specified. The default is false (i.e., the capability is omitted).
mt=01;31
SGR substring for matching non-empty text in any matching line (i.e., a selected line when the -v command-line option is omitted, or a context line when -v is specified). Setting this is equivalent to setting both ms= and mc= at once to the same value. The default is a bold red text foreground over the current line background.
ms=01;31
SGR substring for matching non-empty text in a selected line. (This is only used when the -v command-line option is omitted.) The effect of the sl= (or cx= if rv) capability remains active when this kicks in. The default is a bold red text foreground over the current line background.
mc=01;31
SGR substring for matching non-empty text in a context line. (This is only used when the -v command-line option is specified.) The effect of the cx= (or sl= if rv) capability remains active when this kicks in. The default is a bold red text foreground over the current line background.
ln=32
SGR substring for line numbers prefixing any content line. The default is a green text foreground over the terminal’s default background.
bn=32
SGR substring for byte offsets prefixing any content line. The default is a green text foreground over the terminal’s default background.
se=36
SGR substring for separators that are inserted between selected line fields (:), between context line fields, (-), and between groups of adjacent lines when nonzero context is specified (--). The default is a cyan text foreground over the terminal’s default background.
ne
Boolean value that prevents clearing to the end of line using Erase in Line (EL) to Right (\33[K) each time a colorized item ends. This is needed on terminals on which EL is not supported. It is otherwise useful on terminals for which the back_color_erase (bce) boolean terminfo capability does not apply, when the chosen highlight colors do not affect the background, or when EL is too slow or causes too much flicker. The default is false (i.e., the capability is omitted).
Normally, the exit status is 0 if selected lines are found and 1 otherwise. But the exit status is 2 if an error occurred, unless the -q or --quiet or --silent option is used and a selected line is found. Note, however, that POSIX only mandates, for programs such as grep, cmp, and diff, that the exit status in case of error be greater than 1; it is therefore advisable, for the sake of portability, to use logic that tests for this general condition instead of strict equality with 2.
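The portability advice about exit statuses can be sketched like this: treat 0 as a match, 1 as no match, and anything greater than 1 as an error, rather than testing for exactly 2. The classify function and the sample file are hypothetical names made up for the example.

```shell
tmpfile=$(mktemp)
printf 'alpha\nbeta\n' > "$tmpfile"

classify() {
    # $1 = pattern, $2 = file to search
    grep -q -- "$1" "$2" 2>/dev/null
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "match"
    elif [ "$status" -eq 1 ]; then
        echo "no match"
    else
        # Anything greater than 1 is treated as an error.
        echo "error"
    fi
}

found=$(classify 'beta' "$tmpfile")
missing=$(classify 'gamma' "$tmpfile")
broken=$(classify 'beta' "$tmpfile.does-not-exist")
echo "$found / $missing / $broken"
rm -f "$tmpfile"
```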
Copyright 1998-2000, 2002, 2005-2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Reporting Bugs
Known Bugs
Regular Manual Pages
POSIX Programmer’s Manual Page
TeXinfo Documentation
GNU’s not Unix, but Unix is a beast; its plural form is Unixen.
bzgrep(1), flowdumper(1), fortune(6), gnome-search-tool(1), grepmail(1), ip(8), ksh93(1), look(1), makeindex(1), mirrordir(1), mksh(1), nawk(1), nget(1), pdsh(1), perlfunc(1), perlglossary(1), procmail(1), procmailex(5), procmailrc(5), procmailsc(5), quilt(1), regex(3), sudo(8), sudoers(5), tcpstat(1), trace-cmd-record(1), uwildmat(3), wildmat(3), xzgrep(1)
by blogadmin
gawk – pattern scanning and processing language
gawk [ POSIX or GNU style options ] -f program-file [ -- ] file …
gawk [ POSIX or GNU style options ] [ -- ] program-text file …
pgawk [ POSIX or GNU style options ] -f program-file [ -- ] file …
pgawk [ POSIX or GNU style options ] [ -- ] program-text file …
Gawk is the GNU Project’s implementation of the AWK programming language. It conforms to the definition of the language in the POSIX 1003.1 Standard. This version in turn is based on the description in The AWK Programming Language, by Aho, Kernighan, and Weinberger, with the additional features found in the System V Release 4 version of UNIX awk. Gawk also provides more recent Bell Laboratories awk extensions, and a number of GNU-specific extensions.
Pgawk is the profiling version of gawk. It is identical in every way to gawk, except that programs run more slowly, and it automatically produces an execution profile in the file awkprof.out when done. See the --profile option, below.
The command line consists of options to gawk itself, the AWK program text (if not supplied via the -f or --file options), and values to be made available in the ARGC and ARGV pre-defined AWK variables.
Gawk options may be either traditional POSIX one letter options, or GNU-style long options. POSIX options start with a single “-”, while long options start with “--”. Long options are provided for both GNU-specific features and for POSIX-mandated features.
Following the POSIX standard, gawk-specific options are supplied via arguments to the -W option. Multiple -W options may be supplied. Each -W option has a corresponding long option, as detailed below. Arguments to long options are either joined with the option by an = sign, with no intervening spaces, or they may be provided in the next command line argument. Long options may be abbreviated, as long as the abbreviation remains unique.
Gawk accepts the following options, listed by frequency.
• You cannot continue lines after ? and :.
• The synonym func for the keyword function is not recognized.
• The operators ** and **= cannot be used in place of ^ and ^=.
• The fflush() function is not available.
An AWK program consists of a sequence of pattern-action statements and optional function definitions.
If the value of a particular element of ARGV is empty (“”), gawk skips over it.
For each record in the input, gawk tests to see if it matches any pattern in the AWK program. For each pattern that the record matches, the associated action is executed. The patterns are tested in the order they occur in the program.
Finally, after all the input is exhausted, gawk executes the code in the END block(s) (if any).
AWK variables are dynamic; they come into existence when they are first used. Their values are either floating-point numbers or strings, or both, depending upon how they are used. AWK also has one dimensional arrays; arrays with multiple dimensions may be simulated. Several pre-defined variables are set as a program runs; these are described as needed and summarized below.
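To make the execution order concrete, here is a small sketch with one BEGIN rule, one pattern-action rule, and one END rule. The input lines and the variable names total and count are made up for the example; note that neither variable is declared anywhere.

```shell
result=$(printf 'apple 3\nbanana 5\napple 4\n' | awk '
BEGIN      { print "start" }          # runs before any input is read
/^apple/   { total += $2; count++ }   # variables come into existence on first use
END        { print count, total }     # runs after all input is exhausted
')
echo "$result"
```

Only the two apple records match the pattern, so the END rule reports 2 records and a total of 7.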
Records
Fields
Built-in Variables
ARGV
Array of command line arguments. The array is indexed from 0 to ARGC - 1. Dynamically changing the contents of ARGV can control the files used for data.
BINMODE
On non-POSIX systems, specifies use of “binary” mode for all file I/O. Numeric values of 1, 2, or 3, specify that input files, output files, or all files, respectively, should use binary I/O. String values of “r”, or “w” specify that input files, or output files, respectively, should use binary I/O. String values of “rw” or “wr” specify that all files should use binary I/O. Any other string value is treated as “rw”, but generates a warning message.
CONVFMT
The conversion format for numbers, “%.6g”, by default.
ENVIRON
An array containing the values of the current environment. The array is indexed by the environment variables, each element being the value of that variable (e.g., ENVIRON["HOME"] might be /home/arnold). Changing this array does not affect the environment seen by programs which gawk spawns via redirection or the system() function.
ERRNO
If a system error occurs either doing a redirection for getline, during a read for getline, or during a close(), then ERRNO will contain a string describing the error. The value is subject to translation in non-English locales.
FIELDWIDTHS
A white-space separated list of fieldwidths. When set, gawk parses the input into fields of fixed width, instead of using the value of the FS variable as the field separator.
FILENAME
The name of the current input file. If no files are specified on the command line, the value of FILENAME is “-”. However, FILENAME is undefined inside the BEGIN block (unless set by getline).
FNR
The input record number in the current input file.
FS
The input field separator, a space by default. See Fields, above.
IGNORECASE
Controls the case-sensitivity of all regular expression and string operations. If IGNORECASE has a non-zero value, then string comparisons and pattern matching in rules, field splitting with FS, record separating with RS, regular expression matching with ~ and !~, and the gensub(), gsub(), index(), match(), split(), and sub() built-in functions all ignore case when doing regular expression operations. NOTE: Array subscripting is not affected. However, the asort() and asorti() functions are affected.
NR
The total number of input records seen so far.
OFMT
The output format for numbers, “%.6g”, by default.
OFS
The output field separator, a space by default.
ORS
The output record separator, by default a newline.
PROCINFO
The elements of this array provide access to information about the running AWK program. On some systems, there may be elements in the array, "group1" through "groupn" for some n, which is the number of supplementary groups that the process has. Use the in operator to test for these elements. The following elements are guaranteed to be available:
PROCINFO["euid"]
the value of the geteuid(2) system call.
PROCINFO["FS"]
"FS" if field splitting with FS is in effect, or "FIELDWIDTHS" if field splitting with FIELDWIDTHS is in effect.
PROCINFO["gid"]
the value of the getgid(2) system call.
PROCINFO["pgrpid"]
the process group ID of the current process.
PROCINFO["pid"]
the process ID of the current process.
PROCINFO["ppid"]
the parent process ID of the current process.
PROCINFO["uid"]
the value of the getuid(2) system call.
PROCINFO["version"]
The version of gawk. This is available from version 3.1.4 and later.
RSTART
The index of the first character matched by match(); 0 if no match. (This implies that character indices start at one.)
RLENGTH
The length of the string matched by match(); -1 if no match.
SUBSEP
The character used to separate multiple subscripts in array elements, by default “\034”.
TEXTDOMAIN
The text domain of the AWK program; used to find the localized translations for the program’s strings.
Arrays
if (val in array)
Variable Typing And Conversion
CONVFMT = "%2.2f"
a = 12
b = a ""
Uninitialized variables have the numeric value 0 and the string value “” (the null, or empty, string).
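A quick way to check the conversion rules is to run them. In this sketch, a holds an integer value, so concatenating it with the empty string should give "12" rather than "12.00" even though CONVFMT is set; unset_var is a hypothetical name that is never assigned.

```shell
result=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12
    b = a ""              # integer value: converted as an integer
    print b
    # An uninitialized variable is "" as a string and 0 as a number.
    print "[" unset_var "]", unset_var + 0
}')
echo "$result"
```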
Octal and Hexadecimal Constants
String Constants
\b
backspace.
\f
form-feed.
\n
newline.
\r
carriage return.
\t
horizontal tab.
\v
vertical tab.
AWK is a line-oriented language. The pattern comes first, and then the action. Action statements are enclosed in { and }. Either the pattern may be missing, or the action may be missing, but, of course, not both. If the pattern is missing, the action is executed for every single record of input. A missing action is equivalent to
{ print }
which prints the entire record.
Patterns
BEGIN
END
/regular expression/
relational expression
pattern && pattern
pattern || pattern
pattern ? pattern : pattern
(pattern)
! pattern
pattern1, pattern2
The ?: operator is like the same operator in C. If the first pattern is true then the pattern used for testing is the second pattern, otherwise it is the third. Only one of the second and third patterns is evaluated.
The pattern1, pattern2 form of an expression is called a range pattern. It matches all input records starting with a record that matches pattern1, and continuing until a record that matches pattern2, inclusive. It does not combine with any other sort of pattern expression.
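Range patterns are easiest to understand by example. This sketch prints every record from the first line matching START through the next line matching END, inclusive; the marker words and input are invented for the illustration.

```shell
result=$(printf 'a\nSTART\nb\nc\nEND\nd\n' |
    awk '/START/, /END/ { print NR ": " $0 }')
echo "$result"
```

Records 1 and 6 fall outside the range, so only records 2 through 5 are printed.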
Regular Expressions
.
matches any character including newline.
^
matches the beginning of a string.
$
matches the end of a string.
[abc…]
character list, matches any of the characters abc….
[^abc…]
negated character list, matches any character except abc….
r1|r2
alternation: matches either r1 or r2.
r1r2
concatenation: matches r1, and then r2.
r+
matches one or more r’s.
r*
matches zero or more r’s.
r?
matches zero or one r’s.
(r)
grouping: matches r.
r{n}
r{n,}
r{n,m}
One or two numbers inside braces denote an interval expression. If there is one number in the braces, the preceding regular expression r is repeated n times. If there are two numbers separated by a comma, r is repeated n to m times. If there is one number followed by a comma, then r is repeated at least n times.
\<
matches the empty string at the beginning of a word.
\>
matches the empty string at the end of a word.
\w
matches any word-constituent character (letter, digit, or underscore).
\W
matches any character that is not word-constituent.
\`
matches the empty string at the beginning of a buffer (string).
\’
matches the empty string at the end of a buffer.
[:blank:]
Space or tab characters.
[:cntrl:]
Control characters.
[:digit:]
Numeric characters.
[:graph:]
Characters that are both printable and visible. (A space is printable, but not visible, while an a is both.)
[:lower:]
Lower-case alphabetic characters.
[:print:]
Printable characters (characters that are not control characters.)
[:punct:]
Punctuation characters (characters that are not letter, digits, control characters, or space characters).
[:space:]
Space characters (such as space, tab, and formfeed, to name a few).
[:upper:]
Upper-case alphabetic characters.
[:xdigit:]
Characters that are hexadecimal digits.
Actions
Operators
++ --
Increment and decrement, both prefix and postfix.
^
Exponentiation (** may also be used, and **= for the assignment operator).
+ - !
Unary plus, unary minus, and logical negation.
* / %
Multiplication, division, and modulus.
+ -
Addition and subtraction.
space
String concatenation.
| |&
Piped I/O for getline, print, and printf.
< >
<= >=
!= ==
The regular relational operators.
~ !~
Regular expression match, negated match. NOTE: Do not use a constant regular expression (/foo/) on the left-hand side of a ~ or !~. Only use one on the right-hand side. The expression /foo/ ~ exp has the same meaning as (($0 ~ /foo/) ~ exp). This is usually not what was intended.
in
Array membership.
&&
Logical AND.
||
Logical OR.
?:
The C conditional expression. This has the form expr1 ? expr2 : expr3. If expr1 is true, the value of the expression is expr2, otherwise it is expr3. Only one of expr2 and expr3 is evaluated.
= += -=
*= /= %= ^=
Assignment. Both absolute assignment (var = value) and operator-assignment (the other forms) are supported.
Control Statements
if (condition) statement [ else statement ]
while (condition) statement
do statement while (condition)
for (expr1; expr2; expr3) statement
for (var in array) statement
break
continue
delete array[index]
delete array
exit [ expression ]
{ statements }
I/O Statements
getline <file
Set $0 from next record of file; set NF.
getline var
Set var from next input record; set NR, FNR.
getline var <file
Set var from next record of file.
command | getline [var]
Run command piping the output either into $0 or var, as above.
command |& getline [var]
Run command as a co-process piping the output either into $0 or var, as above. Co-processes are a gawk extension. (command can also be a socket. See the subsection Special File Names, below.)
print
Prints the current record. The output record is terminated with the value of the ORS variable.
print expr-list
Prints expressions. Each expression is separated by the value of the OFS variable. The output record is terminated with the value of the ORS variable.
print expr-list >file
Prints expressions on file. Each expression is separated by the value of the OFS variable. The output record is terminated with the value of the ORS variable.
printf fmt, expr-list
Format and print.
printf fmt, expr-list >file
Format and print on file.
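Here is a minimal sketch that strings a few of these I/O forms together: reading a line from a command with getline, writing it to a file with print, and reading it back with getline from the file. The out variable, the temporary file, and the echo command are placeholders chosen for the example.

```shell
outfile=$(mktemp)

result=$(awk -v out="$outfile" 'BEGIN {
    # Read one line of command output into a variable.
    "echo hello" | getline greeting
    close("echo hello")

    # Print an expression list to a file instead of standard output.
    print greeting, "world" > out
    close(out)

    # Read the first record of that file back into a variable.
    getline line < out
    print line
}')

file_contents=$(cat "$outfile")
echo "$result"
rm -f "$outfile"
```

Closing the file after writing matters here: it flushes the output so the later getline sees the line.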
The printf Statement
space
For numeric conversions, prefix positive values with a space, and negative values with a minus sign.
+
The plus sign, used before the width modifier (see below), says to always supply a sign for numeric conversions, even if the data to be formatted is positive. The + overrides the space modifier.
#
Use an “alternate form” for certain control letters. For %o, supply a leading zero. For %x, and %X, supply a leading 0x or 0X for a nonzero result. For %e, %E, %f and %F, the result always contains a decimal point. For %g, and %G, trailing zeros are not removed from the result.
0
A leading 0 (zero) acts as a flag, that indicates output should be padded with zeroes instead of spaces. This applies even to non-numeric output formats. This flag only has an effect when the field width is wider than the value to be printed.
width
The field should be padded to this width. The field is normally padded with spaces. If the 0 flag has been used, it is padded with zeroes.
.prec
A number that specifies the precision to use when printing. For the %e, %E, %f and %F, formats, this specifies the number of digits you want printed to the right of the decimal point. For the %g, and %G formats, it specifies the maximum number of significant digits. For the %d, %o, %i, %u, %x, and %X formats, it specifies the minimum number of digits to print. For %s, it specifies the maximum number of characters from the string that should be printed.
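The flags, width, and precision described above can be tried in a single printf call. This is only a sketch; the values are arbitrary and the | characters are there just to make the padding visible.

```shell
result=$(awk 'BEGIN {
    # +      always show a sign
    # space  prefix positive values with a space
    # 0      pad with zeroes up to the width
    # 8.3    width 8, three digits after the decimal point
    # .2s    at most two characters of the string
    # #o     octal with a leading zero
    printf "%+d|% d|%05d|%8.3f|%.2s|%#o\n", 5, 5, 42, 3.14159, "hello", 8
}')
echo "$result"
```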
Special File Names
/dev/stderr
The standard error output.
/dev/fd/n
The file associated with the open file descriptor n.
/inet/raw/lport/rhost/rport
Reserved for future use.
/dev/pgrpid
Reading this file returns the process group ID of the current process, in decimal, terminated with a newline.
/dev/user
Reading this file returns a single record terminated with a newline. The fields are separated with spaces. $1 is the value of the getuid(2) system call, $2 is the value of the geteuid(2) system call, $3 is the value of the getgid(2) system call, and $4 is the value of the getegid(2) system call. If there are any additional fields, they are the group IDs returned by getgroups(2). Multiple groups may not be supported on all systems.
Numeric Functions
exp(expr)
The exponential function.
int(expr)
Truncates to integer.
log(expr)
The natural logarithm function.
rand()
Returns a random number N, between 0 and 1, such that 0 ≤ N < 1.
sin(expr)
Returns the sine of expr, which is in radians.
sqrt(expr)
The square root function.
srand([expr])
Uses expr as a new seed for the random number generator. If no expr is provided, the time of day is used. The return value is the previous seed for the random number generator.
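A short sketch of the numeric functions, with arbitrary inputs. The seed 42 is an assumption picked for the example; the actual random number differs between awk implementations, so the check only confirms it falls in the documented range.

```shell
result=$(awk 'BEGIN {
    # int() truncates toward zero; exp(0) is 1 and log(1) is 0.
    print int(3.9), int(-3.9), sqrt(16), exp(0), log(1)

    srand(42)                    # seed the generator explicitly
    r = rand()
    print (r >= 0 && r < 1)      # prints 1 when r is in range
}')
echo "$result"
```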
String Functions
gensub(r, s, h [, t])
Search the target string t for matches of the regular expression r. If h is a string beginning with g or G, then replace all matches of r with s. Otherwise, h is a number indicating which match of r to replace. If t is not supplied, $0 is used instead. Within the replacement text s, the sequence \n, where n is a digit from 1 to 9, may be used to indicate just the text that matched the n’th parenthesized subexpression. The sequence \0 represents the entire matched text, as does the character &. Unlike sub() and gsub(), the modified string is returned as the result of the function, and the original target string is not changed.
gsub(r, s [, t])
For each substring matching the regular expression r in the string t, substitute the string s, and return the number of substitutions. If t is not supplied, use $0. An & in the replacement text is replaced with the text that was actually matched. Use \& to get a literal &. (This must be typed as “\\&”; see GAWK: Effective AWK Programming for a fuller discussion of the rules for &’s and backslashes in the replacement text of sub(), gsub(), and gensub().)
index(s, t)
Returns the index of the string t in the string s, or 0 if t is not present. (This implies that character indices start at one.)
length([s])
Returns the length of the string s, or the length of $0 if s is not supplied. Starting with version 3.1.5, as a non-standard extension, with an array argument, length() returns the number of elements in the array.
match(s, r [, a])
Returns the position in s where the regular expression r occurs, or 0 if r is not present, and sets the values of RSTART and RLENGTH. Note that the argument order is the same as for the ~ operator: str ~ re. If array a is provided, a is cleared and then elements 1 through n are filled with the portions of s that match the corresponding parenthesized subexpression in r. The 0’th element of a contains the portion of s matched by the entire regular expression r. Subscripts a[n, "start"], and a[n, "length"] provide the starting index in the string and length respectively, of each matching substring.
split(s, a [, r])
Splits the string s into the array a on the regular expression r, and returns the number of fields. If r is omitted, FS is used instead. The array a is cleared first. Splitting behaves identically to field splitting, described above.
sprintf(fmt, expr-list)
Prints expr-list according to fmt, and returns the resulting string.
strtonum(str)
Examines str, and returns its numeric value. If str begins with a leading 0, strtonum() assumes that str is an octal number. If str begins with a leading 0x or 0X, strtonum() assumes that str is a hexadecimal number.
sub(r, s [, t])
Just like gsub(), but only the first matching substring is replaced.
substr(s, i [, n])
Returns the at most n-character substring of s starting at i. If n is omitted, the rest of s is used.
tolower(str)
Returns a copy of the string str, with all the upper-case characters in str translated to their corresponding lower-case counterparts. Non-alphabetic characters are left unchanged.
toupper(str)
Returns a copy of the string str, with all the lower-case characters in str translated to their corresponding upper-case counterparts. Non-alphabetic characters are left unchanged.
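The following sketch exercises a handful of the string functions that are also available outside gawk (gsub, index, length, substr, toupper, split). The sample strings and the array name parts are made up for the illustration.

```shell
result=$(awk 'BEGIN {
    s = "hello world"

    n = gsub(/o/, "0", s)        # replace every o, return the count
    print n, s

    # Character indices start at one.
    print index(s, "w"), length(s), substr(s, 1, 5), toupper(substr(s, 7))

    m = split("a:b:c", parts, ":")   # returns the number of fields
    print m, parts[2]
}')
echo "$result"
```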
Time Functions
Bit Manipulations Functions
lshift(val, count)
Return the value of val, shifted left by count bits.
or(v1, v2)
Return the bitwise OR of the values provided by v1 and v2.
rshift(val, count)
Return the value of val, shifted right by count bits.
xor(v1, v2)
Return the bitwise XOR of the values provided by v1 and v2.
Internationalization Functions
Functions in AWK are defined as follows:
function name(parameter list) { statements }
The word func may be used in place of function.
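As a small sketch of a user-defined function: repeat is a hypothetical helper that returns its first argument repeated n times. The extra parameters after the wide gap (i and out) are never passed by the caller, which is the usual AWK idiom for getting local variables.

```shell
result=$(awk '
# Hypothetical helper: i and out are extra parameters used as locals.
function repeat(str, n,    i, out) {
    for (i = 0; i < n; i++)
        out = out str
    return out
}
BEGIN { print repeat("ab", 3) }
')
echo "$result"
```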
Beginning with version 3.1 of gawk, you can dynamically add new built-in functions to the running gawk interpreter. The full details are beyond the scope of this manual page; see GAWK: Effective AWK Programming for the details.
pgawk accepts two signals. SIGUSR1 causes it to dump a profile and function call stack to the profile file, which is either awkprof.out, or whatever file was named with the --profile option. It then continues to run. SIGHUP causes pgawk to dump the profile and function call stack and then exit.
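Print and sort the login names of all users:
BEGIN { FS = ":" }
{ print $1 | "sort" }
Count lines in a file:
{ nlines++ }
END { print nlines }
Precede each line by its number in the file:
{ print FNR, $0 }
Concatenate and line number (a variation on a theme):
{ print NR, $0 }
Run an external command for particular lines of data:
tail -f access_log |
awk '/myhome.html/ { system("nmap " $1 ">> logdir/myhome.html") }'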
String constants are sequences of characters enclosed in double quotes. In non-English speaking environments, it is possible to mark strings in the AWK program as requiring translation to the native natural language. Such strings are marked in the AWK program with a leading underscore (“_”). For example,
BEGIN { TEXTDOMAIN = "myprog" }
4. Run gawk --gen-po -f myprog.awk > myprog.po to generate a .po file for your program.
5. Provide appropriate translations, and build and install the corresponding .mo files.
A primary goal for gawk is compatibility with the POSIX standard, as well as with the latest version of UNIX awk. To this end, gawk incorporates the following user visible features which are not described in the AWK book, but are part of the Bell Laboratories version of awk, and are in the POSIX standard.
The book indicates that command line variable assignment happens when awk would otherwise open the argument as a file, which is after the BEGIN block is executed. However, in earlier implementations, when such an assignment appeared before any file names, the assignment would happen before the BEGIN block was run. Applications came to depend on this “feature.” When awk was changed to match its documentation, the -v option for assigning variables before program execution was added to accommodate applications that depended upon the old behavior. (This feature was agreed upon by both the Bell Laboratories and the GNU developers.)
The -W option for implementation specific features is from the POSIX standard.
When processing arguments, gawk uses the special option “--” to signal the end of arguments. In compatibility mode, it warns about but otherwise ignores undefined options. In normal operation, such arguments are passed on to the AWK program for it to process.
The AWK book does not define the return value of srand(). The POSIX standard has it return the seed it was using, to allow keeping track of random number sequences. Therefore srand() in gawk also returns its current seed.
Other new features are: The use of multiple -f options (from MKS awk); the ENVIRON array; the \a, and \v escape sequences (done originally in gawk and fed back into the Bell Laboratories version); the tolower() and toupper() built-in functions (from the Bell Laboratories version); and the ANSI C conversion specifications in printf (done first in the Bell Laboratories version).
There are two features of historical AWK implementations that gawk supports. First, it is possible to call the length() built-in function not only with no argument, but even without parentheses! Thus,
Gawk has a number of extensions to POSIX awk. They are described in this section. All the extensions described here can be disabled by invoking gawk with the --traditional or --posix options.
The following features of gawk are not available in POSIX awk.
• Octal and hexadecimal constants in AWK programs.
• The ARGIND, BINMODE, ERRNO, LINT, RT and TEXTDOMAIN variables are not special.
• The IGNORECASE variable and its side-effects are not available.
• The FIELDWIDTHS variable and fixed-width field splitting.
• The PROCINFO array is not available.
• The use of RS as a regular expression.
• The special file names available for I/O redirection are not recognized.
• The |& operator for creating co-processes.
• The ability to split out individual characters using the null string as the value of FS, and as the third argument to split().
• The optional second argument to the close() function.
• The optional third argument to the match() function.
• The ability to use positional specifiers with printf and sprintf().
• The ability to pass an array to length().
• The use of delete array to delete the entire contents of an array.
• The use of nextfile to abandon processing of the current input file.
switch (expression) {
case value|regex : statement
...
[ default: statement ]
}
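A possible use of the switch syntax shown above. This is a sketch that assumes a gawk built with switch support: the 3.1.x series needed the --enable-switch configure option, while gawk 4.0 and later enable it by default. The helper name classify_words and the sample cases are invented for illustration.

```shell
# classify_words: hypothetical helper; labels the first field of each line.
# A case may be a constant value or a regular expression.
classify_words() {
    gawk '{
        switch ($1) {
        case "start":           # exact string match
            print "starting"
            break
        case /^[0-9]+$/:        # regex match: all digits
            print "number"
            break
        default:
            print "other"
        }
    }'
}
```

For the input lines start, 42 and foo, this prints starting, number and other. Without the break statements, execution would fall through into the next case, as in C.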
The AWKPATH environment variable can be used to provide a list of directories that gawk searches when looking for files named via the -f and --file options.
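A small sketch of the AWKPATH search, assuming gawk and mktemp are available. The file name hello.awk and the helper name awkpath_demo are placeholders for this example.

```shell
# awkpath_demo: hypothetical helper. It writes a one-line program into a
# scratch directory, then runs gawk from a different directory so that the
# program can only be found through AWKPATH.
awkpath_demo() {
    libdir=$(mktemp -d) || return 1
    printf 'BEGIN { print "found via AWKPATH" }\n' > "$libdir/hello.awk"
    (cd / && AWKPATH="$libdir" gawk -f hello.awk)
    rm -rf "$libdir"
}
```

Several directories can be listed in AWKPATH, separated by colons on Unix-like systems.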
If POSIXLY_CORRECT exists in the environment, then gawk behaves exactly as if --posix had been specified on the command line. If --lint has been specified, gawk issues a warning message to this effect.
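One way to see the effect is to run the same program with and without the variable set. The sketch below assumes gawk is installed and uses the FIELDWIDTHS extension as the probe; the helper names are made up.

```shell
# The probe program relies on FIELDWIDTHS, which is a gawk extension.
probe='BEGIN { FIELDWIDTHS = "3 2 4" } { print $1 }'

# first_column: hypothetical helper; gawk with its extensions enabled.
first_column() {
    gawk "$probe"
}

# first_column_posix: same program, but POSIXLY_CORRECT is set for this
# one gawk invocation, so FIELDWIDTHS should no longer be special.
first_column_posix() {
    POSIXLY_CORRECT=1 gawk "$probe"
}
```

Given the line abcdefghi, first_column should print abc, while first_column_posix should print the whole line, because the record is then split on whitespace as usual.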
egrep(1), getpid(2), getppid(2), getpgrp(2), getuid(2), geteuid(2), getgid(2), getegid(2), getgroups(2)
The AWK Programming Language, Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger, Addison-Wesley, 1988. ISBN 0-201-07981-X.
GAWK: Effective AWK Programming, Edition 3.0, published by the Free Software Foundation, 2001. The current version of this document is available online at http://www.gnu.org/software/gawk/manual.
The -F option is not necessary given the command line variable assignment feature; it remains only for backwards compatibility.
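The equivalence can be sketched with three spellings of the same command. The helper names and the sample data are assumptions made for this example; any POSIX awk should do.

```shell
# Hypothetical helpers: each prints the first colon-separated field of
# the file named by its first argument.
first_field_option()  { awk -F: '{ print $1 }' "$1"; }      # -F option
first_field_v()       { awk -v FS=: '{ print $1 }' "$1"; }  # -v assignment
first_field_operand() { awk '{ print $1 }' FS=: "$1"; }     # operand before the file

sample=$(mktemp)
printf 'root:x:0:0\ndaemon:x:1:1\n' > "$sample"
first_field_option "$sample"
first_field_v "$sample"
first_field_operand "$sample"
rm -f "$sample"
```

One difference to keep in mind: an assignment given as an operand takes effect only when awk reaches that point in the argument list, so it is not yet visible inside a BEGIN block, whereas -F and -v are.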
Syntactically invalid single character programs tend to overflow the parse stack, generating a rather unhelpful message. Such programs are surprisingly difficult to diagnose in the completely general case, and the effort to do so really is not worth it.
The original version of UNIX awk was designed and implemented by Alfred Aho, Peter Weinberger, and Brian Kernighan of Bell Laboratories. Brian Kernighan continues to maintain and enhance it.
Paul Rubin and Jay Fenlason, of the Free Software Foundation, wrote gawk, to be compatible with the original version of awk distributed in Seventh Edition UNIX. John Woods contributed a number of bug fixes. David Trueman, with contributions from Arnold Robbins, made gawk compatible with the new version of UNIX awk. Arnold Robbins is the current maintainer.
The initial DOS port was done by Conrad Kwok and Scott Garfinkle. Scott Deifik is the current DOS maintainer. Pat Rankin did the port to VMS, and Michal Jaegermann did the port to the Atari ST. The port to OS/2 was done by Kai Uwe Rommel, with contributions and help from Darrel Hankerson. Andreas Buening now maintains the OS/2 port. Fred Fish supplied support for the Amiga, and Martin Brown provided the BeOS port. Stephen Davies provided the original Tandem port, and Matthew Woehlke provided changes for Tandem’s POSIX-compliant systems. Ralf Wildenhues now maintains that port.
See the README file in the gawk distribution for current information about maintainers and which ports are currently supported.
This man page documents gawk, version 3.1.7.
If you find a bug in gawk, please send electronic mail to bug-gawk@gnu.org. Please include your operating system and its revision, the version of gawk (from gawk --version), what C compiler you used to compile it, and a test program and data that are as small as possible for reproducing the problem.
Before sending a bug report, please do the following things. First, verify that you have the latest version of gawk. Many bugs (usually subtle ones) are fixed at each release, and if yours is out of date, the problem may already have been solved. Second, please see if setting the environment variable LC_ALL to LC_ALL=C causes things to behave as you expect. If so, it’s a locale issue, and may or may not really be a bug. Finally, please read this man page and the reference manual carefully to be sure that what you think is a bug really is, instead of just a quirk in the language.
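The locale check can be scripted roughly as follows. This is only a sketch: locale_check is a hypothetical helper, and it assumes gawk is installed and that the program text and a data file are passed as its two arguments.

```shell
# locale_check PROGRAM FILE: hypothetical helper. Runs the same gawk
# program once in the current locale and once with LC_ALL=C, then reports
# whether the two outputs match.
locale_check() {
    current=$(gawk "$1" "$2")
    in_c=$(LC_ALL=C gawk "$1" "$2")
    if [ "$current" = "$in_c" ]; then
        echo "same output: probably not a locale issue"
    else
        echo "output differs: likely a locale issue"
    fi
}
```

If the output differs, the behavior depends on the locale and may not be a gawk bug at all.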
Whatever you do, do NOT post a bug report in comp.lang.awk. While the gawk developers occasionally read this newsgroup, posting bug reports there is an unreliable way to report bugs. Instead, please use the electronic mail addresses given above.
If you’re using a GNU/Linux system or BSD-based system, you may wish to submit a bug report to the vendor of your distribution. That’s fine, but please send a copy to the official email address as well, since there’s no guarantee that the bug will be forwarded to the gawk maintainer.
Brian Kernighan of Bell Laboratories provided valuable assistance during testing and debugging. We thank him.
Copyright © 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005, 2007, 2009 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this manual page provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual page under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual page into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation.
by blogadmin
sed – stream editor for filtering and transforming text
sed [OPTION]... {script-only-if-no-other-script} [input-file]...
Sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors.
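A minimal example of sed used as a filter in a pipeline. The input text and the helper name world_to_sed are invented for the illustration.

```shell
# world_to_sed: hypothetical helper. sed reads the stream from the pipe,
# replaces the first "world" on each line, and writes to standard output.
world_to_sed() {
    sed 's/world/sed/'
}

printf 'hello world\ngoodbye world\n' | world_to_sed
```

Nothing is written back to a file; the edited text simply flows on to whatever comes next in the pipeline.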
This is just a brief synopsis of sed commands to serve as a reminder to those who already know sed; other documentation (such as the texinfo document) must be consulted for fuller descriptions.
Zero-address "commands"
Zero- or One- address commands
a \
text
Append text, which has each embedded newline preceded by a backslash.
i \
text
Insert text, which has each embedded newline preceded by a backslash.
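The append and insert commands might be used like this. The sketch uses the backslash-newline form, which is the portable spelling; the sample lines and helper names are made up.

```shell
# append_after_first: hypothetical helper; adds two lines after line 1.
# The embedded newline inside the added text is preceded by a backslash.
append_after_first() {
    sed '1a\
first added line\
second added line'
}

# insert_before_second: hypothetical helper; adds one line before line 2.
insert_before_second() {
    sed '2i\
inserted line'
}

printf 'one\ntwo\n' | append_after_first
printf 'one\ntwo\n' | insert_before_second
```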
Commands which accept address ranges
d
Delete pattern space. Start next cycle.
D
Delete up to the first embedded newline in the pattern space. Start next cycle, but skip reading from the input if there is still data in the pattern space.
h H
Copy/append pattern space to hold space.
g G
Copy/append hold space to pattern space.
x
Exchange the contents of the hold and pattern spaces.
l
List out the current line in a "visually unambiguous" form.
P
Print up to the first embedded newline of the current pattern space.
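Several of the commands above are easier to follow in combination. The helpers below are well-known sed idioms, offered as a sketch; the helper names are hypothetical.

```shell
# reverse_lines: hypothetical helper; prints its input last line first.
#   1!G  append the hold space to the pattern space (skipped on line 1)
#   h    copy the pattern space back to the hold space
#   $p   on the last line, print everything accumulated
reverse_lines() {
    sed -n '1!G;h;$p'
}

# squeeze_repeats: hypothetical helper; drops consecutive duplicate lines.
#   $!N  append the next input line (unless this is the last line)
#   P    print the first line of the pair, unless both lines are equal
#   D    delete that first line and restart without reading new input
squeeze_repeats() {
    sed '$!N; /^\(.*\)\n\1$/!P; D'
}

printf '1\n2\n3\n' | reverse_lines
printf 'a\na\nb\n' | squeeze_repeats

# l shows the tab as \t and marks the end of the line with $.
printf 'a\tb\n' | sed -n 'l'
```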
Sed commands can be given with no addresses, in which case the command will be executed for all input lines; with one address, in which case the command will only be executed for input lines which match that address; or with two addresses, in which case the command will be executed for all input lines which match the inclusive range of lines starting from the first address and continuing to the second address. Three things to note about address ranges: the syntax is addr1,addr2 (i.e., the addresses are separated by a comma); the line which addr1 matched will always be accepted, even if addr2 selects an earlier line; and if addr2 is a regexp, it will not be tested against the line that addr1 matched.
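The three notes about address ranges can be tried out with a small helper. This is a sketch; print_range is a made-up name and the sample input is arbitrary.

```shell
# print_range: hypothetical helper; prints only the lines selected by the
# address or address range given as its first argument.
print_range() {
    sed -n "${1}p"
}

# Inclusive range: lines 2 through 4.
printf '1\n2\n3\n4\n5\n' | print_range '2,4'

# addr2 selects an earlier line, so only the addr1 line (4) is printed.
printf '1\n2\n3\n4\n5\n' | print_range '4,2'

# A regexp addr2 is not tested against the line addr1 matched, so the
# range runs from the first x to the next x and prints x, y, x.
printf 'x\ny\nx\nz\n' | print_range '/x/,/x/'
```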
After the address (or address-range), and before the command, a ! may be inserted, which specifies that the command shall only be executed if the address (or address-range) does not match.
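A few short cases of the ! modifier, given as a sketch with invented sample input and helper names.

```shell
# keep_second_line: hypothetical helper; deletes every line except line 2.
keep_second_line() {
    sed '2!d'
}

# keep_middle: hypothetical helper; deletes every line outside 2 to 4.
keep_middle() {
    sed '2,4!d'
}

# non_comments: hypothetical helper; prints lines not starting with #.
non_comments() {
    sed -n '/^#/!p'
}

printf '1\n2\n3\n' | keep_second_line
printf '1\n2\n3\n4\n5\n' | keep_middle
printf '# note\ncode\n# more\n' | non_comments
```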
The following address types are supported:
POSIX.2 BREs should be supported, but they aren’t completely because of performance problems. The \n sequence in a regular expression matches the newline character, and similarly for \a, \t, and other sequences.
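For instance, \n in a regular expression becomes useful once the pattern space holds more than one line. The helper name join_pairs is hypothetical.

```shell
# join_pairs: hypothetical helper; joins each pair of input lines.
#   N        append the next input line, separated by a newline
#   s/\n/ /  replace that embedded newline with a space
join_pairs() {
    sed 'N;s/\n/ /'
}

printf 'a\nb\nc\nd\n' | join_pairs
```

With an even number of input lines, each output line holds two of them. What happens to a leftover last line when N finds no more input differs between sed implementations.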
E-mail bug reports to bonzini@gnu.org. Be sure to include the word "sed" somewhere in the "Subject:" field. Also, please include the output of "sed --version" in the body of your report if at all possible.
Copyright © 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, to the extent permitted by law.
GNU sed home page: <http://www.gnu.org/software/sed/>. General help using GNU software: <http://www.gnu.org/gethelp/>. E-mail bug reports to: <bug-gnu-utils@gnu.org>. Be sure to include the word "sed" somewhere in the "Subject:" field.
awk(1), ed(1), grep(1), tr(1), perlre(1), sed.info, any of various books on sed, the sed FAQ (http://sed.sf.net/grabbag/tutorials/sedfaq.txt), http://sed.sf.net/grabbag/.
The full documentation for sed is maintained as a Texinfo manual. If the info and sed programs are properly installed at your site, the command info sed should give you access to the complete manual.
