NB: Recent versions of Linux (version 2.6.17 and later) have full autotuning with 4 MB maximum buffer sizes. Except in some rare cases, manual tuning is unlikely to substantially improve the performance of these kernels over most network paths, and is not generally recommended.
Since autotuning and large default buffer sizes were released progressively over a succession of different kernel versions, it is best to inspect and only adjust the tuning as needed. When you upgrade kernels, you may want to consider removing any local tuning.
All system parameters can be read or set by accessing special files in the /proc file system, e.g. /proc/sys/net/ipv4/tcp_moderate_rcvbuf.
If the parameter tcp_moderate_rcvbuf is present and has value 1 then autotuning is in effect. With autotuning, the receiver buffer size (and TCP window size) is dynamically updated (autotuned) for each connection. (Sender side autotuning has been present and unconditionally enabled for many years now).
The per-connection memory space defaults are set with two 3-element arrays, /proc/sys/net/ipv4/tcp_rmem (receive) and /proc/sys/net/ipv4/tcp_wmem (send).
These are arrays of three values: minimum, initial and maximum buffer size. They are used to set the bounds on autotuning and balance memory usage while under memory stress. Note that these are controls on the actual memory usage (not just TCP window size) and include memory used by the socket data structures as well as memory wasted by short packets in large buffers. The maximum values have to be larger than the BDP of the path by some suitable overhead.
With autotuning, the middle value just determines the initial buffer size. It is best to set it to some optimal value for typical small flows. With autotuning, excessively large initial buffers waste memory and can even hurt performance.
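As a rough illustration of the "maximum must exceed the BDP" rule, the following Python sketch computes a bandwidth-delay product and compares it with the maximum of a buffer triple. The function names and the sample triple are assumptions made for this example, not values read from any particular system.

```python
def bdp_bytes(bandwidth_bits_per_sec, rtt_ms):
    """Bandwidth-delay product in bytes (integer arithmetic, RTT in milliseconds)."""
    return bandwidth_bits_per_sec * rtt_ms // 8000


def parse_buffer_triple(text):
    """Split a 'min initial max' line, as found in tcp_rmem or tcp_wmem."""
    minimum, initial, maximum = (int(value) for value in text.split())
    return minimum, initial, maximum


def max_covers_bdp(triple_text, bandwidth_bits_per_sec, rtt_ms):
    """True if the maximum buffer size is at least the path BDP."""
    _, _, maximum = parse_buffer_triple(triple_text)
    return maximum >= bdp_bytes(bandwidth_bits_per_sec, rtt_ms)


if __name__ == "__main__":
    sample = "4096 87380 4194304"  # illustrative triple, not read from /proc
    print(bdp_bytes(1_000_000_000, 32))              # 1 Gbit/s path with 32 ms RTT
    print(max_covers_bdp(sample, 1_000_000_000, 32))
```

Note that this only compares raw numbers; as the text says, the real maximum needs some extra headroom for socket overhead.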
If autotuning is not present (Linux 2.4 before 2.4.27 or Linux 2.6 before 2.6.7), you may want to get a newer kernel. Alternately, you can adjust the default socket buffer size for all TCP connections by setting the middle tcp_rmem value to the calculated BDP. This is NOT recommended for kernels with autotuning. Since the sending side is autotuned, this is never recommended for tcp_wmem.
The maximum buffer size that applications can request (the maximum acceptable values for SO_SNDBUF and SO_RCVBUF arguments to the setsockopt() system call) can be limited with the /proc variables /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max.
The kernel sets the actual memory limit to twice the requested value (effectively doubling rmem_max and wmem_max) to provide for sufficient memory overhead. You do not need to adjust these unless you are planning to use some form of application tuning.
NB: Manually adjusting socket buffer sizes with setsockopt() disables autotuning. Applications that are optimized for other operating systems may implicitly defeat Linux autotuning.
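To see the effect of an explicit buffer request, a small Python sketch such as the one below can set SO_RCVBUF on a throwaway socket and read back what the kernel actually granted. The helper name is made up for this example; on Linux the value read back is normally larger than the request (subject to the rmem_max limit), while other systems may behave differently. Remember that doing this on a real connection turns off autotuning for that socket.

```python
import socket


def effective_rcvbuf(requested_bytes):
    """Request a receive buffer size and return what the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 65536
    print("requested", requested, "granted", effective_rcvbuf(requested))
```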
The following values (which are the defaults for 2.6.17 with more than 1 GByte of memory) would be reasonable for all paths with a 4MB BDP or smaller (you must be root):
Do not adjust tcp_mem unless you know exactly what you are doing. This array (in units of pages) determines how the system balances the total network buffer space against all other LOWMEM memory usage. The three elements are initialized at boot time to appropriate fractions of the available system memory.
You do not need to adjust rmem_default or wmem_default (at least not for TCP tuning). These are the default buffer sizes for non-TCP sockets (e.g. unix domain and UDP sockets).
All standard advanced TCP features are on by default. You can check them by:
Linux supports both /proc and sysctl (using alternate forms of the variable names – e.g. net.core.rmem_max) for inspecting and adjusting network tuning parameters. The following is a useful shortcut for inspecting all tcp parameters:
For additional information on kernel variables, look at the documentation included with your kernel source, typically in some location such as /usr/src/linux-/Documentation/networking/ip-sysctl.txt. There is a very good (but slightly out of date) tutorial on network sysctl’s at http://ipsysctl-tutorial.frozentux.net/ipsysctl-tutorial.html.
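The correspondence between the sysctl names and the /proc paths mentioned above is mechanical, and can be sketched in Python as follows. The function names are illustrative only, and the sketch assumes names without embedded dots in a component (some interface names would break it).

```python
PROC_SYS = "/proc/sys/"


def sysctl_to_proc(name):
    """Map a sysctl name such as net.core.rmem_max to its /proc path."""
    return PROC_SYS + name.replace(".", "/")


def proc_to_sysctl(path):
    """Map a /proc/sys path back to the dotted sysctl name."""
    if not path.startswith(PROC_SYS):
        raise ValueError("not a /proc/sys path: " + path)
    return path[len(PROC_SYS):].replace("/", ".")


if __name__ == "__main__":
    print(sysctl_to_proc("net.core.rmem_max"))
    print(proc_to_sysctl("/proc/sys/net/ipv4/tcp_rmem"))
```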
If you would like these changes to be preserved across reboots, you can add the tuning commands to the file /etc/rc.d/rc.local.
That’s it !
POWERDNS with MYSQL BACKEND (MASTER/SLAVE OPERATION)
Create a database named pdns and populate it with the tables described in the DOC.
References:- http://doc.powerdns.com/generic-mypgsql-backends.html#AEN6152l http://doc.powerdns.com/slave.html#id445275 http://doc.powerdns.com/replication.html#native-replication http://doc.powerdns.com/master.html http://doc.powerdns.com/pdns-on-unix.html http://doc.powerdns.com/monitoring.html
CREATE UNIQUE INDEX name_index ON domains(name);
CREATE TABLE records (
  id INT auto_increment,
  domain_id INT DEFAULT NULL,
  name VARCHAR(255) DEFAULT NULL,
  type VARCHAR(6) DEFAULT NULL,
  content VARCHAR(255) DEFAULT NULL,
  ttl INT DEFAULT NULL,
  prio INT DEFAULT NULL,
  change_date INT DEFAULT NULL,
  primary key(id)
) ENGINE=InnoDB;

CREATE INDEX rec_name_index ON records(name);
CREATE INDEX nametype_index ON records(name,type);
CREATE INDEX domain_id ON records(domain_id);

CREATE TABLE supermasters (
  ip VARCHAR(25) NOT NULL,
  nameserver VARCHAR(255) NOT NULL,
  account VARCHAR(40) DEFAULT NULL
);

GRANT SELECT ON supermasters TO pdns;
GRANT ALL ON domains TO pdns;
GRANT ALL ON records TO pdns;
Then edit /etc/pdns/pdns.conf on both servers to configure the backend and enable master/slave operation.
For ns1:
allow-axfr-ips=10.50.11.20
disable-axfr=no
master=yes
launch=gmysql
gmysql-host=127.0.0.1
gmysql-user=pdns
gmysql-dbname=pdns
gmysql-password=

For ns2:
slave=yes
gmysql-host=127.0.0.1
gmysql-user=pdns
gmysql-dbname=pdns
gmysql-password=

Make sure the new configuration is loaded; on both servers run:
# /etc/init.d/pdns restart

Now, add the supermaster to the database on ns2:
INSERT INTO supermasters (ip, nameserver, account) VALUES ('10.50.11.10', 'ns2.example.com', 'admin');

Here 10.50.11.10 is the master server's IP and 10.50.11.20 is the slave server's IP.

On ns1, create a domain:
INSERT INTO domains (name, type) VALUES ('example.com', 'MASTER');
INSERT INTO records (domain_id, name, content, type, ttl, prio) VALUES (1, 'example.com', 'ns1.example.com hostmaster.example.com 1', 'SOA', 86400, NULL);
INSERT INTO records (domain_id, name, content, type, ttl, prio) VALUES (1, 'example.com', 'ns1.example.com', 'NS', 86400, NULL);
INSERT INTO records (domain_id, name, content, type, ttl, prio) VALUES (1, 'example.com', 'ns2.example.com', 'NS', 86400, NULL);
INSERT INTO records (domain_id, name, content, type, ttl, prio) VALUES (1, 'ns1.example.com', '10.0.0.10', 'A', 86400, NULL);
INSERT INTO records (domain_id, name, content, type, ttl, prio) VALUES (1, 'ns2.example.com', '10.0.0.20', 'A', 86400, NULL);

Now check that it works on ns1:
# dig @10.0.0.10 ns example.com
…
;; QUESTION SECTION:
;example.com. IN NS

;; ANSWER SECTION:
example.com. 86400 IN NS ns2.example.com.
example.com. 86400 IN NS ns1.example.com.

;; ADDITIONAL SECTION:
ns2.example.com. 86400 IN A 10.0.0.20
ns1.example.com. 86400 IN A 10.0.0.10
…

The tables on ns2 haven't been populated yet, but if we update the serial on ns1 and trigger a NOTIFY, it will sync up.
UPDATE records SET content = 'ns1.example.com hostmaster.example.com 2' WHERE id = '1';

The zone is now synchronized; try it out.
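The serial bump done by the UPDATE statement above can also be computed rather than typed by hand. The following Python sketch (the function name is made up for this example) takes the SOA content string as stored in the records table and increments its third field, the serial, leaving any further fields untouched.

```python
def bump_soa_serial(content):
    """Return the SOA content string with its serial (third field) incremented."""
    fields = content.split()
    if len(fields) < 3:
        raise ValueError("SOA content needs at least: primary hostmaster serial")
    fields[2] = str(int(fields[2]) + 1)
    return " ".join(fields)


if __name__ == "__main__":
    print(bump_soa_serial("ns1.example.com hostmaster.example.com 1"))
```

The result would then be written back with an UPDATE like the one shown above.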
DOCUMENTATION:-

Basic functionality

4 queries are needed for regular lookups, 4 for 'fancy records' (which are disabled by default) and 1 is needed for zone transfers. The 4+4 regular queries must return the following 6 fields, in this exact order:

content: This is the 'right hand side' of a DNS record. For an A record, for example, this is the IP address.
ttl: TTL of this record, in seconds. Must be a real value, no checking is performed.
prio: For MX records, this should be the priority of the mail exchanger specified.
qtype: The ASCII representation of the qtype of this record. Examples are 'A', 'MX', 'SOA', 'AAAA'. Make sure that this field returns an exact answer; PDNS won't recognise 'A ' as 'A'. This can be achieved by using a VARCHAR instead of a CHAR.
domain_id: Each domain must have a unique domain_id. No two domains may share a domain_id; all records in a domain should have the same one. A number.
name: Actual name of a record. Must not end in a '.' and must be fully qualified; it is not relative to the name of the domain!
Native operation

For native operation, either drop the FOREIGN KEY on the domain_id field, or (recommended) make sure the domains table is filled properly. To add a domain, issue the following:
insert into domains (name,type) values ('powerdns.com','NATIVE');
The records table can now be filled with the domain_id set to the id of the domains table row just inserted.
ENJOY !!!
Things required:
• perl Cache::Memcached module
• libmemcached
• postfix with tcp_table support
• /home/httpd/cgi-bin/meta/getmemcacheentry
Configuration Files:-
/etc/postfix/master.cf /etc/postfix/main.cf
Script:- /etc/postfix/memc.pl
Below are the steps that are required to make this work:
A simple Perl script, memc.pl, that speaks the tcp_table protocol.
An entry like this in master.cf:
127.0.0.1:2552 inet n n n - 0 spawn user=nobody argv=/etc/postfix/memc.pl
Make memc.pl executable and don't forget to reload postfix:
# chmod 755 memc.pl
# /etc/init.d/postfix reload
For example, to use it as a check_recipient_access lookup in smtpd_recipient_restrictions in main.cf:
smtpd_recipient_restrictions = … check_recipient_access tcp:[127.0.0.1]:2552, …
TESTS:-
Don't forget to reload postfix. Let's use postmap to query entries that we have put into memcached.
$ /var/postfix/sbin/postmap -q fuhomipu_12@rediffmail.com tcp:127.0.0.1:2552
DUNNO DO WHATEVER IT WANTS TO DO
# telnet localhost 2552
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
get zbddz9a7fh850a2@rediffmail.com
200 DISCARD DELETING MAILS FROM THIS SENDER
Connection closed by foreign host.

# /var/postfix/sbin/postmap -q zbddz9a7fh850a2@rediffmail.com tcp:127.0.0.1:2552
DISCARD DELETING MAILS FROM THIS SENDER
# 3 Expected Responses:-
1. DISCARD (Delete the mail without sending a bounce)
2. HOLD (Move the email to the HOLD queue)
3. DUNNO (Allow the mail and check whether another rule exists)
SOME LEARNING:-
NAME tcp_table – Postfix client/server table lookup protocol
PROTOCOL DESCRIPTION The TCP map class implements a very simple protocol: the client sends a request, and the server sends one reply. Requests and replies are sent as one line of ASCII text, terminated by the ASCII newline character. Request and reply parameters (see below) are separated by whitespace.
Send and receive operations must complete in 100 seconds.
REQUEST FORMAT Each request specifies a command, a lookup key, and possibly a lookup result.
get SPACE key NEWLINE Look up data under the specified key.
put SPACE key SPACE value NEWLINE This request is currently not implemented.
REPLY FORMAT Each reply specifies a status code and text. Replies must be no longer than 4096 characters including the newline terminator.
500 SPACE text NEWLINE In case of a lookup request, the requested data does not exist. In case of an update request, the request was rejected. The text describes the nature of the problem.
400 SPACE text NEWLINE This indicates an error condition. The text describes the nature of the problem. The client should retry the request later.
200 SPACE text NEWLINE The request was successful. In the case of a lookup request, the text contains an encoded version of the requested data.
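The request and reply rules above can be sketched in a few lines of Python, independent of the Perl script that follows. This is only an illustration: the function name and the dictionary standing in for memcached are assumptions, the choice of 500 for put and 400 for malformed input is one reading of the description, and the encoding of special characters in keys and values is left out.

```python
def handle_request(line, table):
    """Return one tcp_table style reply line for one request line."""
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        return "400 malformed request\n"
    command, rest = parts[0].lower(), parts[1]
    if command == "put":
        # Updates are described as not implemented, so reject them.
        return "500 put is not implemented\n"
    if command != "get":
        return "400 malformed request\n"
    value = table.get(rest.lower())
    if value is None:
        return "500 key not found\n"
    return "200 " + value + "\n"


if __name__ == "__main__":
    # Hypothetical lookup data standing in for memcached.
    sample = {"spammer@example.com": "DISCARD"}
    print(handle_request("get spammer@example.com\n", sample), end="")
    print(handle_request("get nobody@example.com\n", sample), end="")
```

Note that memc.pl below answers unknown keys with "200 DUNNO ..." rather than 500, so that the access check falls through to the next rule.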
#!/usr/bin/perl
use strict;
use warnings;
use Cache::Memcached;

$| = 1;    # flush each reply immediately so postfix sees it

my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

sub usage {
    print "Unknown option: @_\n" if @_;
    print "Usage: memc.pl abc\@domain.com\n";
    exit;
}

# Return the string with leading and trailing whitespace removed.
sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# Look up an address through the getmemcacheentry helper.
# The address arrives from the network, so validate it before it reaches the shell.
sub qrymemc {
    my $email = shift;
    my $cmd = "getmemcacheentry xx.xx.xx.xx,xx.xx.xx.xx 11211 emailid:" . $email;
    my $ret = `$cmd`;
    return $ret;
    # Direct memcached lookup instead of the helper:
    # return $memd->get("fbl:" . $email);
}

while (<>) {
    chomp;
    my $email;
    if (/^get\s+(.+)/i) {
        $email = trim(lc($1));
    }

    if (defined($email) && $email ne '') {
        my $val = qrymemc($email);

        if (!defined($val)) {
            print "200 DUNNO SENDER KEY NOT FOUND\n";
            exit;
        }
        $val = trim($val);

        if (lc($val) eq "perm") {
            print "200 DISCARD DELETING MAILS FROM THIS SENDER\n";
        }
        elsif (lc($val) eq "temp") {
            print "200 HOLD HOLDING the mail from this SENDER\n";
        }
        else {
            print "200 DUNNO DO WHATEVER IT WANTS TO DO$val\n";
        }
        exit;
    }
    else {
        usage();
    }
}
That’s It !!!!!
I want to redirect requests to a different host when my origin servers do not respond. Say backend A does not respond: I need to redirect the URL to the origin URL so that the request is at least served even in case of failure. The fallback would be the redirected Host header, which could be on a separate machine acting as a fallback.
For example, if http://anand.com does not respond, the request should automatically be redirected to http://abc.com, which is capable of serving the data.
Resolution:
If your backends do not respond, you will end up in vcl_error. However, you might also end up in vcl_error for reasons other than the backend not responding, such as calling error elsewhere in your VCL or reaching the maximum number of restarts. If that's an acceptable trade-off for you, I would suggest the following VCL:
sub vcl_error {
    if (req.http.host ~ "anand\.com$") {
        set obj.http.Location = "http://" regsub(req.http.host, "anand\.com", "abc.com") req.url;
        set obj.status = 302;
        return(deliver);
    }
}
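To check what Location header the rewrite in vcl_error would produce, the same substitution can be tried outside Varnish. The Python sketch below mirrors the regsub call, replacing the first match only; the function name is invented for this example.

```python
import re


def fallback_location(host, url):
    """Build the redirect target by swapping the failing host for the fallback."""
    new_host = re.sub(r"anand\.com", "abc.com", host, count=1)
    return "http://" + new_host + url


if __name__ == "__main__":
    print(fallback_location("anand.com", "/index.html"))
```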
Is uptime the best measure of system efficiency?
In an effort to find the answer, I came across a good article by Ray Walker in Linux Journal, and I am reproducing it here as it is.
Understanding work-load averages as opposed to CPU usage.
Many Linux administrators and support technicians regularly use the top utility for real-time monitoring of their system state. In some shops, it is very typical to check top first when there is any sign of trouble. In that case, top becomes the de facto critical measurement of the machine's health. If top looks good, there must not be any system problems. top is rich with information: memory usage, kernel states, process priorities, process owner and so forth all can be obtained from top. But, what is the purpose of those three curious load averages, and what exactly are they trying to tell me? To answer those questions, an intuitive as well as a detailed understanding of how the values are formed is necessary. Let's start with intuition.
The Intuitive Interpretation
The three load-average values in the first line of top output are the 1-minute, 5-minute and 15-minute average. (These values also are displayed by other commands, such as uptime, not only top.) That means, reading from left to right, one can examine the aging trend and/or duration of the particular system state. The state in question is CPU load—not to be confused with CPU percentage. In fact, it is precisely the CPU load that is measured, because load averages do not include any processes or threads waiting on I/O, networking, databases or anything else not demanding the CPU. It narrowly focuses on what is actively demanding CPU time. This differs greatly from the CPU percentage. The CPU percentage is the amount of a time interval (that is, the sampling interval) that the system’s processes were found to be active on the CPU. If top reports that your program is taking 45% CPU, 45% of the samples taken by top found your process active on the CPU. The rest of the time your application was in a wait. (It is important to remember that a CPU is a discrete state machine. It really can be at only 100%, executing an instruction, or at 0%, waiting for something to do. There is no such thing as using 45% of a CPU. The CPU percentage is a function of time.) However, it is likely that your application’s rest periods include waiting to be dispatched on a CPU and not on external devices. That part of the wait percentage is then very relevant to understanding your overall CPU usage pattern.
The load averages differ from CPU percentage in two significant ways: 1) load averages measure the trend in CPU utilization not only an instantaneous snapshot, as does percentage, and 2) load averages include all demand for the CPU not only how much was active at the time of measurement.
Authors tend to overuse analogies and sometimes run the risk of either insulting the reader’s intelligence or oversimplifying the topic to the point of losing important details. However, freeway traffic patterns are a perfect analogy for this topic, because this model encapsulates the essence of resource contention and is also the chosen metaphor by many authors of queuing theory books. Not surprisingly, CPU contention is a queuing theory problem, and the concepts of arrival rates, Poisson theory and service rates all apply. A four-processor machine can be visualized as a four-lane freeway. Each lane provides the path on which instructions can execute. A vehicle can represent those instructions. Additionally, there are vehicles on the entrance lanes ready to travel down the freeway, and the four lanes either are ready to accommodate that demand or they’re not. If all freeway lanes are jammed, the cars entering have to wait for an opening. If we now apply the CPU percentage and CPU load-average measurements to this situation, percentage examines the relative amount of time each vehicle was found occupying a freeway lane, which inherently ignores the pent-up demand for the freeway—that is, the cars lined up on the entrances. So, for example, vehicle license XYZ 123 was found on the freeway 30% of the sampling time. Vehicle license ABC 987 was found on the freeway 14% of the time. That gives a picture of how each vehicle is utilizing the freeway, but it does not indicate demand for the freeway.
Moreover, the percentage of time these vehicles are found on the freeway tells us nothing about the overall traffic pattern except, perhaps, that they are taking longer to get to their destination than they would like. Thus, we probably would suspect some sort of a jam, but the CPU percentage would not tell us for sure. The load averages, on the other hand, would.
This brings us to the point. It is the overall traffic pattern of the freeway itself that gives us the best picture of the traffic situation, not merely how often cars are found occupying lanes. The load average gives us that view because it includes the cars that are queuing up to get on the freeway. It could be the case that it is a nonrush-hour time of day, and there is little demand for the freeway, but there just happens to be a lot of cars on the road. The CPU percentage shows us how much the cars are using the freeway, but the load averages show us the whole picture, including pent-up demand. Even more interesting, the more recent that pent-up demand is, the more the load-average value reflects it.
Taking the discussion back to the machinery at hand, the load averages tell us by increasing duration whether our physical CPUs are over- or under-utilized. The point of perfect utilization, meaning that the CPUs are always busy and, yet, no process ever waits for one, is the average matching the number of CPUs. If there are four CPUs on a machine and the reported one-minute load average is 4.00, the machine has been utilizing its processors perfectly for the last 60 seconds. This understanding can be extrapolated to the 5- and 15-minute averages.
In general, the intuitive idea of load averages is the higher they rise above the number of processors, the more demand there is for the CPUs, and the lower they fall below the number of processors, the more untapped CPU capacity there is. But all is not as it appears.
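The intuitive rule above amounts to comparing a load average with the number of processors. A minimal Python sketch of that comparison follows; the function name and the three labels are assumptions chosen for this example, while os.getloadavg() and os.cpu_count() are standard library calls.

```python
import os


def describe_load(load, cpus):
    """Classify a load average relative to the number of CPUs."""
    per_cpu = load / cpus
    if per_cpu > 1.0:
        return "overloaded"
    if per_cpu < 1.0:
        return "spare capacity"
    return "fully utilized"


if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()  # available on Unix-like systems
    cpus = os.cpu_count() or 1
    for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
        print(label, value, describe_load(value, cpus))
```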
The Wizard behind the Curtain
The load-average calculation is best thought of as a moving average of processes in Linux's run queue marked running or uninterruptible. The words "thought of" were chosen for a reason: that is how the measurements are meant to be interpreted, but not exactly what happens behind the curtain. It is at this juncture in our journey when the reality of it all, like quantum mechanics, seems not to fit the intuitive way as it presents itself.
The load averages that the top and uptime commands display are obtained directly from /proc. If you are running Linux kernel 2.4 or later, you can read those values yourself with the command cat /proc/loadavg. However, it is the Linux kernel that produces those values in /proc. Specifically, timer.c and sched.h work together to do the computation. To understand what timer.c does for a living, the concept of time slicing and the jiffy counter help round out the picture.
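For reference, a line from /proc/loadavg has five fields: the three load averages, a running/total task count, and the most recently assigned PID. A small Python sketch for parsing it is shown below; the function name and dictionary keys are made up for this example, and the sample line is illustrative rather than captured from a real machine.

```python
def parse_loadavg(text):
    """Parse one line in /proc/loadavg format into a dictionary."""
    one, five, fifteen, tasks, last_pid = text.split()
    running, total = tasks.split("/")
    return {
        "one": float(one),
        "five": float(five),
        "fifteen": float(fifteen),
        "running": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }


if __name__ == "__main__":
    sample = "0.20 0.18 0.12 1/80 11206\n"  # illustrative line
    print(parse_loadavg(sample))
```

On a Linux machine the same function can be fed the contents of /proc/loadavg directly.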
In the Linux kernel, each dispatchable process is given a fixed amount of time on the CPU per dispatch. By default, this amount is 10 milliseconds, or 1/100th of a second. For that short time span, the process is assigned a physical CPU on which to run its instructions and allowed to take over that processor. More often than not, the process will give up control before the 10ms are up through socket calls, I/O calls or calls back to the kernel. (On an Intel 2.6GHz processor, 10ms is enough time for approximately 50-million instructions to occur. That’s more than enough processing time for most application cycles.) If the process uses its fully allotted CPU time of 10ms, an interrupt is raised by the hardware, and the kernel regains control from the process. The kernel then promptly penalizes the process for being such a hog. As you can see, that time slicing is an important design concept for making your system seem to run smoothly on the outside. It also is the vehicle that produces the load-average values.
The 10ms time slice is an important enough concept to warrant a name for itself: quantum value. There is not necessarily anything inherently special about 10ms, but there is about the quantum value in general, because whatever value it is set to (it is configurable, but 10ms is the default), it controls how often at a minimum the kernel takes control of the system back from the applications. One of the many chores the kernel performs when it takes back control is to increment its jiffies counter. The jiffies counter measures the number of quantum ticks that have occurred since the system was booted. When the quantum timer pops, timer.c is entered at a function in the kernel called timer.c:do_timer(). Here, all interrupts are disabled so the code is not working with moving targets. The jiffies counter is incremented by 1, and the load-average calculation is checked to see if it should be computed. In actuality, the load-average computation is not truly calculated on each quantum tick, but driven by a variable value that is based on the HZ frequency setting and tested on each quantum tick. (HZ is not to be confused with the processor’s MHz rating. This variable sets the pulse rate of particular Linux kernel activity and 1HZ equals one quantum or 10ms by default.) Although the HZ value can be configured in some versions of the kernel, it is normally set to 100. The calculation code uses the HZ value to determine the calculation frequency. Specifically, the timer.c:calc_load() function will run the averaging algorithm every 5 * HZ, or roughly every five seconds. Following is that function in its entirety:
unsigned long avenrun[3];

static inline void calc_load(unsigned long ticks)
{
    unsigned long active_tasks; /* fixed-point */
    static int count = LOAD_FREQ;

    count -= ticks;
    if (count < 0) {
        count += LOAD_FREQ;
        active_tasks = count_active_tasks();
        CALC_LOAD(avenrun[0], EXP_1, active_tasks);
        CALC_LOAD(avenrun[1], EXP_5, active_tasks);
        CALC_LOAD(avenrun[2], EXP_15, active_tasks);
    }
}
The avenrun array contains the three averages we have been discussing. The calc_load() function is called by update_times(), also found in timer.c, and is the code responsible for supplying the calc_load() function with the ticks parameter. Unfortunately, this function does not reveal its most interesting aspect: the computation itself. However, that can be located easily in sched.h, a header used by much of the kernel code. In there, the CALC_LOAD macro and its associated values are available:
extern unsigned long avenrun[]; /* Load averages */

#define FSHIFT 11 /* nr of bits of precision */
#define FIXED_1 (1<<FSHIFT) /* 1.0 as fixed-point */
#define LOAD_FREQ (5*HZ) /* 5 sec intervals */
#define EXP_1 1884 /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5 2014 /* 1/exp(5sec/5min) */
#define EXP_15 2037 /* 1/exp(5sec/15min) */

#define CALC_LOAD(load,exp,n) \
    load *= exp; \
    load += n*(FIXED_1-exp); \
    load >>= FSHIFT;
Here is where the tires meet the pavement. It should now be evident that reality does not appear to match the illusion. At least, this is certainly not the type of averaging most of us are taught in grade school. But it is an average nonetheless. Technically, it is an exponential decay function and is the moving average of choice for most UNIX systems as well as Linux. Let’s examine its details.
The macro takes in three parameters: the load-average bucket (one of the three elements in avenrun[]), a constant exponent and the number of running/uninterruptible processes currently on the run queue. The possible exponent constants are listed above: EXP_1 for the 1-minute average, EXP_5 for the 5-minute average and EXP_15 for the 15-minute average. The important point to notice is that the value decreases with age. The constants are magic numbers that are calculated by the mathematical function shown below:
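Going by the sched.h comments (1/exp(5sec/1min) as fixed-point) and an FSHIFT of 11, the constants are consistent with y = 2^11 / e^(5 / (60x)), where x is the averaging window in minutes. A quick Python check of that reading, with an illustrative function name:

```python
import math

FSHIFT = 11
FIXED_1 = 1 << FSHIFT  # 2048, i.e. 1.0 in fixed-point


def exp_constant(minutes, interval_seconds=5):
    """Fixed-point decay factor for one sampling interval of the given window."""
    return round(FIXED_1 / math.exp(interval_seconds / (60 * minutes)))


if __name__ == "__main__":
    for minutes in (1, 5, 15):
        print(minutes, exp_constant(minutes))
```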
When x=1, then y=1884; when x=5, then y=2014; and when x=15, then y=2037. The purpose of the magical numbers is that it allows the CALC_LOAD macro to use precision fixed-point representation of fractions. The magic numbers are then nothing more than multipliers used against the running load average to make it a moving average. (The mathematics of fixed-point representation are beyond the scope of this article, so I will not attempt an explanation.) The purpose of the exponential decay function is that it not only smooths the dips and spikes by maintaining a useful trend line, but it accurately decreases the quality of what it measures as activity ages. As time moves forward, successive CPU events increase their significance on the load average. This is what we want, because more recent CPU activity probably has more of an impact on the current state than ancient events. In the end, the load averages give a smooth trend from 15 minutes through the current minute and give us a window into not only the CPU usage but also the average demand for the CPUs. As the load average goes above the number of physical CPUs, the more the CPU is being used and the more demand there is for it. And, as it recedes, the less of a demand there is. With this understanding, the load average can be used with the CPU percentage to obtain a more accurate view of CPU activity.
It is my hope that this serves not only as a practical interpretation of Linux’s load averages but also illuminates some of the dark mathematical shadows behind them. For more information, a study of the exponential decay function and its applications would shed more light on the subject. But for the more practical-minded, plotting the load average vs. a controlled number of processes (that is, modeling the effects of the CALC_LOAD algorithm in a controlled loop) would give you a feel for the actual relationship and how the decaying filter applies.
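The controlled loop suggested above can be tried without touching the kernel. The Python sketch below reimplements the CALC_LOAD arithmetic on plain integers and feeds it a fixed number of runnable tasks per five-second sample. It assumes, as in the kernel of that era, that the task count is scaled by FIXED_1 before the macro is applied; the function names are invented for this example.

```python
FSHIFT = 11
FIXED_1 = 1 << FSHIFT
EXP_1, EXP_5, EXP_15 = 1884, 2014, 2037


def calc_load(load, exp, active):
    """One CALC_LOAD step on fixed-point integers."""
    load *= exp
    load += active * (FIXED_1 - exp)
    return load >> FSHIFT


def simulate(task_counts):
    """Feed one task count per 5-second sample; return the three averages."""
    avenrun = [0, 0, 0]
    for tasks in task_counts:
        active = tasks * FIXED_1  # task count scaled to fixed-point
        avenrun = [
            calc_load(load, exp, active)
            for load, exp in zip(avenrun, (EXP_1, EXP_5, EXP_15))
        ]
    return [load / FIXED_1 for load in avenrun]


if __name__ == "__main__":
    # One runnable task for one minute (12 samples), then for a long time.
    print(simulate([1] * 12))
    print(simulate([1] * 2000))
```

After one minute of a single runnable task the 1-minute figure has covered roughly two thirds of the distance to 1.0, while the 5- and 15-minute figures lag well behind. Because the shift truncates, the long-run values settle slightly below 1.0 rather than exactly on it.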
Ray Walker is a consultant specializing in UNIX kernel-level code. He has been a software developer for more than 25 years, working with Linux since 1995. He can be contacted at ray.rwalk2730@gmail.com.
Varnish Installation:
Varnish is designed to be a very fast caching reverse proxy server. Read ArchitectNotes. Varnish has the advantage of being designed specifically for use as an HTTP accelerator (reverse proxy). It stores much of its cached data in memory, creating fewer disk files and fewer accesses to the filesystem than the larger, more multi-purpose Squid package. Like Squid, it serves often-requested pages to anonymous-IP users from cache instead of requesting them from the origin web server.
Varnish Architecture
Varnish Structure
Get Going with Installation……….
Prerequisite:
Handle PCRE ERRORS and libpcre not found errors:
On my system it is at /usr/lib/pkgconfig/libpcre.pc
prefix=/usr
exec_prefix=/usr
libdir=/usr/lib
includedir=/usr/include/pcre

Name: libpcre
Description: PCRE - Perl compatible regular expressions C library
Version: 6.6
Libs: -L${libdir} -lpcre
Cflags: -I${includedir}
using Source Installation method
cd /varnish-2.1.2
./configure --prefix=/usr/local/varnish
make
make install
BASIC /etc/sysconfig/varnish
NFILES=131072
MEMLOCK=82000
VARNISH_VCL_CONF=/etc/varnish/default.vcl
VARNISH_ADMIN_LISTEN_ADDRESS=127.0.0.1
VARNISH_ADMIN_LISTEN_PORT=6082
VARNISH_MIN_THREADS=200
VARNISH_MAX_THREADS=2000
VARNISH_THREAD_TIMEOUT=120
VARNISH_STORAGE_FILE=/var/lib/varnish/varnish_storage.bin
VARNISH_STORAGE_SIZE=50%
VARNISH_STORAGE="file,${VARNISH_STORAGE_FILE},${VARNISH_STORAGE_SIZE}"
VARNISH_TTL=120

# DAEMON_OPTS is used by the init script. If you add or remove options, make
# sure you update this section, too.
# -h classic,500009 \
DAEMON_OPTS="-f ${VARNISH_VCL_CONF} \
 -T ${VARNISH_ADMIN_LISTEN_ADDRESS}:${VARNISH_ADMIN_LISTEN_PORT} \
 -t ${VARNISH_TTL} \
 -w ${VARNISH_MIN_THREADS},${VARNISH_MAX_THREADS},${VARNISH_THREAD_TIMEOUT} \
 -u varnish -g varnish \
 -s ${VARNISH_STORAGE} \
 -p thread_pool_min=200 \
 -p thread_pool_max=2000 \
 -p thread_pools=8 \
 -p listen_depth=4096 \
 -p session_linger=50/100/150 \
 -p lru_interval=60"
FILE BASED STORAGE: There are 2 main ways Varnish caches your data:
1. to memory (with the malloc storage config)
2. to disk (with the file storage config)
Sample VCL:
sub vcl_recv {
    if (req.request != "GET" && req.request != "HEAD") {
        pipe;
    }
    if (req.http.Expect) {
        pipe;
    }
    if (req.http.Authenticate) {
        pass;
    }
    // ignore and pass no-cache documents
    if (req.http.Cache-Control ~ "no-cache") {
        pass;
    }
    /* lookup objects from cache even when cookies are present */
    lookup;
}

sub vcl_pipe {
    pipe;
}

sub vcl_pass {
    pass;
}

sub vcl_hash {
    /*
     * The hash subroutine is important.
     * It defines the "name" pattern of
     * cache objects.
     * We will use url+host by default.
     * In this way, different machines may
     * share the same cache object.
     */
    set req.hash += req.url;
    set req.hash += req.http.host;
    /*
     * You can also append the cookie
     * to the hash. In this way,
     * varnish will fetch/insert a cache object for each
     * machine.
     */
    //set req.hash += req.http.cookie;
    hash;
}

sub vcl_hit {
    if (!obj.cacheable) {
        pass;
    }
    deliver;
}

sub vcl_miss {
    fetch;
}

sub vcl_fetch {
    if (!obj.valid) {
        error;
    }
    if (!obj.cacheable) {
        pass;
    }
    /* ignore set-cookie response */
    if (obj.http.Set-Cookie) {
        pass;
    }
    /* ignores no-cache documents */
    if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache" || obj.http.Cache-Control ~ "private") {
        pass;
    }
    /* insert documents to cache even when cookies are present */
    insert;
}

sub vcl_deliver {
    deliver;
}

sub vcl_timeout {
    discard;
}

sub vcl_discard {
    discard;
}
Binaries:
Exec: /usr/local/varnish/sbin/varnishd
LOG: /usr/local/varnish/bin/varnishlog
Others: /usr/local/varnish/sbin/
Start-up FILE:
### BEGIN INIT INFO
# Provides: varnish
# Required-Start: $network $local_fs $remote_fs
# Required-Stop: $network $local_fs $remote_fs
# Should-Start: $syslog
# Short-Description: start and stop varnishd
# Description: Varnish is a high-performance HTTP accelerator
### END INIT INFO

# Source function library.
. /etc/init.d/functions

retval=0
pidfile=/var/run/varnish.pid

# Note: the source install above puts the binary in
# /usr/local/varnish/sbin/varnishd; adjust this path to match your system.
exec="/usr/sbin/varnishd"
prog="varnishd"
config="/etc/sysconfig/varnish"
lockfile="/var/lock/subsys/varnish"

# Include varnish defaults
[ -e /etc/sysconfig/varnish ] && . /etc/sysconfig/varnish

start() {
    if [ ! -x $exec ]
    then
        echo $exec not found
        exit 5
    fi

    if [ ! -f $config ]
    then
        echo $config not found
        exit 6
    fi
    echo -n "Starting varnish HTTP accelerator: "

    # Open files (usually 1024, which is way too small for varnish)
    ulimit -n ${NFILES:-131072}

    # Varnish wants to lock shared memory log in memory.
    ulimit -l ${MEMLOCK:-82000}

    # $DAEMON_OPTS is set in /etc/sysconfig/varnish. At least, one
    # has to set up a backend, or /tmp will be used, which is a bad idea.
    if [ "$DAEMON_OPTS" = "" ]; then
        echo "\$DAEMON_OPTS empty."
        echo -n "Please put configuration options in $config"
        return 6
    else
        # Varnish always gives output on STDOUT
        daemon $exec -P $pidfile "$DAEMON_OPTS" > /dev/null 2>&1
        retval=$?
        if [ $retval -eq 0 ]
        then
            touch $lockfile
            echo_success
            echo
        else
            echo_failure
        fi
        return $retval
    fi
}

stop() {
    echo -n "Stopping varnish HTTP accelerator: "
    killproc $prog
    retval=$?
    echo
    [ $retval -eq 0 ] && rm -f $lockfile
    return $retval
}

restart() {
    stop
    start
}

reload() {
    restart
}

force_reload() {
    restart
}

rh_status() {
    status $prog
}

rh_status_q() {
    rh_status >/dev/null 2>&1
}

# See how we were called.
case "$1" in
    start)
        rh_status_q && exit 0
        $1
        ;;
    stop)
        rh_status_q || exit 0
        $1
        ;;
    restart)
        $1
        ;;
    reload)
        rh_status_q || exit 7
        $1
        ;;
    force-reload)
        force_reload
        ;;
    status)
        rh_status
        ;;
    condrestart|try-restart)
        rh_status_q || exit 0
        restart
        ;;
    *)
        echo "Usage: $0 {start|stop|status|restart|condrestart|try-restart|reload|force-reload}"
        exit 2
esac

exit $?
That is it. You are ready to use one of the best HTTP accelerators available in the industry today, with many thanks to PHK for it.
References:
http://phk.freebsd.dk/pubs/varnish_vcl.pdf
http://varnish.projects.linpro.no/wiki/VCLExamples
http://varnish.projects.linpro.no/wiki/VCL
1. Run Varnish on a 64-bit operating system
Varnish works on 32-bit, but was designed for 64-bit. It’s all about virtual memory: things like stack size suddenly matter on 32-bit. If you must use Varnish on 32-bit, you’re somewhat on your own. However, try to fit it within 2GB. I wouldn’t recommend a cache larger than 1GB, and no more than a few hundred threads… (Why are you on 32-bit again?)
2. Watch /var/log/syslog
Varnish is flexible, and has a relatively robust architecture. If a Varnish worker thread were to do something Bad and Varnish noticed, an assert would be triggered, Varnish would shut down and the management process would start it up again almost instantly. This is logged. If it weren’t, there’s a decent chance you wouldn’t notice, since the downtime is often sub-second. However, your cache is emptied. We’ve had several customers contact us about performance issues, only to realize they’re essentially restarting Varnish several times per minute.
This might make it sound like Varnish is unstable: It’s not. But there are bugs, and I happen to see a lot of them, since that’s my job.
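A rough way to check whether this is happening to you is to count the restart messages in the log. The sketch below is just a grep wrapper. The function name is made up for illustration, and the default pattern is an assumption about how the management process words these messages, not a documented format, so look at your own syslog and adjust it.

```shell
#!/bin/sh
# count_child_restarts LOGFILE [PATTERN]
# Prints the number of lines in LOGFILE that match PATTERN.
# The default PATTERN is an assumption, not a documented varnishd log
# format; check what your syslog actually contains and adjust it.
count_child_restarts() {
    logfile=$1
    pattern=${2:-'varnishd.*[Cc]hild.*died'}
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are no matches, so the exit status is deliberately ignored.
    grep -c -E "$pattern" "$logfile" || true
}
```

Called as `count_child_restarts /var/log/syslog`, a number that keeps climbing during the day suggests the child is being restarted and the cache emptied each time.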
3. Threads
The default values for threads are based on a philosophy I’ve since come to realize isn’t optimal. The idea was to minimize the memory footprint of Varnish. So by default, Varnish uses 5 threads per thread pool. With the default number of pools, that’s 10 threads minimum. The maximum is far higher, but in reality, threads are fairly cheap. If you expect to handle 500 concurrent requests, tune Varnish for that.
A little clarification on the thread parameters: thread_pool_min is the minimum number of threads for each thread pool. thread_pool_max is the maximum total number of threads. That means the values are not on the same scale. The thread_pools parameter can safely be ignored (tests have indicated that it doesn’t matter as much as we thought), but ideally having one thread pool for each CPU core is the rule of thumb, if you want to modify it.
You also do not want more than 5000 as the thread_pool_max. It’s dangerous, though fixed in trunk. It’s also more often than not an indication that something else is wrong. If you find yourself using 5000 threads, the solution is to find out why it’s happening, not to increase the number of threads.
To reduce the startup time, you also want to reduce the thread_pool_add_delay parameter. 2 is a good value (as opposed to 20, which makes for a slow start).
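Because thread_pool_min is per pool and thread_pool_max is a total, it helps to multiply the settings out before committing to them. The following is a minimal sketch of that arithmetic; the function names are invented for illustration, and the second call uses the values from the sample /etc/sysconfig/varnish earlier in this post.

```shell
#!/bin/sh
# min_total_threads POOLS PER_POOL_MIN
# thread_pool_min applies to each pool, so the effective floor is the product.
min_total_threads() {
    echo $(( $1 * $2 ))
}

# check_thread_settings POOLS PER_POOL_MIN TOTAL_MAX
# Prints a line starting with "ok" when the summed per-pool minimum fits
# under the total maximum, and a line starting with "conflict" otherwise.
check_thread_settings() {
    floor=$(min_total_threads "$1" "$2")
    if [ "$floor" -le "$3" ]; then
        echo "ok: minimum $floor threads, maximum $3"
    else
        echo "conflict: minimum $floor threads exceeds maximum $3"
    fi
}

# 2 pools of 5 threads each: the 10-thread minimum mentioned above.
min_total_threads 2 5

# thread_pools=8, thread_pool_min=200, thread_pool_max=2000
check_thread_settings 8 200 2000
```

With 8 pools, a per-pool minimum of 200 already means 1600 threads at startup, which is why the two values should not be read on the same scale.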
4. Tune based on necessity
I often look at sites where someone has tried to tune Varnish to get the most out of it, but taken it a bit too far. After working with Varnish I’ve realized that you do not really need to tune Varnish much: The defaults are tuned. The only real exception I’ve found to this is number of threads and possibly work spaces.
Varnish is by default tuned for high performance on the vast majority of real-life production sites. And it scales well, in most directions. By default. Do yourself a favor and don’t fix a problem which isn’t there. Of all the issues I’ve dealt with on Varnish, the vast majority have been related to finding out the real problem and either using Varnish to work around it, or fixing it on the related system. Off the top of my head, I can really only remember one or two cases where Varnish itself has been the problem with regards to performance.
To be more specific:
* Do not modify lru_interval. I often see the value 3600, which is a 180 000% (one hundred and eighty thousand percent) increase from the default. This is downright dangerous if you suddenly need the LRU list, and so far my tests haven’t been able to prove any noticeable performance improvement.
* Setting sess_timeout to a higher value increases your file descriptor consumption. There’s little to gain by doing it, and you risk running out of file descriptors, at least until we can get the fix into a released version.
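The percentage quoted for lru_interval is easy to reproduce. This is a small worked calculation, assuming a default lru_interval of 2 seconds, which is the value the quoted figure implies; the function name is only illustrative.

```shell
#!/bin/sh
# percent_of_default VALUE DEFAULT
# Prints VALUE as an integer percentage of DEFAULT.
percent_of_default() {
    echo $(( $1 * 100 / $2 ))
}

# Assumption: the default lru_interval is 2 seconds.
percent_of_default 3600 2   # prints 180000
```

The same helper can be used to sanity-check any other parameter you are tempted to change by several orders of magnitude.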
So the rule of thumb is: Adjust your threads, then leave the rest until you see a reason to change it.
5. Pay attention to work spaces
To avoid locking, Varnish allocates a chunk of memory to each thread, session and object. While keeping the object workspace small is a good thing to reduce the memory footprint (this has been improved vastly in trunk), sometimes the session workspace is a bit too small, especially when ESI is in use. The default sess_workspace is 16kB, but I know we have customers running with 5MB sess_workspace without trouble. We’re obviously looking to fix this, but so far it seems that having some extra sess_workspace isn’t that bad. The way to tell is by asserts (unfortunately), typically something related to (p != NULL) Condition not true (though there can obviously be other reasons for that). Look for it in our bug reports, then try to increase the session workspace.
6. Keep your VCL simple
Most of your VCL-work should be focused around vcl_recv and vcl_fetch. That’s where you define the majority of your caching policies. If that’s where you do your work, you’re fairly safe.
If you want to add extra headers, do it in vcl_deliver. Adding a header in vcl_hit is not safe. You can use the obj.hits variable in vcl_deliver to determine if it was a cache hit or not.
You should also review the default VCL, and if you can, let Varnish fall through to it. When you define your VCL, Varnish appends the default VCL, but if you terminate a function, the default is never run. This is an important detail in vcl_recv, where requests with cookies or Authorization headers are passed if present. That’s far safer than forcing a lookup. The default vcl_recv code also ensures that only GET and HEAD requests go through the cache.
7. Choosing storage backend (malloc or file?)
If you can contain your cache in memory, use malloc. If you have 32GB of physical memory, using -smalloc,30G is a good choice. The size you specify is for the cache, and does not include session workspace and such, that’s why you don’t want to specify -smalloc,32G on a 32GB-system.
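The sizing rule above can be written down as a small helper. This is only a sketch: the function name is invented, and the 2GB of headroom is simply the difference in the 32GB/30G example, not a tuned recommendation for every system.

```shell
#!/bin/sh
# malloc_storage_arg TOTAL_GB HEADROOM_GB
# Prints a varnishd -s argument that leaves HEADROOM_GB of physical memory
# free for session workspace and everything else that is not the cache.
# Returns 1 (and prints nothing on stdout) if no memory is left for a cache.
malloc_storage_arg() {
    total=$1
    headroom=$2
    cache=$(( total - headroom ))
    if [ "$cache" -le 0 ]; then
        echo "not enough memory for a malloc cache" >&2
        return 1
    fi
    echo "-smalloc,${cache}G"
}

# The example from the text: 32GB of physical memory, 2GB left over.
malloc_storage_arg 32 2   # prints -smalloc,30G
```

On a real machine the headroom depends on sess_workspace and the number of threads, so treat the second argument as something to measure rather than a constant.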
If you cannot contain your cache in memory, first consider if you really need that big of a cache. Then consider buying more memory. Then sleep on it. Then, if you still think you need to use disk, use -sfile. On Linux, -sfile performs far better than -smalloc once you start hitting disk. We’re talking pie-chart material. You should also make sure the filesystem is mounted with noatime, though it shouldn’t be necessary. On Linux, my cold-hit tests (a cold hit being a cache hit that has to be read from disk, as opposed to a hot hit which is read from memory) take about 6000 seconds to run on -smalloc, while they take 4000 seconds on -sfile with the same hardware. Consistently. However, your mileage may vary with things such as kernel version, so test both anyway. My tests are easy enough: Run httperf through x-thousand URLs in order. Then do it again in the same order.
8. Use packages and supplied scripts
While it may seem easier to just write your own script and/or install from source, it rarely pays off in the long run. Varnish usually runs on machines where downtime has to be planned, and you don’t want a surprise when you upgrade it. Nor do you want to risk missing that little bug we realized was a problem on your distro but not others. If you do insist on running home-brew, make sure you at least get the ulimit commands from the startup scripts.
Varnish is made for 64-bit systems. Don’t run it on 32-bit systems if possible.
Varnish wants to memory map the entire cache. This means the entire cache needs to be able to fit into virtual memory. On a 64-bit system, VM is virtually unlimited. On a 32-bit system, processes usually have access to a maximum of 3GB of virtual memory. Since you also need to allocate stack space and other standard process requirements, in practice people don’t recommend more than 2GB of cache space for Varnish on 32-bit systems. Pretty small for a web content cache. If you want Varnish to use an entire disk for a cache, it must run on a 64-bit system.
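Before sizing a cache it is worth confirming which case you are in. The sketch below uses `getconf LONG_BIT`, which reports the word size of the userland (the thing that limits a process's address space). The function names are invented for illustration, and the messages only restate the guidance from the paragraph above.

```shell
#!/bin/sh
# word_size: prints the size of a C long in bits, as reported by getconf.
word_size() {
    getconf LONG_BIT
}

# cache_hint BITS
# Prints the cache-size guidance from the text for the given word size.
# Returns 1 for anything other than 32 or 64.
cache_hint() {
    case "$1" in
        64) echo "64-bit: the cache is not limited by address space" ;;
        32) echo "32-bit: keep the cache at or below 2GB" ;;
        *)  echo "unknown word size: $1" >&2
            return 1 ;;
    esac
}
```

Running `cache_hint "$(word_size)"` on the target machine prints the line that applies to it.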
I am still trying to find out what the worst-case behaviour is when running Varnish on 32-bit. I will update this post if I find something.
A small shell script to reload the Varnish VCL automatically:
FILE="/etc/varnish/varnish.vcl"

# Hostname and management port
# (defined in /etc/default/varnish or on startup)
HOSTPORT="localhost:6082"
NOW=`date +%d%h%Y%H%M`

error() {
    echo 1>&2 "Failed to reload $FILE."
    exit 1
}

varnishadm -T $HOSTPORT vcl.load reload$NOW $FILE || error
varnishadm -T $HOSTPORT vcl.use reload$NOW || error
echo Current configs:
varnishadm -T $HOSTPORT vcl.list
I was looking for a PHP function that deletes a directory together with everything inside it. I finally settled on the following, based on a few options shared by friends on the Internet plus some changes of my own.
Here is the result:
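function delete_directory($homepath)
{
    // Bail out early if the path is not a directory or cannot be opened.
    if (!is_dir($homepath)) {
        return false;
    }
    $dir_handle = opendir($homepath);
    if (!$dir_handle) {
        return false;
    }
    // Compare against false explicitly so an entry named "0" does not end the loop.
    while (($file = readdir($dir_handle)) !== false) {
        if ($file != "." && $file != "..") {
            $path = $homepath . "/" . $file;
            if (!is_dir($path)) {
                unlink($path);              // plain file: delete it
            } else {
                delete_directory($path);    // subdirectory: recurse
            }
        }
    }
    closedir($dir_handle);
    rmdir($homepath);
    return true;
}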