www.digitalmars.com         C & C++   DMDScript  

c++.chat - interesting spam trap

reply roland <--nancyetroland free.fr> writes:
http://www.unclebobsuncle.com/antispam.html

roland :-)
Jun 02 2003
next sibling parent "Greg Peet" <admin gregpeet.com> writes:
Wow, thanks for bringing that to our attention. I can't wait to put that on
my site. Quite funny too.

"roland" <--nancyetroland free.fr> wrote in message
news:bbgc75$2339$1 digitaldaemon.com...
 http://www.unclebobsuncle.com/antispam.html

 roland :-)

Jun 02 2003
prev sibling parent reply Jan Knepper <jan smartsoft.us> writes:
Interesting indeed, but it does not work. Besides, most of the
statements on the page have no grounding.

First of all, any decent spider or crawler keeps track of the
URLs it has already processed. I mean, think about it: every
decent website probably has circular references of the form
x.html -> y.html -> z.html -> x.html. I know for a fact that
quite a few of my sites have many of these. Obviously this is
something anyone developing a spider or crawler, which I have
done ;-), will run into. So the idea is cute, but I don't think
it really works.

Second, quite a bit of the page is generated through JavaScript.
Many spiders or crawlers do NOT run JavaScript. I know for a
fact that JavaScript is a serious challenge for many of the
search engines on the internet.

Third, some more advanced spiders or crawlers do not just look
at mailto: tags, but recognize a ' ' and check the prefix and
suffix. They run the complete string through an email syntax
checker to make sure the address only contains legal email
address characters, actually ends with an existing Top Level
Domain (TLD) such as .com, .net, etc., and later check the
domain through DNS and/or Whois.
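
The kind of syntax-plus-TLD check described above can be sketched in a few
lines of Python (a hypothetical illustration only; the regex, the TLD
whitelist, and the `harvest` helper are all made up for this sketch, not
taken from any real harvester):

```python
import re

# Small TLD whitelist for the sketch; a real tool would use the full IANA list.
KNOWN_TLDS = {"com", "net", "org", "edu", "gov", "mil", "fr", "us"}

# Rough pattern for legal email-address characters; intentionally simple.
ADDR_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

def harvest(text):
    """Return addresses in `text` that pass the syntax + TLD checks.
    (DNS/Whois verification would be a separate, later step.)"""
    found = []
    for addr in ADDR_RE.findall(text):
        tld = addr.rsplit(".", 1)[1].lower()
        if tld in KNOWN_TLDS:
            found.append(addr)
    return found

page = "Contact bob@example.com or fake@nowhere.zzz for details."
print(harvest(page))  # ['bob@example.com']
```

Note that the bogus .zzz address is dropped before any bandwidth is spent
on a DNS or Whois lookup, which is exactly why invented addresses with
invented TLDs cost a harvester very little.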

Fourth, the invalid email addresses have no effect on spammers.
They will burn some more bandwidth, but as spammers usually use
non-existent From: and Return-Path: headers in their messages,
anyone but the spammer will receive the bounces.

Fifth, if the spammer actually has some form of decency and
bulk mails to a list and honors a removal mechanism, that
mechanism usually is intelligent enough to keep track of
bounces, probe them, and then remove them from the list
automagically. Check for instance http://www.ezmlm.org/,
which works with MySQL (http://www.mysql.com/), through which
it is rather easy to maintain a database with millions of
email addresses.

To actually *fight* SPAM, what would make sense is to report
SPAM ASAP at http://www.spamcop.net/, as that results in more
than just reporting. One of the great features is that once a
lot of people start reporting a certain SPAM, spamcop will add
the originating IP address to bl.spamcop.net, which can be used
by email-receiving servers (SMTP servers) to block incoming
email if it comes from one of the many blocked IP addresses.
Unfortunately, most people just seem to delete SPAM, and most
email providers do not seem to use bl.spamcop.net for email
blocking.
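
For context, a DNS blocklist such as bl.spamcop.net is queried by reversing
the octets of the connecting IP address and looking the result up under the
list's zone. A minimal sketch (the IP is a documentation address; only the
query-name construction is shown, no real DNS lookup is performed):

```python
def dnsbl_query_name(ip, zone="bl.spamcop.net"):
    """Build the hostname an SMTP server would resolve to test `ip`
    against a DNS blocklist: octets reversed, list zone appended."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + "." + zone

# A listed IP answers with an A record (conventionally 127.0.0.x);
# an NXDOMAIN answer means the IP is not on the list.
print(dnsbl_query_name("192.0.2.44"))  # 44.2.0.192.bl.spamcop.net
```

Because the answer is an ordinary DNS record, the verdict gets cached by
every resolver along the way, which is what makes this kind of blocking
cheap enough to run on every incoming SMTP connection.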

Of course, not publishing your email address ANYWHERE on the
internet would help the most! ;-) However, I have noticed that
quite a few companies that collect email addresses with online
sales or other forms of subscription also sell those email
addresses to others...

Just my 2 cents... Oh, in case there is any doubt... ;-) I have
written a couple of crawlers, including crawlers that do handle
JavaScript very well. I have been hosting Internet services for
3 years. I do report almost all spam at http://www.spamcop.net/
and yes, the mail servers here do check bl.spamcop.net (and a
few others) before they actually receive the email, well, that
is if the domain owners want it. Check
http://www.digitaldaemon.com/Internet%20Services/rblsmtpd.shtml
for some statistics on SPAM blocking...
Recently I patched the SMTP server again so it blocks all
non-existent email addresses on local domains.



roland wrote:

 http://www.unclebobsuncle.com/antispam.html

 roland :-)

-- ManiaC++ Jan Knepper
Jun 02 2003
next sibling parent reply "KarL" <someone somewhere.org> writes:
And are you running sendmail or qmail or postfix?

"Jan Knepper" <jan smartsoft.us> wrote in message
news:3EDBFCDB.9D953019 smartsoft.us...

 Just my 2 cents... Oh, in case there is any doubt... ;-) I have
 written a couple of crawlers and actually also crawlers that do
 handle JavaScripts very well. I have been hosting Internet
 services for 3 years. I do report almost all spam at
 http://www.spamcop.net/ and yes, the mail servers here do check
 bl.spamcop.net (and a few others) before they actually receive
 the email, well that is if the domain owners want it. Check
 http://www.digitaldaemon.com/Internet%20Services/rblsmtpd.shtml
 for some statistics on SPAM blocking...
 Recently I patched the SMTP server again so it does block all
 non-existent email adresses on local domains.

Jun 02 2003
parent Jan Knepper <jan smartsoft.us> writes:
Definitely not sendmail...
Patched qmail...



KarL wrote:

 And are you run sendmail or qmail or postfix?

 "Jan Knepper" <jan smartsoft.us> wrote in message
 news:3EDBFCDB.9D953019 smartsoft.us...

 Just my 2 cents... Oh, in case there is any doubt... ;-) I have
 written a couple of crawlers and actually also crawlers that do
 handle JavaScripts very well. I have been hosting Internet
 services for 3 years. I do report almost all spam at
 http://www.spamcop.net/ and yes, the mail servers here do check
 bl.spamcop.net (and a few others) before they actually receive
 the email, well that is if the domain owners want it. Check
 http://www.digitaldaemon.com/Internet%20Services/rblsmtpd.shtml
 for some statistics on SPAM blocking...
 Recently I patched the SMTP server again so it does block all
 non-existent email adresses on local domains.


Jun 03 2003
prev sibling next sibling parent reply "Walter" <walter digitalmars.com> writes:
"Jan Knepper" <jan smartsoft.us> wrote in message
news:3EDBFCDB.9D953019 smartsoft.us...
 Just my 2 cents... Oh, in case there is any doubt... ;-) I have
 written a couple of crawlers and actually also crawlers that do
 handle JavaScripts very well. I have been hosting Internet
 services for 3 years. I do report almost all spam at
 http://www.spamcop.net/ and yes, the mail servers here do check
 bl.spamcop.net (and a few others) before they actually receive
 the email, well that is if the domain owners want it. Check
 http://www.digitaldaemon.com/Internet%20Services/rblsmtpd.shtml
 for some statistics on SPAM blocking...

I use a javascript-generated mailto: on the digitalmars web pages. Are the
javascript-aware scrapers able to figure those out?
Jun 02 2003
parent reply Jan Knepper <jan smartsoft.us> writes:
Walter wrote:

 "Jan Knepper" <jan smartsoft.us> wrote in message
 news:3EDBFCDB.9D953019 smartsoft.us...
 Just my 2 cents... Oh, in case there is any doubt... ;-) I have
 written a couple of crawlers and actually also crawlers that do
 handle JavaScripts very well. I have been hosting Internet
 services for 3 years. I do report almost all spam at
 http://www.spamcop.net/ and yes, the mail servers here do check
 bl.spamcop.net (and a few others) before they actually receive
 the email, well that is if the domain owners want it. Check
 http://www.digitaldaemon.com/Internet%20Services/rblsmtpd.shtml
 for some statistics on SPAM blocking...

I use a javascript generated mailto: on the digitalmars web pages. Are the javascript aware scrapers able to figure those out?

Yes! My crawler will pick those up without ANY problem.

Jan
Jun 03 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Jan Knepper" <jan smartsoft.us> wrote in message
news:3EDC8F17.9616A80A smartsoft.us...
 Walter wrote:
 I use a javascript-generated mailto: on the digitalmars web pages. Are
 the javascript-aware scrapers able to figure those out?


Does that mean I have to write a cgi program to do it? <g>
Jun 03 2003
parent Jan Knepper <jan smartsoft.us> writes:
Walter wrote:

 "Jan Knepper" <jan smartsoft.us> wrote in message
 news:3EDC8F17.9616A80A smartsoft.us...
 Walter wrote:
 I use a javascript-generated mailto: on the digitalmars web pages. Are
 the javascript-aware scrapers able to figure those out?


Does that mean I have to write a cgi program to do it? <g>

No, I can provide you with that, if you want...
Jun 03 2003
prev sibling next sibling parent roland <--rv ronetech.com> writes:
hello

thanks for the interesting information

cheers

roland

Jan Knepper wrote:

 Interesting indeed, but it does not work. Besides most of the
 statements on the page have no ground.
 
 First of all, any decent spider or crawler would keep track of
 URL's it has processed. I mean think about it, every decent
 website probable has circular references in the form of x.html
 -> y.html -> z.html -> x.html. I know for a fact that quite a
 few of my sites have many of these. Obviously this is something
 anyone developing a spider or crawler, which I have done ;-),
 will run into. So the idea is cute, but I don't think it really
 works.
 
 Second, quite a bit of the page is generated through JavaScript.
 Many spiders or crawlers do NOT run JavaScript. I know for a
 fact that JavaScript is a serious challenge for many of the
 search engines on the internet.
 
 Third, some, more advanced spiders or crawlers do not just look
 at mailto: tags, but recorgnize a ' ' and check the prefix and
 suffix. Run the complete string through an email syntax checker,
 to make sure the address only contains legal email address
 characters and such and actually ends with an existing Top Level
 Domain (TLD) such as .com, .net. .com, etc and later match check
 the domain through DNS and/or Whois.
 
 Fourth, the invalid email addresses have no effect on spammers.
 They will burn some more bandwidth, but as they usually use
 non-existent From: and Return-Path: in their messages anyone,
 but not the spammer will receive the bounces.
 
 Fifth, if the spammer would actually have some form of decency
 and bulk mail to a list and honor a removal mechanism the
 mechanism usually is intelligent enough to keep track of
 bounces, probe them and next remove them from the list
 automagically. Check here for instance http://www.ezmlm.org/
 which works with MySQL http://www.mysql.com/ through which it is
 rather easy to maintain a database with millions of email
 addresses.
 
 To actually *fight* SPAM what would make sence is report SPAM
 ASAP at http://www.spamcop.net/ as that results into more than
 just reporting. One of the great features is that once a lot
 people start reporting a certain SPAM spamcop will at the
 originating IP address to bl.spamcop.net which can be used by
 email receiving servers (SMTP servers) to block incoming email
 if it comes from one of the many blocked IP addresses.
 Unfortunately, most people just seem to delete SPAM and most
 email providers do not seem to use bl.spamcop.net for email
 blocking.
 
 Of course, not publishing you email address ANYWHERE on the
 internet would help the most! ;-) However, I have noticed that
 quite a few company's that collect email addresses with online
 sales or other forms of subscription also sell those email
 addresses to others...
 
 Just my 2 cents... Oh, in case there is any doubt... ;-) I have
 written a couple of crawlers and actually also crawlers that do
 handle JavaScripts very well. I have been hosting Internet
 services for 3 years. I do report almost all spam at
 http://www.spamcop.net/ and yes, the mail servers here do check
 bl.spamcop.net (and a few others) before they actually receive
 the email, well that is if the domain owners want it. Check
 http://www.digitaldaemon.com/Internet%20Services/rblsmtpd.shtml
 for some statistics on SPAM blocking...
 Recently I patched the SMTP server again so it does block all
 non-existent email adresses on local domains.
 
 
 
 roland wrote:
 
 
http://www.unclebobsuncle.com/antispam.html

roland :-)

-- ManiaC++ Jan Knepper

Jun 03 2003
prev sibling next sibling parent reply roland <--nancyetroland free.fr> writes:
Jan Knepper wrote:
 Interesting indeed, but it does not work. Besides most of the
 statements on the page have no ground.
 
 First of all, any decent spider or crawler would keep track of
 URL's it has processed. I mean think about it, every decent
 website probable has circular references in the form of x.html
 -> y.html -> z.html -> x.html. I know for a fact that quite a
 few of my sites have many of these. Obviously this is something
 anyone developing a spider or crawler, which I have done ;-),
 will run into. So the idea is cute, but I don't think it really
 works.
 
 Second, quite a bit of the page is generated through JavaScript.
 Many spiders or crawlers do NOT run JavaScript. I know for a
 fact that JavaScript is a serious challenge for many of the
 search engines on the internet.
 
 Third, some, more advanced spiders or crawlers do not just look
 at mailto: tags, but recorgnize a ' ' and check the prefix and
 suffix. Run the complete string through an email syntax checker,
 to make sure the address only contains legal email address
 characters and such and actually ends with an existing Top Level
 Domain (TLD) such as .com, .net. .com, etc and later match check
 the domain through DNS and/or Whois.
 
 Fourth, the invalid email addresses have no effect on spammers.
 They will burn some more bandwidth, but as they usually use
 non-existent From: and Return-Path: in their messages anyone,
 but not the spammer will receive the bounces.
 
 Fifth, if the spammer would actually have some form of decency
 and bulk mail to a list and honor a removal mechanism the
 mechanism usually is intelligent enough to keep track of
 bounces, probe them and next remove them from the list
 automagically. Check here for instance http://www.ezmlm.org/
 which works with MySQL http://www.mysql.com/ through which it is
 rather easy to maintain a database with millions of email
 addresses.
 
 To actually *fight* SPAM what would make sence is report SPAM
 ASAP at http://www.spamcop.net/ as that results into more than
 just reporting. One of the great features is that once a lot
 people start reporting a certain SPAM spamcop will at the
 originating IP address to bl.spamcop.net which can be used by
 email receiving servers (SMTP servers) to block incoming email
 if it comes from one of the many blocked IP addresses.
 Unfortunately, most people just seem to delete SPAM and most
 email providers do not seem to use bl.spamcop.net for email
 blocking.
 
 Of course, not publishing you email address ANYWHERE on the
 internet would help the most! ;-) However, I have noticed that
 quite a few company's that collect email addresses with online
 sales or other forms of subscription also sell those email
 addresses to others...
 
 Just my 2 cents... Oh, in case there is any doubt... ;-) I have
 written a couple of crawlers and actually also crawlers that do
 handle JavaScripts very well. I have been hosting Internet
 services for 3 years. I do report almost all spam at
 http://www.spamcop.net/ and yes, the mail servers here do check
 bl.spamcop.net (and a few others) before they actually receive
 the email, well that is if the domain owners want it. Check
 http://www.digitaldaemon.com/Internet%20Services/rblsmtpd.shtml
 for some statistics on SPAM blocking...
 Recently I patched the SMTP server again so it does block all
 non-existent email adresses on local domains.
 
 
 
 roland wrote:
 
 
http://www.unclebobsuncle.com/antispam.html

roland :-)

-- ManiaC++ Jan Knepper

hi jan: an opinion on that ?

<< yep, thats the reason why i suggested a webring of spamtraps would do
better and the addresses be generated from a wide list of word
combinations. just imagine how many combinations could be done with this
set of data

rule: [ a | a+b | a+b+c | a+c | ... | b+a ] + + [ domain ].[level]

where:

a, b, c ..: this, that, free, sun, ram, bot, mail, fish, stick, 33, big,
flower
domain : big, stick, homer, biz, temp, duch, pleht
level : com, biz, net, org, mil

the list could be customized per each website. i dont see how the crawler
could take all those words into consideration. they can remove the
invalid mails when they bounce but i think the one we are discussing
right now will guarantee that they will have an adequate supply for a
very long time. imagine a webring of 500 sites linking one another.

ciao!

_________________
You have read a post from a newbie. Take everything with a grain of salt.
Registered Linux User #246176
The user formerly known as ramfree17 (oh, im still ramfree17 ?!?!)
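
The size of such a generated list is easy to estimate with a short script
(the word lists are taken from the post above; the combination rule is
simplified here to single words plus ordered two-word pairs, so the full
scheme with three-word combinations would yield even more):

```python
from itertools import permutations

words = ["this", "that", "free", "sun", "ram", "bot", "mail",
         "fish", "stick", "33", "big", "flower"]
domains = ["big", "stick", "homer", "biz", "temp", "duch", "pleht"]
levels = ["com", "biz", "net", "org", "mil"]

# Local parts: every single word, plus every ordered two-word pair (a+b).
local_parts = words + ["".join(p) for p in permutations(words, 2)]

total = len(local_parts) * len(domains) * len(levels)
print(len(local_parts), total)  # 144 local parts, 5040 addresses
```

Even this cut-down rule already produces 5,040 distinct decoy addresses
per site, which is the point of the proposal: the trap pages never repeat
a small fixed set a harvester could simply filter out.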


Jun 04 2003
parent reply Jan Knepper <jan smartsoft.us> writes:

roland wrote:

 yep, thats the reason why i suggested a webring of spamtraps would do
 better and the addresses be generated from a wide list of word
 combination. just imagine the how many combination could be done with
 this set of data

 rule: [ a | a+b | a+b+c | a+c | ... | b+a ] +   + [ domain ].[level]

 where:

 a, b ,c ..: this, that, free, sun, ram, bot, mail, fish, stick, 33, big,
 flower
 domain : big, stick, homer, biz, temp, duch, pleht
 level : com, biz, net, org, mil

 the list could be customized per each website. i dont see how the
 crawler could take all those words into consideration. they can remove
 the invalid mails when it bounce but i think the one we are discussing
 right now will guarantee that they will have an adequate supply for a
 very long time. imagine a webring of 500 sites linking one another.

500 websites (pages) in a webring would take a decent crawler no more
than 2 hours to process. Believe me, they are NOT using DSL or Cable!!!
Serial processing of 500 web pages at 10 seconds per page (boy is that
long!) is 5000 seconds; that's not more than 2 hours! Then they match
whatever they found against a local DNS server with enough cache. Try:

# dig mx free.fr <Enter>

if you have a Unix/BSD/Linux box online somewhere:

digitaldaemon# dig mx free.fr

; <<>> DiG 8.3 <<>> mx free.fr
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 2, ADDITIONAL: 12
;; QUERY SECTION:
;;      free.fr, type = MX, class = IN

;; ANSWER SECTION:
free.fr.            1D IN MX    10 mx.free.fr.
free.fr.            1D IN MX    20 mrelay2-1.free.fr.
free.fr.            1D IN MX    20 mrelay2-2.free.fr.
free.fr.            1D IN MX    20 mx1-1.free.fr.
free.fr.            1D IN MX    50 mrelay3-2.free.fr.
free.fr.            1D IN MX    50 mrelay4-2.free.fr.
free.fr.            1D IN MX    50 mrelay1-1.free.fr.
free.fr.            1D IN MX    50 mrelay1-2.free.fr.
free.fr.            1D IN MX    60 mrelay3-1.free.fr.
free.fr.            1D IN MX    90 ns1.proxad.net.

;; AUTHORITY SECTION:
free.fr.            1D IN NS    ns0.proxad.net.
free.fr.            1D IN NS    ns1.proxad.net.

;; ADDITIONAL SECTION:
mx.free.fr.         15M IN A    213.228.0.1
mx.free.fr.         15M IN A    213.228.0.129
mx.free.fr.         15M IN A    213.228.0.13
mx.free.fr.         15M IN A    213.228.0.131
mx.free.fr.         15M IN A    213.228.0.166
mx.free.fr.         15M IN A    213.228.0.175
mx.free.fr.         15M IN A    213.228.0.65
mrelay2-1.free.fr.  1D IN A     213.228.0.13
mrelay2-2.free.fr.  1D IN A     213.228.0.131
mx1-1.free.fr.      1D IN A     213.228.0.65
mrelay3-2.free.fr.  1D IN A     213.228.0.166
mrelay4-2.free.fr.  1D IN A     213.228.0.175

;; Total query time: 127 msec
;; FROM: digitaldaemon.com to SERVER: default -- 63.105.9.35
;; WHEN: Thu Jun 5 10:04:24 2003
;; MSG SIZE  sent: 25  rcvd: 502

This is done with the 'dig' program; the total query time is 127
msec!!!! Now they know whether or not the found domain actually has an
MX record... If not, they can just drop the address from the list.

Also, crawlers do *not* browse the web like we do. They just process
'text' oriented files and run several (read: hundreds or thousands of)
threads/processes at the same time.

So, the only thing you could actually make a difference with is using
existing domain names. Not a good idea, as the owners of those domains
might have a catch-all and then receive the same SPAM over and over
again. Soon the providers will all change their systems so their SMTP
servers only accept email to addresses that actually do exist and
*deny* receipt of anything else with the usual 550 error.

So, in the end, what are we actually creating with stuff like this???
Nothing. We just have crawlers/spiders consume more bandwidth to read
all the pages, the crawlers' DNS matcher consume more bandwidth to
check DNS, the bulk mailer consume more bandwidth to send all the
email, and the internet consume more bandwidth to deal with all the
bounces, double bounces, etc.

Last, 500 pages with 1,000 email addresses each is 500,000 email
addresses. I hate to tell you, but that's only 1.5% of the total email
addresses I have... <sigh> Would you honestly think that anyone would
process the bounces for numbers like that manually???

ManiaC++
Jan Knepper
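
The MX-checking step described above need not even use a resolver library;
parsing captured `dig` output is enough. A sketch (the `mx_hosts` helper is
illustrative only, and the sample is trimmed from the transcript above; a
real tool would call a resolver with a local cache instead of parsing text):

```python
def mx_hosts(dig_output):
    """Pull the MX exchange hostnames out of dig's answer section."""
    hosts = []
    for line in dig_output.splitlines():
        parts = line.split()
        # dig answer lines look like: free.fr. 1D IN MX 10 mx.free.fr.
        if len(parts) >= 6 and parts[2] == "IN" and parts[3] == "MX":
            hosts.append(parts[5])
    return hosts

sample = """free.fr. 1D IN MX 10 mx.free.fr.
free.fr. 1D IN MX 20 mrelay2-1.free.fr."""
print(mx_hosts(sample))  # ['mx.free.fr.', 'mrelay2-1.free.fr.']
```

An empty result means the domain has no MX record, and the harvester drops
every address under it; a non-empty result keeps the domain on the list.
At ~127 msec per uncached query, checking even a large list is cheap.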
Jun 05 2003
parent reply roland <--nancyetroland free.fr> writes:
Jan Knepper wrote:
 roland wrote:
 
 yep, thats the reason why i suggested a webring of spamtraps would do
 better and the addresses be generated from a wide list of word
 combination. just imagine the how many combination could be done with
 this set of data

 rule: [ a | a+b | a+b+c | a+c | ... | b+a ] +   + [ domain ].[level]

 where:

 a, b ,c ..: this, that, free, sun, ram, bot, mail, fish, stick, 33, big,
 flower
 domain : big, stick, homer, biz, temp, duch, pleht
 level : com, biz, net, org, mil

 the list could be customized per each website. i dont see how the
 crawler could take all those words into consideration. they can remove
 the invalid mails when it bounce but i think the one we are discussing
 right now will guarantee that they will have an adequate supply for a
 very long time. imagine a webring of 500 sites linking one another.

 500 websites (pages) in a webring would take a decent crawler no more
 than 2 hours to process. Believe me, they are NOT using DSL or Cable!!!
 Serial processing of 500 web pages at 10 seconds per page (boy is that
 long!) is 5000 seconds; that's not more than 2 hours! Then they match
 whatever they found against a local DNS server with enough cache. Try:

 # dig mx free.fr <Enter>

 if you have a Unix/BSD/Linux box online somewhere:

 digitaldaemon# dig mx free.fr

 ; <<>> DiG 8.3 <<>> mx free.fr
 ;; res options: init recurs defnam dnsrch
 ;; got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2
 ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 2, ADDITIONAL: 12
 ;; QUERY SECTION:
 ;;      free.fr, type = MX, class = IN

 ;; ANSWER SECTION:
 free.fr.            1D IN MX    10 mx.free.fr.
 free.fr.            1D IN MX    20 mrelay2-1.free.fr.
 free.fr.            1D IN MX    20 mrelay2-2.free.fr.
 free.fr.            1D IN MX    20 mx1-1.free.fr.
 free.fr.            1D IN MX    50 mrelay3-2.free.fr.
 free.fr.            1D IN MX    50 mrelay4-2.free.fr.
 free.fr.            1D IN MX    50 mrelay1-1.free.fr.
 free.fr.            1D IN MX    50 mrelay1-2.free.fr.
 free.fr.            1D IN MX    60 mrelay3-1.free.fr.
 free.fr.            1D IN MX    90 ns1.proxad.net.

 ;; AUTHORITY SECTION:
 free.fr.            1D IN NS    ns0.proxad.net.
 free.fr.            1D IN NS    ns1.proxad.net.

 ;; ADDITIONAL SECTION:
 mx.free.fr.         15M IN A    213.228.0.1
 mx.free.fr.         15M IN A    213.228.0.129
 mx.free.fr.         15M IN A    213.228.0.13
 mx.free.fr.         15M IN A    213.228.0.131
 mx.free.fr.         15M IN A    213.228.0.166
 mx.free.fr.         15M IN A    213.228.0.175
 mx.free.fr.         15M IN A    213.228.0.65
 mrelay2-1.free.fr.  1D IN A     213.228.0.13
 mrelay2-2.free.fr.  1D IN A     213.228.0.131
 mx1-1.free.fr.      1D IN A     213.228.0.65
 mrelay3-2.free.fr.  1D IN A     213.228.0.166
 mrelay4-2.free.fr.  1D IN A     213.228.0.175

 ;; Total query time: 127 msec
 ;; FROM: digitaldaemon.com to SERVER: default -- 63.105.9.35
 ;; WHEN: Thu Jun 5 10:04:24 2003
 ;; MSG SIZE  sent: 25  rcvd: 502

 This is done with the 'dig' program; the total query time is 127
 msec!!!! Now they know whether or not the found domain actually has an
 MX record... If not, they can just drop the address from the list.

 Also, crawlers do *not* browse the web like we do. They just process
 'text' oriented files and run several (read: hundreds or thousands of)
 threads/processes at the same time.

 So, the only thing you could actually make a difference with is using
 existing domain names. Not a good idea, as the owners of those domains
 might have a catch-all and then receive the same SPAM over and over
 again. Soon the providers will all change their systems so their SMTP
 servers only accept email to addresses that actually do exist and
 *deny* receipt of anything else with the usual 550 error.

 So, in the end, what are we actually creating with stuff like this???
 Nothing. We just have crawlers/spiders consume more bandwidth to read
 all the pages, the crawlers' DNS matcher consume more bandwidth to
 check DNS, the bulk mailer consume more bandwidth to send all the
 email, and the internet consume more bandwidth to deal with all the
 bounces, double bounces, etc.

 Last, 500 pages with 1,000 email addresses each is 500,000 email
 addresses. I hate to tell you, but that's only 1.5% of the total email
 addresses I have... <sigh> Would you honestly think that anyone would
 process the bounces for numbers like that manually???

 ManiaC++ Jan Knepper

ok i'm afraid i'm consuming _your_ bandwidth .. ;-)

a last question: what happens a) to the crawlers, b) to the internet, if
100000 sites each have 10000 (= 10e9) e-mail addresses ?

roland
Jun 05 2003
next sibling parent reply Jan Knepper <jan smartsoft.us> writes:
roland wrote:

 Jan Knepper wrote:
 roland wrote:

 yep, thats the reason why i suggested a webring of spamtraps would do
 better and the addresses be generated from a wide list of word
 combination. just imagine the how many combination could be done with
 this set of data

 rule: [ a | a+b | a+b+c | a+c | ... | b+a ] +   + [ domain ].[level]

 where:

 a, b ,c ..: this, that, free, sun, ram, bot, mail, fish, stick, 33, big,
 flower
 domain : big, stick, homer, biz, temp, duch, pleht
 level : com, biz, net, org, mil

 the list could be customized per each website. i dont see how the
 crawler could take all those words into consideration. they can remove
 the invalid mails when it bounce but i think the one we are discussing
 right now will guarantee that they will have an adequate supply for a
 very long time. imagine a webring of 500 sites linking one another.

 500 websites (pages) in a webring would take a decent crawler no more
 than 2 hours to process. Believe me, they are NOT using DSL or Cable!!!
 Serial processing of 500 web pages at 10 seconds per page (boy is that
 long!) is 5000 seconds; that's not more than 2 hours! Then they match
 whatever they found against a local DNS server with enough cache. Try:

 # dig mx free.fr <Enter>

 if you have a Unix/BSD/Linux box online somewhere:

 digitaldaemon# dig mx free.fr

 ; <<>> DiG 8.3 <<>> mx free.fr
 ;; res options: init recurs defnam dnsrch
 ;; got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2
 ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 2, ADDITIONAL: 12
 ;; QUERY SECTION:
 ;;      free.fr, type = MX, class = IN

 ;; ANSWER SECTION:
 free.fr.            1D IN MX    10 mx.free.fr.
 free.fr.            1D IN MX    20 mrelay2-1.free.fr.
 free.fr.            1D IN MX    20 mrelay2-2.free.fr.
 free.fr.            1D IN MX    20 mx1-1.free.fr.
 free.fr.            1D IN MX    50 mrelay3-2.free.fr.
 free.fr.            1D IN MX    50 mrelay4-2.free.fr.
 free.fr.            1D IN MX    50 mrelay1-1.free.fr.
 free.fr.            1D IN MX    50 mrelay1-2.free.fr.
 free.fr.            1D IN MX    60 mrelay3-1.free.fr.
 free.fr.            1D IN MX    90 ns1.proxad.net.

 ;; AUTHORITY SECTION:
 free.fr.            1D IN NS    ns0.proxad.net.
 free.fr.            1D IN NS    ns1.proxad.net.

 ;; ADDITIONAL SECTION:
 mx.free.fr.         15M IN A    213.228.0.1
 mx.free.fr.         15M IN A    213.228.0.129
 mx.free.fr.         15M IN A    213.228.0.13
 mx.free.fr.         15M IN A    213.228.0.131
 mx.free.fr.         15M IN A    213.228.0.166
 mx.free.fr.         15M IN A    213.228.0.175
 mx.free.fr.         15M IN A    213.228.0.65
 mrelay2-1.free.fr.  1D IN A     213.228.0.13
 mrelay2-2.free.fr.  1D IN A     213.228.0.131
 mx1-1.free.fr.      1D IN A     213.228.0.65
 mrelay3-2.free.fr.  1D IN A     213.228.0.166
 mrelay4-2.free.fr.  1D IN A     213.228.0.175

 ;; Total query time: 127 msec
 ;; FROM: digitaldaemon.com to SERVER: default -- 63.105.9.35
 ;; WHEN: Thu Jun 5 10:04:24 2003
 ;; MSG SIZE  sent: 25  rcvd: 502

 This is done with the 'dig' program; the total query time is 127
 msec!!!! Now they know whether or not the found domain actually has an
 MX record... If not, they can just drop the address from the list.

 Also, crawlers do *not* browse the web like we do. They just process
 'text' oriented files and run several (read: hundreds or thousands of)
 threads/processes at the same time.

 So, the only thing you could actually make a difference with is using
 existing domain names. Not a good idea, as the owners of those domains
 might have a catch-all and then receive the same SPAM over and over
 again. Soon the providers will all change their systems so their SMTP
 servers only accept email to addresses that actually do exist and
 *deny* receipt of anything else with the usual 550 error.

 So, in the end, what are we actually creating with stuff like this???
 Nothing. We just have crawlers/spiders consume more bandwidth to read
 all the pages, the crawlers' DNS matcher consume more bandwidth to
 check DNS, the bulk mailer consume more bandwidth to send all the
 email, and the internet consume more bandwidth to deal with all the
 bounces, double bounces, etc.

 Last, 500 pages with 1,000 email addresses each is 500,000 email
 addresses. I hate to tell you, but that's only 1.5% of the total email
 addresses I have... <sigh> Would you honestly think that anyone would
 process the bounces for numbers like that manually???

 ManiaC++ Jan Knepper

ok i'm afraid i'm consuming _your_ bandwidth .. ;-)

Don't worry.
 a last question: what happen a) to the crawlers, b) to the internet, if
 100000 sites have 10000 (=10e9) e-mail addresse ?

;-) Internet Meltdown...

Jan
Jun 05 2003
parent roland <--rv ronetech.com> writes:
Jan Knepper wrote:

 Internet Meltdown...
 
 Jan
 

oops 8-(

roland
Jun 06 2003
prev sibling parent reply "Greg Peet" <admin gregpeet.com> writes:
"roland" wrote:
 a last question: what happen a) to the crawlers, b) to the internet, if
 100000 sites have 10000 (=10e9) e-mail addresse ?

a) Logic, b) Didn't Nostradamus say something about
this...hmm...armageddon...bill gates...something along those lines i think =P
Jun 06 2003
parent roland <--rv ronetech.com> writes:
Greg Peet wrote:

 "roland" wrote:
 
a last question: what happen a) to the crawlers, b) to the internet, if
100000 sites have 10000 (=10e9) e-mail addresse ?

a) Logic, b) Didn't Nostradamus say something about this...hmm...armageddon...bill gates...something around those lines i think =P

you can buy a master degree without studying, improve sexual satisfaction,
earn thousands of cash without working ... ;-)

by roland
Jun 06 2003
prev sibling parent reply Scott Dale Robison <scott-news.digitalmars.com isdr.net> writes:
Jan Knepper wrote:
 To actually *fight* SPAM what would make sence is report SPAM
 ASAP at http://www.spamcop.net/ as that results into more than
 just reporting. One of the great features is that once a lot
 people start reporting a certain SPAM spamcop will at the
 originating IP address to bl.spamcop.net which can be used by
 email receiving servers (SMTP servers) to block incoming email
 if it comes from one of the many blocked IP addresses.
 Unfortunately, most people just seem to delete SPAM and most
 email providers do not seem to use bl.spamcop.net for email
 blocking.

I agree with 99.99% of what you wrote, this being the one part I
(partially) disagree with. Sure, SpamCop (and other similar services) can
prove valuable, but they have some serious potential downsides. The single
biggest one, IMO, is that many spam-blocking services don't care about the
source of an email. If it is reported as spam, they have no obligation to
confirm it. I personally know of cases where actual *documentation* of a
person's opt-in was completely and utterly ignored. The person in question
didn't bother trying to opt out (note: after having opted in); they just
reported the 'spam' to SpamCop and the 'offending' mail server was
black-holed. Note: I realize this is just my word against theirs, and I
don't expect anyone to just assume I'm right. I'm just sharing a personal
experience, and it's worth exactly what you're paying for it.

I guess the point I'm trying to make is, if you want to use SpamCop or any
other similar service, feel free. Just realize that these entities are no
more regulated than the spammers they claim to want to stop, and sometimes
an agenda may slip through. After all, their value is in blocking email. So
what if sometimes legitimate email gets blocked?

No, I'm not a spammer. Just a person with opinions. :)

Scott Dale Robison
Jun 07 2003
next sibling parent gf <mz_y2k yahoo...com> writes:
Scott Dale Robison <scott-news.digitalmars.com isdr.net> wrote in 
news:bbsff6$1nvl$1 digitaldaemon.com:

 I agree with 99.99% of what you wrote, this being the one part I 
 (partially) disagree with. Sure, SpamCop (and other similar services) 
 can prove valuable, but they have some serious potential downfalls. The 
 single biggest one, IMO, is that many spam-blocking services don't care 
 about the source of an email. If it is reported as spam, they have no 
 obligation to confirm it. I personally know of cases where actual
 *documentation* of a person's opt-in was completely and utterly ignored.
 The person in question didn't bother trying to opt out (note: after
 having opted in), they just reported the 'spam' to SpamCop and the
 'offending' mail server was black-holed. Note: I realize this is just my
 word against theirs, and I don't expect anyone to just assume I'm
 right. I'm just sharing a personal experience and it's worth exactly 
 what you're paying for it.
 
 I guess the point I'm trying to make is, if you want to use SpamCop or 
 any other similar service, feel free. Just realize that these entities 
 are no more regulated than the spammers they claim to want to stop, and 
 sometimes an agenda may slip through. After all, their value is in 
 blocking email. So what if sometimes legitimate email gets blocked?
 
 No, I'm not a spammer. Just a person with opinions. :)
 
 Scott Dale Robison

You sure fooled me! :))))) /gf
Jun 07 2003
prev sibling parent reply Jan Knepper <jan smartsoft.us> writes:
Scott Dale Robison wrote:

 Jan Knepper wrote:
 To actually *fight* SPAM, what would make sense is to report SPAM
 ASAP at http://www.spamcop.net/ as that results in more than
 just reporting. One of the great features is that once a lot of
 people start reporting a certain SPAM, spamcop will add the
 originating IP address to bl.spamcop.net, which can be used by
 email receiving servers (SMTP servers) to block incoming email
 if it comes from one of the many blocked IP addresses.
 Unfortunately, most people just seem to delete SPAM and most
 email providers do not seem to use bl.spamcop.net for email
 blocking.

I agree with 99.99% of what you wrote, this being the one part I (partially) disagree with. Sure, SpamCop (and other similar services) can prove valuable, but they have some serious potential downfalls. The single biggest one, IMO, is that many spam-blocking services don't care about the source of an email. If it is reported as spam, they have no obligation to confirm it. I personally know of cases where actual *documentation* of a person's opt-in was completely and utterly ignored. The person in question didn't bother trying to opt out (note: after having opted in), they just reported the 'spam' to SpamCop and the 'offending' mail server was black-holed. Note: I realize this is just my word against theirs, and I don't expect anyone to just assume I'm right. I'm just sharing a personal experience and it's worth exactly what you're paying for it.

I know... I have experienced that as well. That is indeed one of the unfortunate sides of spamcop.net
 I guess the point I'm trying to make is, if you want to use SpamCop or
 any other similar service, feel free. Just realize that these entities
 are no more regulated than the spammers they claim to want to stop, and
 sometimes an agenda may slip through. After all, their value is in
 blocking email. So what if sometimes legitimate email gets blocked?

I *only* use spamcop for those emails that are *SPAM*. Legitimate email never got blocked, as spamcop only begins blocking after a certain threshold has been reached; at least, I have never heard complaints about it...
 No, I'm not a spammer. Just a person with opinions. :)

I agree. I just stated that spamcop provides a service, not that it is perfect ;-)

Jan
Jun 07 2003
parent reply Scott Dale Robison <scott-news.digitalmars.com isdr.net> writes:
Jan Knepper wrote:
 I *only* use spamcop for those emails that are *SPAM*.
 Legitimate email never got blocked, as spamcop only begins blocking after a
 certain threshold has been reached; at least, I have never heard complaints
 about it...

I've never heard complaints from a user of SpamCop, to be fair. Only from a person whose newsletter stopped going to an entire domain because of SpamCop. Yet another note: I will admit that it is *possible* that the email in question was technically spam ... unfortunately, we can never know, as no one would even *try* to opt out or follow up on the original opt-in. In any case, I'm convinced that *if* the offender was guilty, it was unintentional.

Also, to be fair, I was once guilty of unintentionally running an open relay, but the 'good samaritan' who caught me at it was nice enough to remove me from their open relay database once I closed it. I do recognize that most of these people are good guys ... I'm just concerned when so many people on the net don't think through the potential problems of taking someone else's word on what is or is not a spamming IP.
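For context, an open relay is an SMTP server that accepts mail from any sender for delivery to domains it does not host, which is what relay-testing services probe for. A crude sketch of such a probe might look like the following (an illustration with hypothetical probe addresses drawn from the reserved example.* domains, not the actual test any particular service ran):

```python
import smtplib

def relay_verdict(rcpt_code):
    """Interpret the SMTP reply code to RCPT TO for a foreign
    recipient: a 2xx acceptance suggests the server will relay
    mail between domains it does not host."""
    return "open relay suspected" if 200 <= rcpt_code < 300 else "relay refused"

def probe_open_relay(host, timeout=10):
    """Connect to a mail server and ask it to relay a message
    between two domains it does not host."""
    with smtplib.SMTP(host, 25, timeout=timeout) as smtp:
        smtp.helo("probe.example.com")
        smtp.mail("probe@example.org")          # foreign sender
        code, _ = smtp.rcpt("probe@example.net")  # foreign recipient
        return relay_verdict(code)
```

A properly configured server answers the foreign RCPT with a 5xx rejection (commonly 550 "relaying denied"); only a 2xx acceptance marks it as a suspected relay.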
 I agree. I just stated that spamcop provides a service, not that it is
 perfect ;-)

Fair enough. Apologies if I was offensive in any way. :)

</soapbox>

SDR
Jun 07 2003
parent reply Jan Knepper <jan smartsoft.us> writes:
Scott Dale Robison wrote:

 I've never heard complaints from a user of SpamCop, to be fair. Only from a
 person whose newsletter stopped going to an entire domain because of
 SpamCop. Yet another note: I will admit that it is *possible* that the
 email in question was technically spam ... unfortunately, we can never
 know, as no one would even *try* to opt out or follow up on the original
 opt-in. In any case, I'm convinced that *if* the offender was guilty, it
 was unintentional.

Oh, I have seen those complaints MANY times. People who actually opted in themselves and then, in time, get sick of SPAM, find spamcop, and start reporting everything that comes into their mailbox, not remembering whether or not they subscribed to it. Spamcop is very aware of this as well.
 Also, to be fair, I was once guilty of unintentionally running an open
 relay, but the 'good samaritan' who caught me at it was nice enough to
 remove me from their open relay database once I closed it. I do
 recognize that most of these people are good guys ... I'm just concerned
 when so many people on the net don't think through the potential
 problems of taking someone else's word on what is or is not a spamming IP.

The internet professionals are usually very tolerant and helpful; at least, that's my experience. What did you use? sendmail??? Well, that's exactly the problem with the Internet at this moment. It's like trying to drive your car on the highway with people around you who do not have a license... <sigh>
 I agree. I just stated that spamcop provides a service, not that it is
 perfect ;-)

Fair enough. Apologies if I was offensive in any way. :)

Nag! ManiaC++ Jan Knepper
Jun 09 2003
parent Scott Dale Robison <scott-news.digitalmars.com isdr.net> writes:
Jan Knepper wrote:
 The internet professionals are usually very tolerant and helpful, at least,
 that's my experience. What did you use? sendmail???

I think I was running Xmail at the time, though I'm not 100% certain. It's been a long time... SDR
Jun 09 2003