2014-11-25 6 views
-2

Regex weirdness in Python :)

I'm trying to build browsing bots with PhantomJS, but in some cases it isn't robust enough for what I need: when some requests fail, there is no option to retry them. In those cases I echo the requests that failed, or that might have failed, along with the browser's cookies at that moment. I then pick that information up in a Python script and make the requests from there. I extract the information from a string with a regular expression and then use pycurl to perform the requests. The Python function that processes the string is below. The function works fine when I use it on its own in a test.py script, but it doesn't work when I add it to the main Python script, even though the interpreter, the computer, and the folder are all the same. Why would something like this happen?

FUNCTION:

def getReqs(interface_text): 
    if("<van LAST_LOAD>" in interface_text): 
     interface_text=str(interface_text[interface_text.rfind("<van LAST_LOAD>"):]) 
     cookie_req=re.findall(r"<van[^>]*?type='cookies'[^>]*?>([\s\S]*?)</van>[^<]*?<van[^>]*?type='link_taken'[^>]*?href='([^']*?)'>",interface_text) 
     topclicks=re.findall(r"<van[^>]*?type='top_request'[^>]*?href='([^']*?)'>",interface_text) 
     imgclicks=re.findall(r"<van[^>]*?type='image_request'[^>]*?href='([^']*?)'>",interface_text) 
     ind=list() 
     for d in cookie_req: 
      cooks=re.findall(r"([\S]*?)\t\t([\S]*?)\t\t([\S]*?)\t\t(\d+)",d[0]) 
      rr=dict() 
      rr['cookies']=cooks 
      rr['request']=d[1].strip() 
      type_='image'  
      for d in topclicks: 
       if(rr['request']==d.strip()): type_='toplink' 
      rr['type']=type_ 
      ind.append(rr) 
     return ind 
    else: 
     return False 

STRING:

New URL: http://domain.com/ 
Request (http://domain.com/css/style.css): 
Request (http://domain.com/tp/filter.php?pro=936): 
Request (http://domain.com/tp/a_ft.php?rand=5): 
<van LAST_LOAD> 
Processing images and getting hidden ones 
Request (http://domain.com/tp/img.php): 
Images with width set to over 85 67 
Done processing images. 
Checking Resourse Status 
Resourse retrieval status: Started/Full F http://domain.com/ 
Resourse retrieval status: Started/Full F http://domain.com/css/style.css 
Resourse retrieval status: Started/Full F http://domain.com/tp/filter.php?pro=936 
Resourse retrieval status: Started/Full F http://domain.com/tp/a_ft.php?rand=5 
Resourse retrieval status: Started/Full F http://domain.com/tp/img.php  
Phantom will exit in 33775 




    Reclicking 




    Clicking Image 
    Random Click: 5 
    <van type='image_request' href='http://www.domain.com/st/thumbs/238/YOWF8GaqIz.jpg'> 
    Dims: 204,514,240,180 
    Global mouse position 0 0 
    Moving to mouse to 635 295 
    mouse moved 
    Trying to navigate to: http://domain.com/gallery/www.html?id=437&x=8715eb135db63642cda1ec1c19e8d529&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE1MDEyL2FtYXRldXItcnVzc2lhbi1zZXgtdGFwZQ==&s=1 
    Caused by: LinkClicked 
    Will actually navigate: false 
    Sent from the page's main frame: false 
    Expected links: 5 
    <van type='cookies'> 
    domain.com  proimg  93ffe5  1417031956 
    domain.com  pro_cc3  394ef8df2b  1417031956 
    domain.com  pro_cc2  3377058  1417031956 
    domain.com  fav  1416945556  1448481556 
    domain.com  tp  MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=  1417031956 
    </van> 
    <van type='link_taken' href='http://domain.com/gallery/www.html?id=437&x=8715eb135db63642cda1ec1c19e8d529&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE1MDEyL2FtYXRldXItcnVzc2lhbi1zZXgtdGFwZQ==&s=1'> 




    Reclicking 




    Clicking Image 
    Random Click: 3 
    <van type='image_request' href='http://www.domain.com/st/thumbs/730/PGy0TRimJJ.jpg'> 
    Dims: 204,22,240,180 
    Global mouse position 635 295 
    Moving to mouse to 143 295 
    mouse moved 
    Trying to navigate to: http://domain.com/gallery/sss.html?id=424&x=e3ad16bcdc583a324acbc3a83f654a7a&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE2Mjk5L3RvdWNoaW5nLWJlYXV0eXMtanVpY3ktc3BvdA==&s=1 
    Caused by: LinkClicked 
    Will actually navigate: false 
    Sent from the page's main frame: false 
    Expected links: 4 
    <van type='cookies'> 
    domain.com  proimg  93ffe5  1417031956 
    domain.com  pro_cc3  394ef8df2b  1417031956 
    domain.com  pro_cc2  3377058  1417031956 
    domain.com  fav  1416945556  1448481556 
    domain.com  tp  MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=  1417031956 
    </van> 
    <van type='link_taken' href='http://domain.com/gallery/sss.html?id=424&x=e3ad16bcdc583a324acbc3a83f654a7a&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE2Mjk5L3RvdWNoaW5nLWJlYXV0eXMtanVpY3ktc3BvdA==&s=1'> 




    Reclicking 




    Clicking Image 
    Random Click: 7 
    <van type='image_request' href='http://www.domain.com/st/thumbs/867/uLzPrb0K45.jpg'> 
    Dims: 424,22,240,180 
    Global mouse position 143 295 
    Moving to mouse to 143 515 
    mouse moved 
    Trying to navigate to: http://domain.com/gallery/aaa.html?id=466&x=8dcbd277bf725b468c7933cc81692be0&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTExMzQ0L3doaXRlLWFuZC1ibGFjay10ZWVuLWJhYmVzLW1hc3R1cmJhdGluZw==&s=1 
    Caused by: LinkClicked 
    Will actually navigate: false 
    Sent from the page's main frame: false 
    Expected links: 3 
    <van type='cookies'> 
    domain.com  proimg  93ffe5  1417031956 
    domain.com  pro_cc3  394ef8df2b  1417031956 
    domain.com  pro_cc2  3377058  1417031956 
    domain.com  fav  1416945556  1448481556 
    domain.com  tp  MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=  1417031956 
    </van> 
    <van type='link_taken' href='http://domain.com/gallery/aaa.html?id=466&x=8dcbd277bf725b468c7933cc81692be0&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTExMzQ0L3doaXRlLWFuZC1ibGFjay10ZWVuLWJhYmVzLW1hc3R1cmJhdGluZw==&s=1'> 

This code, on the other hand, returns an empty list.

#!/usr/bin/python 
#mysql* MySQL* 
__author__ = 'root' 
import MySQLdb 
import sys 
import random 
import subprocess 
import re 
import time 
import pycurl 
import cStringIO 
import tldextract 




def mergeCookies(cookieList,cookieFile): 
    data = open(cookieFile,'r').read() 
    precooks=re.findall(ur"([\S]*?)\t([\S]*?)\t([\S]*?)\t([\S]*?)\t([\S]*?)\t([\S]*?)\t([\S]+)",data) 
    total="""# Netscape HTTP Cookie File 
# http://curl.haxx.se/rfc/cookie_spec.html 
# This file was generated by libcurl! Edit at your own risk. 

""" 
    keeper= list() 
    for old in precooks: 
     refresh=False 
     for new in cookieList: 
      print str(old[0]).strip() 
      new_parse=tldextract.extract(new[0]) 
      old_parse=tldextract.extract(old[0]) 
      if (new_parse[1].strip()==old_parse[1].strip() and str(new[1]).strip()==str(old[5]).strip() and not(str(old[0]).strip()+str(old[5]).strip() in keeper or str(new[0]).strip()+str(new[1]).strip() in keeper)): 
       total+=str(old[0]).strip()+"\t"+"TRUE"+"\t"+"/\tFALSE\t1579998218\t"+str(new[1]).strip()+"\t"+str(new[2]).strip()+"\n" 
       keeper.append(str(old[0]).strip()+str(old[5]).strip()) 
       keeper.append(str(new[0]).strip()+str(new[1]).strip()) 
       refresh=True 
     if(not refresh): 
      total+=str(old[0]).strip()+"\t"+"TRUE"+"\t"+"/\tFALSE\t1579998218\t"+str(old[5]).strip()+"\t"+str(old[6]).strip()+"\n" 
    for new in cookieList:   
     if(not(str(new[0]).strip()+str(new[1]).strip() in keeper)):    
      total+=str(new[0]).strip()+"\t"+"TRUE"+"\t"+"/\tFALSE\t1579998218\t"+str(new[1]).strip()+"\t"+str(new[2]).strip()+"\n" 
      keeper.append(str(new[0]).strip()+str(new[1]).strip()) 
    open(cookieFile,'w').write(total) 
def hitFormGetProxy(url,cookieFile,cookieList,proxy,lang,agent,referer,type_,theCol): 
    times=0 
    mergeCookies(cookieList,cookieFile) 
    while True: 
     times+=1 
     c = pycurl.Curl() 
     buff = cStringIO.StringIO() 
     c.setopt(c.URL, url) 
     c.setopt(c.WRITEFUNCTION, buff.write) 
     c.setopt(c.COOKIEFILE, cookieFile) 
     c.setopt(c.COOKIEJAR, cookieFile) 
     c.setopt(c.AUTOREFERER, True) 
     #c.setopt(c.COOKIESESSION, True) 
     #c.setopt(c.COOKIE, cookieString) 
     c.setopt(c.FAILONERROR, False) 
     c.setopt(c.FOLLOWLOCATION, True) 
     c.setopt(c.VERBOSE, True) 
     c.setopt(c.PROXY, proxy) 
     c.setopt(c.CONNECTTIMEOUT, 10) 
     c.setopt(c.TIMEOUT, 25) 
     c.setopt(c.MAXREDIRS, 10) 
     c.setopt(c.ENCODING, 'gzip,deflate,sdch') 
     c.setopt(c.SSL_VERIFYHOST, False) 
     c.setopt(c.SSL_VERIFYPEER, False) 
     c.setopt(c.FRESH_CONNECT, True) 
     c.setopt(c.HEADER, False) 
     c.setopt(c.HTTPHEADER, ['Accept-Language: '+str(lang)+'','Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8','Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3']) 
     #c.setopt(c.RETURNTRANSFER, True) 
     c.setopt(c.USERAGENT, agent) 
     c.setopt(c.REFERER, referer) 
     #c.setopt(c.HTTPHEADER, ['Accept: text/html', 'Accept-Charset: UTF-8']) 
     c.perform() 
     if(not (c.getinfo(pycurl.HTTP_CODE) == 200 or c.getinfo(pycurl.HTTP_CODE)==302 or c.getinfo(pycurl.HTTP_CODE)==301) and times>7): 
      if (type_ != 'payed'): 
       print "setting proxy offline" 
       # cur.execute("UPDATE `proxies` SET `status`='inactive',`last_checked`='"+str(int(time.time()))+"' WHERE `proxy`='"+str(proxy)+"'") 
       # cur.execute("UPDATE `proxies` SET `"+str(theCol)+"` = '"+str(int(time.time()))+"',`connections`= `connections`-1 WHERE `proxies`.`proxy` = '"+str(proxy)+"';") 
      quit() 
     elif(len(buff.getvalue())>500): 
      unallowed=False 
      global unallowed_urls 
      dmain=tldextract.extract(c.getinfo(pycurl.EFFECTIVE_URL)) 
      for url in unallowed_urls: 
       dmainurl=tldextract.extract(url) 
       if(dmain[1].strip()==dmainurl[1].strip()): 
        unallowed=True 
      if(not unallowed): 
       ret=buff.getvalue() 
       buff.close() 
       return ret 
      else: 
       print "visiting unallowed url" 
       break; 
     elif(times>12):break 




def getReqs(interface_text): 
    if("<van LAST_LOAD>" in interface_text): 
     interface_text=str(interface_text[interface_text.rfind("<van LAST_LOAD>"):]) 
     cookie_req=re.findall(r"<van[^>]*?type='cookies'[^>]*?>([\s\S]*?)</van>[^<]*?<van[^>]*?type='link_taken'[^>]*?href='([^']*?)'>",interface_text) 
     topclicks=re.findall(r"<van[^>]*?type='top_request'[^>]*?href='([^']*?)'>",interface_text) 
     imgclicks=re.findall(r"<van[^>]*?type='image_request'[^>]*?href='([^']*?)'>",interface_text) 
     ind=list() 
     for d in cookie_req: 
      cooks=re.findall(r"([\S]*?)\t\t([\S]*?)\t\t([\S]*?)\t\t(\d+)",d[0]) 
      rr=dict() 
      rr['cookies']=cooks 
      rr['request']=d[1].strip() 
      type_='image'  
      for d in topclicks: 
       if(rr['request']==d.strip()): type_='toplink' 
      rr['type']=type_ 
      ind.append(rr) 
     return ind 
    else: 
     return False 
def escapeshellarg(arg): 
     """ 
     :param arg: 
     :return: escaped string for ussage as console argument 
     """ 
     return "\\'".join("'" + p + "'" for p in arg.split("'")) 
#output = (Popen(["/usr/bin/java", "-jar", os.path.dirname(os.path.realpath(__file__))+"/headFinder.jar", self.escapeshellarg(str(tree))], stdout=PIPE).communicate()[0]).strip('') 

def getSite(a): 
    file_ = open('bot'+str(a)+'.ini','r').read() 
    p = re.compile(ur'REFERER:([^;]*?);') 
    m = re.search(p, file_) 
    toReturn = m.group(1) 
    return str(toReturn).strip() 

def proxy_status(str): 
    p = re.compile(ur'<van[^>]*?name=\'proxy_status\'[^>]*?value=\'([^\']*?)\'[^>]*?>') 
    m = re.search(p, str) 
    toReturn = m.group(1) 
    return toReturn 

def random_tier(a): 
    data = open(a,'r').read() 
    data = data.split("}") 
    probs = data[1].strip().split('|') 
    num=random.randint(0,100) 
    totes=0 
    toReturn = '' 
    for x in range(0,len(probs)-1): 
     if(num>totes and num<= totes + int(probs[x].strip())): toReturn = data[x+2] 
     totes+=int(probs[x].strip())   
    return toReturn.strip() 

def Random_Lang(): 
    data = open('language.txt','r').read() 
    data = data.split("}") 
    probs = data[1].strip().split('|') 
    num=random.randint(0,100) 
    totes=0 
    toReturn = '' 
    for x in range(0,len(probs)-1): 
     if(num>totes and num<= totes + int(probs[x].strip())): toReturn = data[x+2] 
     totes+=int(probs[x].strip())   
    return toReturn.strip() 
def Random_Agent(): 
    num=random.randint(0,100) 
    if(num<16) : return random_tier("IE.txt") 
    elif(num>16 and num<=48) : return random_tier("firefox.txt") 
    elif(num>48 and num<=93) : return random_tier("CHROME.txt") 
    elif(num>93 and num<=97) : return random_tier("safari.txt") 
    elif(num>97 and num<=100) : return random_tier("opera.txt") 
def Get_Trade(cur,colnum,threadnum): 
    print "SELECT * FROM trades_"+str(threadnum)+" WHERE position = '"+str(colnum)+"'" 
    cur.execute("SELECT * FROM trades_"+str(threadnum)+" WHERE position = '"+str(colnum)+"'") 
    try : 
     if (cur.rowcount > 0): 
      fetch = cur.fetchall() 
      return fetch[0][1],fetch[0][2] 
     else: 
      print "Found No Trade In That Position !" 
      time.sleep(8) 
      quit() 
    except MySQLdb.Error, e: 
     try: 
      print "MySQL Error [%d]: %s" % (e.args[0], e.args[1]) 
     except IndexError: 
      print "MySQL Error: %s" % str(e) 
     time.sleep(8) 
     quit() 
def GetPayedProxy(cur,theCol): 
    print "SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `PAYMENT`='sharedproxies' and `connections`<3" 
    cur.execute("SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `PAYMENT`='sharedproxies' and `connections`<3") 
    try : 
     if (cur.rowcount > 0): 
      fetch = cur.fetchall() 
      return fetch[0][0],'payed' 
     else: 
      print "Found No Shared Proxies available at this time !" 
      time.sleep(2) 
      return False,False 
    except MySQLdb.Error, e: 
     try: 
      print "MySQL Error [%d]: %s" % (e.args[0], e.args[1]) 
     except IndexError: 
      print "MySQL Error: %s" % str(e) 
     time.sleep(2) 
     return False,False 
def GetScannedProxy(cur,theCol): 
    print "SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `PAYMENT`='scanner' and `connections`<3" 
    cur.execute("SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `PAYMENT`='scanner' and `connections`<3") 
    try : 
     if (cur.rowcount > 0): 
      fetch = cur.fetchall() 
      return fetch[0][0],'scanned' 
     else: 
      print "Found No Scanned Proxies available at this time !" 
      time.sleep(2) 
      return False,False 
    except MySQLdb.Error, e: 
     try: 
      print "MySQL Error [%d]: %s" % (e.args[0], e.args[1]) 
     except IndexError: 
      print "MySQL Error: %s" % str(e) 
     time.sleep(2) 
     return False,False 
def GetTTProxy(cur,theCol): 
    print "SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and (`tier`='1' or `tier`='2') and `response_time`<10 and `PAYMENT`!='sharedproxies' and `PAYMENT`!='scanner' and `connections`<3" 
    cur.execute("SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and (`tier`='1' or `tier`='2') and `response_time`<10 and `PAYMENT`!='sharedproxies' and `PAYMENT`!='scanner' and `connections`<3") 
    try : 
     if (cur.rowcount > 0): 
      fetch = cur.fetchall() 
      return fetch[0][0],'tt' 
     else: 
      print "Found No T1 T2 Proxies available at this time !" 
      time.sleep(2) 
      return False,False 
    except MySQLdb.Error, e: 
     try: 
      print "MySQL Error [%d]: %s" % (e.args[0], e.args[1]) 
     except IndexError: 
      print "MySQL Error: %s" % str(e) 
     time.sleep(2) 
     return False,False 
def GetT3Proxy(cur,theCol): 
    print "SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `tier`='3' and `response_time`<10 and `PAYMENT`!='sharedproxies' and `PAYMENT`!='scanner' and `connections`<3" 
    cur.execute("SELECT * FROM `proxies` WHERE `"+str(theCol)+"`<'"+str(int(time.time()) - 86400)+"' and `status`='active' and `response`='200' and `tier`='3' and `response_time`<10 and `PAYMENT`!='sharedproxies' and `PAYMENT`!='scanner' and `connections`<3") 
    try : 
     if (cur.rowcount > 0): 
      fetch = cur.fetchall() 
      return fetch[0][0],'t3' 
     else: 
      print "Found No T3 Proxies available at this time !" 
      time.sleep(2) 
      return False,False 
    except MySQLdb.Error, e: 
     try: 
      print "MySQL Error [%d]: %s" % (e.args[0], e.args[1]) 
     except IndexError: 
      print "MySQL Error: %s" % str(e) 
     time.sleep(2) 
     return False,False 
def Get_Proxy(cur,theCol): 
    print "Trying to get Shared Proxy" 
    proxy,type=GetPayedProxy(cur,theCol) 
    if(proxy==False or type == False): 
     print "Trying to get Scanned Proxy" 
     proxy,type=GetScannedProxy(cur,theCol) 
     if(proxy==False or type == False): 
      print "Trying to get T1 T2 Proxy" 
      proxy,type=GetTTProxy(cur,theCol) 
      if(proxy==False or type == False): 
       print "Trying to get T3 Proxy" 
       proxy,type=GetT3Proxy(cur,theCol) 
       if(proxy==False or type == False): 
        print "No proxies available at this time!!!" 
       else: 
        return proxy,type 
      else: 
       return proxy,type 
     else: 
      return proxy,type 
    else: 
     return proxy,type 

def getReqs(interface_text): 
    toReturn = dict() 

    return toReturn 
if __name__=='__main__': 
    data="""New URL: http://domain.com/ 
Request (http://domain.com/css/style.css): 
Request (http://domain.com/tp/filter.php?pro=936): 
Request (http://domain.com/tp/a_ft.php?rand=5): 
<van LAST_LOAD> 
Processing images and getting hidden ones 
Request (http://domain.com/tp/img.php): 
Images with width set to over 85 67 
Done processing images. 
Checking Resourse Status 
Resourse retrieval status: Started/Full F http://domain.com/ 
Resourse retrieval status: Started/Full F http://domain.com/css/style.css 
Resourse retrieval status: Started/Full F http://domain.com/tp/filter.php?pro=936 
Resourse retrieval status: Started/Full F http://domain.com/tp/a_ft.php?rand=5 
Resourse retrieval status: Started/Full F http://domain.com/tp/img.php  
Phantom will exit in 33775 




    Reclicking 




    Clicking Image 
    Random Click: 5 
    <van type='image_request' href='http://www.domain.com/st/thumbs/238/YOWF8GaqIz.jpg'> 
    Dims: 204,514,240,180 
    Global mouse position 0 0 
    Moving to mouse to 635 295 
    mouse moved 
    Trying to navigate to: http://domain.com/gallery/www.html?id=437&x=8715eb135db63642cda1ec1c19e8d529&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE1MDEyL2FtYXRldXItcnVzc2lhbi1zZXgtdGFwZQ==&s=1 
    Caused by: LinkClicked 
    Will actually navigate: false 
    Sent from the page's main frame: false 
    Expected links: 5 
    <van type='cookies'> 
    domain.com  proimg  93ffe5  1417031956 
    domain.com  pro_cc3  394ef8df2b  1417031956 
    domain.com  pro_cc2  3377058  1417031956 
    domain.com  fav  1416945556  1448481556 
    domain.com  tp  MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=  1417031956 
    </van> 
    <van type='link_taken' href='http://domain.com/gallery/www.html?id=437&x=8715eb135db63642cda1ec1c19e8d529&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE1MDEyL2FtYXRldXItcnVzc2lhbi1zZXgtdGFwZQ==&s=1'> 




    Reclicking 




    Clicking Image 
    Random Click: 3 
    <van type='image_request' href='http://www.domain.com/st/thumbs/730/PGy0TRimJJ.jpg'> 
    Dims: 204,22,240,180 
    Global mouse position 635 295 
    Moving to mouse to 143 295 
    mouse moved 
    Trying to navigate to: http://domain.com/gallery/sss.html?id=424&x=e3ad16bcdc583a324acbc3a83f654a7a&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE2Mjk5L3RvdWNoaW5nLWJlYXV0eXMtanVpY3ktc3BvdA==&s=1 
    Caused by: LinkClicked 
    Will actually navigate: false 
    Sent from the page's main frame: false 
    Expected links: 4 
    <van type='cookies'> 
    domain.com  proimg  93ffe5  1417031956 
    domain.com  pro_cc3  394ef8df2b  1417031956 
    domain.com  pro_cc2  3377058  1417031956 
    domain.com  fav  1416945556  1448481556 
    domain.com  tp  MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=  1417031956 
    </van> 
    <van type='link_taken' href='http://domain.com/gallery/sss.html?id=424&x=e3ad16bcdc583a324acbc3a83f654a7a&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTE2Mjk5L3RvdWNoaW5nLWJlYXV0eXMtanVpY3ktc3BvdA==&s=1'> 




    Reclicking 




    Clicking Image 
    Random Click: 7 
    <van type='image_request' href='http://www.domain.com/st/thumbs/867/uLzPrb0K45.jpg'> 
    Dims: 424,22,240,180 
    Global mouse position 143 295 
    Moving to mouse to 143 515 
    mouse moved 
    Trying to navigate to: http://domain.com/gallery/aaa.html?id=466&x=8dcbd277bf725b468c7933cc81692be0&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTExMzQ0L3doaXRlLWFuZC1ibGFjay10ZWVuLWJhYmVzLW1hc3R1cmJhdGluZw==&s=1 
    Caused by: LinkClicked 
    Will actually navigate: false 
    Sent from the page's main frame: false 
    Expected links: 3 
    <van type='cookies'> 
    domain.com  proimg  93ffe5  1417031956 
    domain.com  pro_cc3  394ef8df2b  1417031956 
    domain.com  pro_cc2  3377058  1417031956 
    domain.com  fav  1416945556  1448481556 
    domain.com  tp  MXwwfDE0MTY5NDU1NTZ8MTQxNjk0NTU1NnwwO3Rlc3QyMS5jb20=  1417031956 
    </van> 
    <van type='link_taken' href='http://domain.com/gallery/aaa.html?id=466&x=8dcbd277bf725b468c7933cc81692be0&url=aHR0cDovL3d3dy5kcnR1YmVyLmNvbS92aWRlby8xOTExMzQ0L3doaXRlLWFuZC1ibGFjay10ZWVuLWJhYmVzLW1hc3R1cmJhdGluZw==&s=1'>""" 
    print getReqs(data) 
    quit() 
+1

What do "works fine" and "doesn't work" mean? – abarnert

+1

More importantly, this is a huge dump of code and output with very little explanation of which parts we should be looking at. Can you reduce this to a [minimal, complete, verifiable example](http://stackoverflow.com/help/mcve)? – abarnert

+0

@abarnert You can call the function with the string as its argument, and it will return the list of dictionaries I'm looking for. At least, that's what it does when I run it on its own; when I run it in the main script, it returns an empty list. – Evan

Answer

1

You define the function getReqs up on line 103.

Then, down on line 287, you replace that definition with this one:

def getReqs(interface_text): 
    toReturn = dict() 

    return toReturn 

So, when you call it on line 395 like this:

print getReqs(data) 

... you're calling the second definition, so it's not surprising that it prints an empty dict.
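To see the effect in isolation, here is a minimal sketch. The names get_reqs and first are made up for illustration and are not from the script above. A def statement simply binds a name to a function object, so a later def with the same name silently rebinds it, with no warning from the interpreter:

```python
def get_reqs(text):
    """First definition: returns a non-empty list."""
    return [{"request": text}]

# Keep a reference to the first function object before the name is rebound.
first = get_reqs

def get_reqs(text):
    """Second definition: rebinds the name get_reqs to a new function."""
    return dict()

# The name now refers to the second function; the first one is only
# reachable through the reference saved in `first`.
result = get_reqs("http://domain.com/")
original_result = first("http://domain.com/")
print(result)
print(original_result)
```

Whichever definition runs last wins, which is why the same function body behaves differently in test.py than in a script that redefines it further down.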

+0

Yeah... it would have taken me days to find that... thanks. – Evan
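A duplicate definition like this can also be found mechanically rather than by eye. The following is a rough sketch using the standard-library ast module; the helper name find_duplicate_defs and the sample source are assumptions made up for this example. Note that ast.parse has to be run by an interpreter that accepts the script's syntax, so a script using Python 2 print statements would need to be checked under Python 2.

```python
import ast
from collections import defaultdict

def find_duplicate_defs(source):
    """Return {name: [line numbers]} for top-level functions defined more than once."""
    tree = ast.parse(source)
    lines_by_name = defaultdict(list)
    for node in tree.body:  # top-level statements only
        if isinstance(node, ast.FunctionDef):
            lines_by_name[node.name].append(node.lineno)
    return {name: lines for name, lines in lines_by_name.items() if len(lines) > 1}

# Hypothetical stand-in for a script that defines getReqs twice.
sample = (
    "def getReqs(interface_text):\n"
    "    return [interface_text]\n"
    "\n"
    "def escapeshellarg(arg):\n"
    "    return arg\n"
    "\n"
    "def getReqs(interface_text):\n"
    "    return dict()\n"
)
duplicates = find_duplicate_defs(sample)
print(duplicates)
```

Linters such as pyflakes should flag the same problem as a redefinition of an unused name, which is probably the quicker route in practice.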