Escaping Both Types Of Quotes In Subprocess.Popen Call To Awk
Solution 1:
r"'" is not the issue. Most likely you're passing maf_cut_off as an integer, which is incorrect: every element of the argument list given to Popen must be a string. Use str(maf_cut_off) instead.
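A minimal sketch of the difference, assuming Python 3. It uses sys.executable as a harmless stand-in for awk so that it runs anywhere, and the value 5 is only an illustration:

```python
import subprocess
import sys

maf_cut_off = 5  # illustrative value, not taken from the question

# Stand-in child process: echoes its first argument back.
base = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# Passing the integer directly fails before the child is even started.
try:
    subprocess.check_output(base + [maf_cut_off])
    int_error = None
except TypeError as exc:
    int_error = exc

# Converting to a string first works.
echoed = subprocess.check_output(base + [str(maf_cut_off)]).strip()
```

Here int_error ends up holding the TypeError raised for the integer argument, while echoed holds the bytes the child printed.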
Solution 2:
There are several issues. You are trying to execute a shell command (there is a pipe | in the command), so it won't work without a shell even if you convert all variables to strings.
You could run it through the shell:
from shlex import quote  # Python 3.3+; on Python 2 use: from pipes import quote
from subprocess import check_output
cmd = r"""tabix %s -B %s | awk '{FS="\t";OFS="\t"} $4 >= %d'""" % (
    quote(tgp_snp), quote(infile), maf_cut_off)
output = check_output(cmd, shell=True)
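To see what the quoting does to the final command line, here is a small sketch that only builds the string and does not run it. It assumes Python 3's shlex.quote; the helper name build_cmd and the file names are made up for illustration:

```python
from shlex import quote


def build_cmd(tgp_snp, infile, maf_cut_off):
    """Return the shell command line (hypothetical helper, nothing is executed)."""
    return r"""tabix %s -B %s | awk '{FS="\t";OFS="\t"} $4 >= %d'""" % (
        quote(tgp_snp), quote(infile), maf_cut_off)


# Made-up file names: the first is shell-safe as is, the second
# contains both a space and a single quote.
cmd = build_cmd("snps.vcf.gz", "my region's.bed", 5)
print(cmd)
```

The safe name is left untouched, while the awkward one comes out as 'my region'"'"'s.bed', which the shell reads back as a single argument. The raw string keeps \t as a literal backslash and t for awk to interpret.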
Or you could use the pipe recipe from the subprocess docs:
from subprocess import Popen, PIPE
tabix = Popen(["tabix", tgp_snp, "-B", infile], stdout=PIPE)
awk = Popen(["awk", r'{FS="\t";OFS="\t"} $4 >= %d' % maf_cut_off],
            stdin=tabix.stdout, stdout=PIPE)
tabix.stdout.close() # allow tabix to receive a SIGPIPE if awk exits
output = awk.communicate()[0]
tabix.wait()
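The recipe does not look at exit statuses. A sketch of the same pattern wrapped in a function that also returns them is given here; the helper name run_pipeline is hypothetical, and two small Python one-off scripts stand in for tabix and awk so that the example runs without either tool installed:

```python
import sys
from subprocess import Popen, PIPE


def run_pipeline(first, second):
    """Run `first | second` without a shell (hypothetical helper)."""
    p1 = Popen(first, stdout=PIPE)
    p2 = Popen(second, stdin=p1.stdout, stdout=PIPE)
    p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits early
    output = p2.communicate()[0]
    p1.wait()
    return output, p1.returncode, p2.returncode


# Stand-ins for tabix (produces lines) and awk (keeps values >= 5).
PRODUCER = "print('a 1'); print('b 7'); print('c 5')"
CONSUMER = """
import sys
for line in sys.stdin:
    name, value = line.split()
    if int(value) >= 5:
        print(name)
"""

output, rc1, rc2 = run_pipeline([sys.executable, "-c", PRODUCER],
                                [sys.executable, "-c", CONSUMER])
```

With the real tools, the two argument lists would be the tabix and awk lists from the recipe; a non-zero return code from either process signals that the output may be incomplete.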
Or you could use plumbum, which provides some syntactic sugar for shell commands:
from plumbum.cmd import tabix, awk
cmd = tabix[tgp_snp, '-B', infile]
cmd |= awk[r'{FS="\t";OFS="\t"} $4 >= %d' % maf_cut_off]
output = cmd() # run it and get output
Another option is to reproduce the awk command in pure Python. To get all lines whose 4th field is greater than or equal to maf_cut_off numerically (as an integer):
from subprocess import Popen, PIPE
tabix = Popen(["tabix", tgp_snp, "-B", infile], stdout=PIPE)
lines = []
for line in tabix.stdout:
    columns = line.split(b'\t', 4)
    if len(columns) > 3 and int(columns[3]) >= maf_cut_off:
        lines.append(line)
output = b''.join(lines)
tabix.communicate() # close streams, wait for the subprocess to exit
tgp_snp and infile should be strings, and maf_cut_off should be an integer.
You could pass bufsize=-1 to Popen() to improve performance. (On Python 3.3.1 and later this is already the default.)
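The filtering step of the pure-Python variant can be tried without tabix by pulling it into a function. This is only a sketch: the name filter_by_field and the parse parameter are my own additions, and the sample lines are invented. One difference from awk is worth noting: awk compares decimals numerically, whereas int() rejects a field such as 0.07, so the sketch lets the caller pass float instead if the column holds fractions.

```python
def filter_by_field(lines, cut_off, parse=int):
    """Yield the lines whose 4th tab-separated field is >= cut_off.

    Hypothetical helper; lines are bytes, as read from a subprocess pipe.
    """
    for line in lines:
        columns = line.split(b'\t', 4)
        # Lines with fewer than four fields are skipped.
        if len(columns) > 3 and parse(columns[3]) >= cut_off:
            yield line


# Invented sample data: two full lines and one that is too short.
sample = [b"1\t100\trs1\t3\tx\n", b"1\t200\trs2\t12\ty\n", b"short\tline\n"]
kept = list(filter_by_field(sample, 5))

# Same filter with a fractional threshold, assuming the field holds decimals.
kept_float = list(filter_by_field([b"1\t300\trs3\t0.07\tz\n"], 0.05, parse=float))
```

In the real script the first argument would be tabix.stdout, and b''.join(...) over the result gives the same output value as before.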