So I'm working on a project to align a sequence ID and its code. I was given a barcode file, which contains a tag for a DNA sequence, i.e. TTAGG. There's several tags (ATTAC, ACCAT, etc.) which then get removed from the a sequence file and placed with a seq ID.
Example:
sequence file --> SEQ 01 TTAGGAACCCAAA
barcode file --> TTAGG
the output file I want will remove the barcode and use it to create a new fasta format file.
Example:
testfile.TTAGG which when opened should have
>SEQ01
AACCCAAA
There are several of these files. I want to take each one of this files that I create and run them through mafft
, but when I run my script, it only concentrates on one file for mafft
. The files I mentioned above come out ok, but when mafft
runs, it only runs the last file created.
Here's my script:
#!/usr/bin/python
import sys
import os
fname = sys.argv[1]
barcodefname = sys.argv[2]
barcodefile = open(barcodefname, "r")
for barcode in barcodefile:
barcode = barcode.strip()
outfname = "%s.%s" % (fname, barcode)
outf = open(outfname, "w+")
handle = open(fname, "r")
mafftname = outfname + ".mafft"
for line in handle:
newline = line.split()
seq = newline[0]
brc = newline[1]
potential_barcode = brc[:len(barcode)]
if potential_barcode == barcode:
outseq = brc[len(barcode):]
barcodeseq = ">%s\n%s\n" % (seq,outseq)
outf.write(barcodeseq)
handle.close()
outf.close()
cmd = "mafft %s > %s" % (outfname, mafftname)
os.system(cmd)
barcodefile.close()
I hope that was clear enough! Please help! I've tried changing my indentations, adjusting when I close the file. Most of the time it won't make the .mafft
file at all, sometimes it does but doesn't put anything it, but mostly it only works on that last file created.
Example:
the beginning of the code creates files such as -
testfile.ATTAC
testfile.AGGAC
testfile.TTAGG
then when it runs mafft
it only creates
testfile.TTAGG.mafft
(with the correct input)
I have tried close the outf
file and then opening it again, in which it tells me I'm coercing it.
I've changed to the outf
file to write only, doesn't change anything.